[cairo] [RFC] Pixman & compositing with overlapping source and destination pixel data

Mon Oct 26 01:11:12 PDT 2009

On 26-10-09 01:57, Siarhei Siamashka wrote:
> On Friday 23 October 2009, Koen Kooi wrote:
>>> I'm not sure about pixman_gc_t since most of the needed operations are just
>>> simple copies. What about starting with just introducing a variant
>>> of 'pixman_blt' which is overlapping aware?
>>>
>>> I created a work-in-progress branch with 'pixman_blt' function (generic C
>>> implementation for now) extended to support overlapped source/destination
>>> case. A simple test program is also included:
>>> http://cgit.freedesktop.org/~siamashka/pixman/log/?h=overlapped-blt
>
> First, this branch is outdated. There is a new branch with the final code :)
> http://cgit.freedesktop.org/~siamashka/pixman/log/?h=overlapped-blt-v2
>
>> Would using said branch give me 'magically' a performance boost (e.g.
>> not make firefox unusably slow as it is now on a 600MHz Cortex-A8) or
>> would I need to patch other libs (e.g. xrender) as well?
>
> Not really, it's just a small extension of pixman functionality. Currently
> the handling of overlapped blt operation (for software rendering) is done
> in xorg-server. As it is the responsibility of pixman to provide CPU-specific
> SIMD optimizations (NEON for ARM Cortex-A8), it would be quite natural to
> move this work to pixman. So the next steps are to add NEON optimizations
> to pixman_blt and make sure that xserver takes advantage of these
> optimizations for the overlapped blit too.

So:

1) merge your branch into pixman master
2) move overlapped blit handling from xserver-xorg to pixman
3) add SIMD optimizations to pixman

That would give us better scrolling, right?
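
To make the overlapped case concrete, here is a rough sketch of what
overlap-aware copying involves. It is not the code from the branch above
and not pixman's API: the name overlapped_blt32 and its parameters are
made up for illustration, and it assumes a 32bpp buffer with the stride
given in pixels. The only trick is picking the row order so that no
source row is overwritten before it has been read; memmove takes care of
overlap within a single row.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of pixman): copy a width x height
 * rectangle of 32bpp pixels inside one buffer, from (src_x, src_y) to
 * (dst_x, dst_y). 'stride' is the row length in pixels. The source and
 * destination rectangles are allowed to overlap. */
static void
overlapped_blt32 (uint32_t *bits, int stride,
                  int src_x, int src_y,
                  int dst_x, int dst_y,
                  int width, int height)
{
    int row;

    assert (width >= 0 && height >= 0 && stride >= width);

    if (dst_y <= src_y) {
        /* Destination is above (or on the same rows as) the source:
         * walk top to bottom, so rows written so far are never needed
         * again as source rows. */
        for (row = 0; row < height; row++)
            memmove (bits + (dst_y + row) * stride + dst_x,
                     bits + (src_y + row) * stride + src_x,
                     width * sizeof (uint32_t));
    } else {
        /* Destination is below the source: walk bottom to top for the
         * same reason. */
        for (row = height - 1; row >= 0; row--)
            memmove (bits + (dst_y + row) * stride + dst_x,
                     bits + (src_y + row) * stride + src_x,
                     width * sizeof (uint32_t));
    }
}
```

Scrolling a region up by N rows would then be one call with
dst_y = src_y - N. A SIMD version would presumably replace the memmove
calls while keeping the same direction logic.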

> As for improving scrolling performance (and assuming a standard fbdev driver),
> the most important thing is to improve framebuffer memory performance. Right
> now framebuffer memory is mapped as noncached writecombine on OMAP3. Enabling
> write-through cache for it (with a simple kernel patch) improves scrolling
> and window-moving performance by a factor of 4-5 (unless a shadow framebuffer
> is used, which is also not good for performance). This works fine if nothing
> but the CPU can modify framebuffer memory. But if the GPU or DSP can also
> access framebuffer memory, or a compositing manager is used, everything gets
> more complicated. Cache invalidate operations will have to be inserted in
> appropriate places in order to ensure memory coherency and uniform view
> of its content from all the units. If the default write-back cache is used
> instead of write-through, cache flush operations are needed too.

I have no idea how the SGX or DSP handle the framebuffer, but I'm using both.

> Unpatched firefox is also quite slow for another reason - it tries to
> always work with 32bpp data internally, no matter what color depth is
> used for desktop.

I'm already using your patch for that :)
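
For anyone wondering why the 32bpp issue costs so much: assuming the
desktop runs at 16bpp (r5g6b5), every pixel firefox produces has to be
repacked on its way to the screen. The sketch below only illustrates
that per-pixel work; it is not taken from firefox, pixman or the patch,
and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack one x8r8g8b8 pixel into r5g6b5 by keeping
 * the top 5/6/5 bits of the red/green/blue channels. */
static uint16_t
pack_r5g6b5 (uint32_t p)
{
    return (uint16_t) (((p >> 8) & 0xf800) |   /* red:   bits 23..19 */
                       ((p >> 5) & 0x07e0) |   /* green: bits 15..10 */
                       ((p >> 3) & 0x001f));   /* blue:  bits  7..3  */
}

/* Hypothetical helper: convert one scanline of n pixels from 32bpp to
 * 16bpp. */
static void
convert_line_8888_to_0565 (uint16_t *dst, const uint32_t *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = pack_r5g6b5 (src[i]);
}
```

Doing this for every updated pixel is extra work that goes away when the
application renders at the desktop depth in the first place.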

regards,

Koen