[cairo] [RFC] Pixman & compositing with overlapping source and destination pixel data

Mon Oct 19 15:29:09 PDT 2009

Hi,

> On Thursday 04 June 2009, Soeren Sandmann wrote:
> > Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> > > What kind of guarantees (or the lack of) pixman and XRender are supposed
> > > to provide when dealing with overlapping parts of images?
> >
> > (Adding xorg-devel). See this thread:
> >
> >     http://lists.freedesktop.org/archives/xorg/2008-October/039346.html
> >
> > The guarantee that I would suggest for Render and pixman is that if
> > any pixel is both read and written in the same request, then the
> > result of the whole request is undefined, except that it obeys the
> > clipping rules.
> >
> > > The practical use case could be scrolling of data inside of a single big
> > > image. If rendering with overlapped source and destination areas is not
> > > supported, a temporary image has to be created to achieve expected result
> > > and this is an additional performance hit.
> >
> > Yes, scrolling is one thing that the current pixman API doesn't really
> > provide. 'pixman_blt()' only deals with cases where the source and
> > destination don't overlap.
> >
> > I think the best solution is to move all of the X primitives
> > (CopyArea, DrawLine, DrawArc, etc.) into pixman. For CopyArea it would
> > probably look something like this:
> >
> >         void
> >         pixman_copy_area (pixman_image_t *src,
> >                           pixman_image_t *dest,
> >                           pixman_gc_t *gc,
> >                           int src_x,
> >                           int src_y,
> >                           int width,
> >                           int height,
> >                           int dest_x,
> >                           int dest_y);
> >
> > and that would be guaranteed to handle overlapping src and dest. A
> > pixman_gc_t would be a new type of object, corresponding to an X GC.
> >
> > pixman_blt() would then become a deprecated wrapper that would just
> > call pixman_copy_area(). Same for pixman_fill() and a new
> > pixman_fill_rectangles().
> 
> I'm not sure about pixman_gc_t since most of the needed operations are just
> simple copies. What about starting with just introducing a variant
> of 'pixman_blt' which is overlapping aware?

The pixman_blt() interface is misdesigned for two reasons: (1) the
strides are given in number-of-uint32_ts, which gratuitously limits
the types of images that can be processed, and (2) it can fail if it
doesn't like the input for some reason.

At the same time, having the core primitives available on the client
side is useful in some cases, and the software implementation of them
can more easily be optimized with SIMD instructions in pixman.

Moving core rendering into pixman solves both issues at the same time.

But that said, I am not opposed to extending pixman_blt() to support
overlapping copies. That is certainly a simpler first step.
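
For reference, an overlap-aware copy within one image could look roughly like the sketch below. This is not the code from the branch mentioned further down and not the pixman_blt() signature; the function name, the byte stride and the 32 bpp restriction are all assumptions made to keep the example short. The idea is to pick the row order from the vertical direction of the move and let memmove() deal with overlap inside a row:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of pixman: copy a width x height
 * rectangle of 32 bpp pixels from (src_x, src_y) to (dest_x, dest_y)
 * inside the same buffer. 'stride' is in bytes. Both rectangles are
 * assumed to lie entirely inside the buffer. */
static void
overlap_safe_copy32 (uint8_t *bits, int stride,
                     int src_x, int src_y,
                     int dest_x, int dest_y,
                     int width, int height)
{
    size_t row_bytes = (size_t) width * sizeof (uint32_t);
    int i;

    assert (bits != NULL && stride > 0 && width >= 0 && height >= 0);

    if (dest_y > src_y)
    {
        /* Moving down: start with the bottom row, so that no source
         * row is overwritten before it has been read. */
        for (i = height - 1; i >= 0; i--)
        {
            memmove (bits + (size_t) (dest_y + i) * stride + (size_t) dest_x * 4,
                     bits + (size_t) (src_y + i) * stride + (size_t) src_x * 4,
                     row_bytes);
        }
    }
    else
    {
        /* Moving up or staying on the same rows: top to bottom is safe;
         * memmove() handles horizontal overlap within a row. */
        for (i = 0; i < height; i++)
        {
            memmove (bits + (size_t) (dest_y + i) * stride + (size_t) dest_x * 4,
                     bits + (size_t) (src_y + i) * stride + (size_t) src_x * 4,
                     row_bytes);
        }
    }
}
```

An optimized version would replace memmove() with forward and backward SIMD row copies, but the choice of direction stays the same.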

> I created a work-in-progress branch with 'pixman_blt' function (generic C
> implementation for now) extended to support overlapped source/destination
> case. A simple test program is also included:
> http://cgit.freedesktop.org/~siamashka/pixman/log/?h=overlapped-blt
> 
> Making use of the already existing SIMD optimized pixel copy functions should
> provide fast scrolling in all the directions except for from left to right.
> This special case will require a SIMD optimized backwards copy.
> 
> I wonder if it makes sense to drop delegates support for pixman_blt and make
> call chain shorter when introducing SIMD optimized copies? It seems to be a
> little bit overdesigned here.

How would you support SSE2 and MMX in the same binary then?

Also, I really don't see much potential for saving here. For a NEON
implementation of blt, the callchain would be:

   pixman_blt() ->  _pixman_implementation_blt() -> neon_blt()

and getting rid of delegates wouldn't really affect that at all. You
could get rid of the _pixman_implementation_blt() call by making it a
macro, but as I mentioned before, gcc turns it into a tail call that
reuses the arguments on the stack, so the overhead really is minimal.
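
To make the delegate argument concrete, here is a toy model of the idea. None of these names are the real pixman types or functions, and the bpp checks are invented for illustration. Each implementation either handles the request or forwards it to its delegate, and the top of the chain is picked at run time from the detected CPU features, which is what lets one binary carry both SSE2 and MMX code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; these are not the pixman data structures. */
typedef enum { BY_GENERAL, BY_MMX, BY_SSE2 } handler_t;

typedef struct impl impl_t;
typedef handler_t (*blt_func_t) (impl_t *imp, int bpp);

struct impl
{
    impl_t     *delegate;   /* next implementation to try, or NULL */
    blt_func_t  blt;
};

/* The generic C implementation handles everything. */
static handler_t
general_blt (impl_t *imp, int bpp)
{
    (void) imp;
    (void) bpp;
    return BY_GENERAL;
}

/* Pretend the MMX code has fast paths for 16 and 32 bpp only. */
static handler_t
mmx_blt (impl_t *imp, int bpp)
{
    if (bpp == 16 || bpp == 32)
        return BY_MMX;
    return imp->delegate->blt (imp->delegate, bpp);
}

/* Pretend the SSE2 code has a fast path for 32 bpp only. */
static handler_t
sse2_blt (impl_t *imp, int bpp)
{
    if (bpp == 32)
        return BY_SSE2;
    return imp->delegate->blt (imp->delegate, bpp);
}

static impl_t general_impl = { NULL,          general_blt };
static impl_t mmx_impl     = { &general_impl, mmx_blt };
static impl_t sse2_impl    = { &mmx_impl,     sse2_blt };

/* Pick the top of the chain from run-time CPU detection.
 * Assumption of this sketch: a CPU with SSE2 also has MMX. */
static impl_t *
choose_implementation (int has_mmx, int has_sse2)
{
    if (has_sse2)
        return &sse2_impl;
    if (has_mmx)
        return &mmx_impl;
    return &general_impl;
}

/* Public entry point: a single indirect call into the chain. */
static handler_t
toy_blt (impl_t *top, int bpp)
{
    assert (top != NULL);
    return top->blt (top, bpp);
}
```

In the common case the request is handled by the first implementation in the chain, so the delegates only cost anything when a fast path declines.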

Soren