[cairo] Special case pixman_rasterize_trapezoid() for boxes

Tue Aug 11 12:07:30 PDT 2009

On Tue, 2009-08-11 at 20:25 +0200, Soeren Sandmann wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Hi all,
> > 
> > I'd like some feedback on the sanity of adding a special case inside
> > pixman_rasterize_trapezoid() for the rectilinear condition. In
> > particular, guidance on handling the 1/8 bpp cases.
> 
> Well, I can't say I'm very thrilled about fast pathing in the
> trapezoid code, when trapezoids are increasingly becoming
> irrelevant. But on the other hand, we don't have a polygon rasterizer
> in pixman yet, and nobody knows if or when we will, so given that,
> speeding up the rectilinear case probably does make sense if there are
> significant time savings to be had. If we end up doing this, it would
> have to happen after 0.16.0.

With the current spans implementation in cairo, the trapezoid code is
still much faster for rectilinear fills, hence the interest here. Using
pixman_composite_polygon() will eliminate a large amount of overhead,
but I suspect spans will still be slower than the special-purpose
rectilinear compositors.
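To make the proposal concrete, here is a minimal sketch of what a box
special case might look like. It assumes an a8 mask that accumulates
coverage with a saturating add, so a fully covered pixel can simply be
stored as 0xff. It only handles pixel-aligned boxes and returns false
otherwise, so the caller can fall back to the general edge rasterizer.
The types and helpers (trap_t, trap_is_box(), rasterize_box_a8()) are
hypothetical local stand-ins modelled on pixman's 16.16 fixed-point
trapezoid layout, not pixman's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local stand-ins mirroring the layout of pixman's 16.16
 * fixed-point trapezoid types; these are NOT the real declarations. */
typedef int32_t fixed_t;
typedef struct { fixed_t x, y; } point_fixed_t;
typedef struct { point_fixed_t p1, p2; } line_fixed_t;
typedef struct { fixed_t top, bottom; line_fixed_t left, right; } trap_t;

#define FIXED_ONE        (1 << 16)
#define FIXED_FRAC(f)    ((f) & (FIXED_ONE - 1))
#define FIXED_TO_INT(f)  ((f) >> 16)

/* A trapezoid is an axis-aligned box when both edges are vertical. */
static bool trap_is_box(const trap_t *t)
{
    return t->left.p1.x == t->left.p2.x &&
           t->right.p1.x == t->right.p2.x;
}

/* Fill a pixel-aligned box into an a8 mask. Returns false when the
 * caller must use the general rasterizer instead. */
static bool rasterize_box_a8(uint8_t *mask, int stride, int width,
                             int height, const trap_t *t)
{
    fixed_t l = t->left.p1.x, r = t->right.p1.x;

    assert(stride >= width);
    if (!trap_is_box(t))
        return false;
    if (FIXED_FRAC(l) || FIXED_FRAC(r) ||
        FIXED_FRAC(t->top) || FIXED_FRAC(t->bottom))
        return false;           /* partial coverage: slow path */

    int x1 = FIXED_TO_INT(l), x2 = FIXED_TO_INT(r);
    int y1 = FIXED_TO_INT(t->top), y2 = FIXED_TO_INT(t->bottom);

    /* Clip the box to the mask bounds. */
    if (x1 < 0) x1 = 0;
    if (y1 < 0) y1 = 0;
    if (x2 > width) x2 = width;
    if (y2 > height) y2 = height;

    /* Full coverage: a plain store equals a saturating add of 0xff. */
    for (int y = y1; y < y2 && x1 < x2; y++)
        memset(mask + (size_t)y * stride + x1, 0xff, (size_t)(x2 - x1));
    return true;
}
```

Boxes with sub-pixel edges would still need fractional coverage on the
boundary rows and columns, but their interior could reuse the same
memset loop.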

> A couple of specific comments:
> 
> - Given that this is a fast path, I'm not sure it makes sense to worry
>   about the accessor case. If each pixel goes through an indirect
>   function call, performance will never be great. So maybe only do
>   this when PIXMAN_FB_ACCESSORS is not defined.

Ok, that will remove some of the clutter from the code.
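A minimal sketch of the suggested compile-time guard, using a
hypothetical fast_fill_span_a8() helper: in accessor builds the fast
path is compiled out and reports failure, so every pixel goes through
the generic code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fast path for one a8 row span [x1, x2).
 * Returns false when the caller should use the generic rasterizer. */
static bool fast_fill_span_a8(uint8_t *row, int x1, int x2, uint8_t alpha)
{
    assert(x1 <= x2);
#ifdef PIXMAN_FB_ACCESSORS
    /* Each pixel would go through an indirect accessor call, so the
     * fast path is compiled out and the generic code handles it. */
    (void)row; (void)x1; (void)x2; (void)alpha;
    return false;
#else
    /* Direct memory access: the whole span is a single memset. */
    memset(row + x1, alpha, (size_t)(x2 - x1));
    return true;
#endif
}
```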

> - Are any cases other than 8bpp actually interesting? I.e., are there
>   real-world cases where people use the trapezoid code to draw lots of
>   rectangles on an a1 surface?

All unantialiased rendering in cairo currently goes through traps
(trapezoids), usually to an a1 image. One use case for unantialiased
rendering is performing FSAA. Although I currently lack a real-world use
case for it, I hope to implement an FSAA surface wrapper for cairo in
the near future so that I can compare performance/quality trade-offs
whilst replaying the traces to various backends.
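For the a1 case, filling a box reduces to setting a run of bits in each
row. A minimal sketch, assuming LSB-first bit order within each byte
(pixel x is bit x % 8 of byte x / 8); pixman's actual a1 bit order may
depend on the platform's byte order, so a real implementation would
have to match it. fill_span_a1() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Set pixels [x1, x2) of one a1 row.
 * ASSUMPTION: LSB-first bit order, pixel x is bit x % 8 of byte x / 8. */
static void fill_span_a1(uint8_t *row, int x1, int x2)
{
    assert(0 <= x1 && x1 <= x2);
    if (x1 == x2)
        return;

    int b1 = x1 >> 3, b2 = (x2 - 1) >> 3;           /* first/last byte */
    uint8_t head = (uint8_t)(0xff << (x1 & 7));      /* bits >= x1 */
    uint8_t tail = (uint8_t)(0xff >> (7 - ((x2 - 1) & 7))); /* <= x2-1 */

    if (b1 == b2) {
        row[b1] |= head & tail;     /* span lies within one byte */
        return;
    }
    row[b1] |= head;
    memset(row + b1 + 1, 0xff, (size_t)(b2 - b1 - 1)); /* whole bytes */
    row[b2] |= tail;
}
```

Partial bytes at either end are OR-ed in so neighbouring pixels are
preserved, while the fully covered middle bytes are a plain memset.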

> - In general, I don't trust the numbers from callgrind very much. As
>   far as I know callgrind doesn't take memory access into account, so
>   the numbers don't correspond very well to wall clock time which is
>   often dominated by memory bandwidth and/or latency.
> 
>   Both oprofile and sysprof produce more accurate numbers, but even so
>   there is no real substitute for measuring the actual real time
>   savings.

Noted. I was using callgrind to get an accurate call graph, and so
quoted the numbers at hand. According to perf and sysprof,
pixman_rasterize_edges() was still the most expensive function even
though I was using spans for everything except regions and rectilinear
boxes, which is why it became the focus for special-casing. I'll gather
more realistic numbers in the next pass.
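A minimal sketch of measuring wall-clock time directly, using POSIX
clock_gettime() with CLOCK_MONOTONIC; unlike an instruction count, the
result includes time spent waiting on memory. time_wall_ns() and
clear_mask() are hypothetical helpers, and the workload is only a
placeholder for the rasterization code under test.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical workload signature for the code being measured. */
typedef void (*workload_fn)(void *ctx);

/* Run fn `iterations` times and return the elapsed wall-clock time in
 * nanoseconds, so cache misses and memory stalls are included. */
static int64_t time_wall_ns(workload_fn fn, void *ctx, int iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++)
        fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000 +
           (end.tv_nsec - start.tv_nsec);
}

/* Placeholder workload: clear a 1 MiB a8 mask (memory-bound). */
static void clear_mask(void *ctx)
{
    memset(ctx, 0, 1 << 20);
}
```

In practice each variant would be run several times and the minimum or
median compared, since a single run is sensitive to cache state and
scheduling.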

Thank you, Soeren, for your comments. Fantastic work on reorganising
pixman; it is so much easier to navigate and grok now!
-ickle