[cairo] Optimize spans in the trapezoid rasterizer
rocallahan at novell.com
Mon Jul 25 21:33:14 PDT 2005
On Mon, 2005-07-25 at 16:57 -0700, David Schleef wrote:
> On Mon, Jul 25, 2005 at 07:01:42PM -0400, Keith Packard wrote:
> > On Tue, 2005-07-26 at 14:48 +1200, Robert O'Callahan wrote:
> > > It depends on whether the branch is predictable or not. Unpredictable
> > > branches will kill performance, predictable branches are almost free.
> > > "Predictable" basically means "the next value for the condition is
> > > usually the same as the previous one".
> > >
> > > For this particular piece of code, a good compiler/architecture will
> > > actually use predicated execution/conditional moves to eliminate the
> > > conditional branch, so that predictability no longer matters. In fact,
> > > gcc -O2 turns your saturation statement into a CMOV on -mi686 or
> > > greater.
> > I guess the question is how this compares to the branch-less equivalent
> > code on our target architectures. I consider these to be AMD64, x86 and
> > ARM, although I'm sure others would include additional ones.
> I strongly recommend writing readable code over cryptic code that
> may be faster or slower by a cycle or two. Leave the cycle chasing
> to projects that have the framework to actually measure it.
> (FWIW, branches containing a few instructions are faster than cmov
> on processors I've looked at.)

Oprofile is quite adequate for this. A while ago I got some significant
speedups in GNU ld using oprofile, and cycles spent in mispredicted
branches were a big part of the problem:

I don't think anyone disagrees that writing readable code is the first
priority, but there's still a place for profiling and optimization, and
sometimes that means tweaking low-level code.
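
For concreteness, here is a rough sketch of the two styles being compared.
The actual saturation statement from the patch is not quoted in this
message, so the function names and the clamp-to-255 behaviour below are
assumptions for illustration only, not the rasterizer's real code.

```c
#include <stdint.h>

/* Hypothetical readable form: clamp a coverage value to 255.
 * A compiler may turn this into a conditional move or leave it
 * as a short conditional branch, depending on target and flags. */
static inline uint8_t saturate_branch(unsigned v)
{
    if (v > 255u)
        v = 255u;
    return (uint8_t) v;
}

/* Hypothetical branch-less form: build an all-ones mask when v
 * exceeds 255, OR it in, then keep the low 8 bits. */
static inline uint8_t saturate_branchless(unsigned v)
{
    unsigned mask = 0u - (unsigned) (v > 255u); /* 0 or 0xFFFFFFFF */
    return (uint8_t) ((v | mask) & 0xffu);
}
```

Both return the same result for every input. Which one is faster on a
given processor is exactly the kind of question a profiler should settle,
rather than guesswork.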