[cairo] Optimize spans in the trapezoid rasterizer

Mon Jul 25 21:33:14 PDT 2005

On Mon, 2005-07-25 at 16:57 -0700, David Schleef wrote:
> On Mon, Jul 25, 2005 at 07:01:42PM -0400, Keith Packard wrote:
> > On Tue, 2005-07-26 at 14:48 +1200, Robert O'Callahan wrote:
> > 
> > > It depends on whether the branch is predictable or not. Unpredictable
> > > branches will kill performance, predictable branches are almost free.
> > > "Predictable" basically means "the next value for the condition is
> > > usually the same as the previous one".
> > > 
> > > For this particular piece of code, a good compiler/architecture will
> > > actually use predicated execution/conditional moves to eliminate the
> > > conditional branch, so that predictability no longer matters. In fact,
> > > gcc -O2 turns your saturation statement into a CMOV with -march=i686 or
> > > greater.
> > 
> > I guess the question is how this compares to the branch-less equivalent
> > code on our target architectures.  I consider these to be AMD64, x86 and
> > ARM, although I'm sure others would include additional ones.
> 
> I strongly recommend writing readable code over cryptic code that
> may be faster or slower by a cycle or two.  Leave the cycle chasing
> to projects that have the framework to actually measure it.
> 
> (FWIW, branches containing a few instructions are faster than cmov
> on processors I've looked at.)

Oprofile is quite adequate for this. A while ago I got some significant
speedups in GNU ld using oprofile, and cycles spent in mispredicted
branches were a big part of the problem:
http://weblogs.mozillazine.org/roc/archives/2005/02/optimizing_gnu.html

I don't think anyone disagrees that writing readable code is the first
priority, but there's still a place for profiling and optimization and
sometimes that means tweaking low level code.
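
To make the comparison concrete, here is a rough sketch of the two forms
being discussed: a saturating clamp written with a conditional, and a
branch-free equivalent. The function names and the 0..510 input range (the
sum of two 8-bit values) are assumptions for illustration only; this is not
the code from the patch. Which form is faster on a given CPU is exactly the
sort of thing a profiler should settle.

```c
#include <stdint.h>

/* Conditional form: clamp a value in 0..510 to at most 255.
 * Depending on the target and flags, a compiler may emit either a
 * conditional branch or a conditional move for this expression. */
static uint8_t saturate_branch(uint32_t t)
{
    return (uint8_t) (t > 255 ? 255 : t);
}

/* Branch-free form: for t in 0..510, (t >> 8) is 0 or 1, so
 * (0u - (t >> 8)) is either 0 or all ones. OR-ing it in leaves t
 * unchanged or forces the low byte to 0xff. */
static uint8_t saturate_branchless(uint32_t t)
{
    return (uint8_t) (t | (0u - (t >> 8)));
}

/* Returns 1 if both forms agree for every sum of two 8-bit values. */
static int saturate_forms_agree(void)
{
    uint32_t t;

    for (t = 0; t <= 510; t++)
        if (saturate_branch(t) != saturate_branchless(t))
            return 0;
    return 1;
}
```

The branch-free version is only valid for inputs up to 510; the conditional
version has no such restriction, which is part of the readability argument.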

Rob