[cairo] Overhead reduction

Mon May 18 00:27:31 PDT 2009

> > Okay, that's useful to know.  An atomic-test-and-inc routine should
> > be roughly the same speed as a plain i++ on most modern hardware, it
> > just needs to be implemented properly, which is an utter pain.

> In a real-world benchmark, I lost about 90 cycles for a cmpxchg-based lock+unlock.
> On an old P4 it's around a few hundred cycles ;)

But that's for a full mutex, right?  I just need one atomic operation,
not a whole critical section.  Cmpxchg might not be the right operation
- the RISC-oriented load/store-exclusive trick would be ideal.
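
To make the distinction concrete, here is a rough sketch in C11
(<stdatomic.h>) of a reference count that takes one atomic operation
per call and no lock at all.  The names object_t, object_reference,
object_release and object_try_reference are made up for illustration
and are not cairo's API.  Reading "test-and-inc" as "increment only if
the count is still nonzero" is an assumption about what was meant.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not cairo's actual type. */
typedef struct {
    atomic_int ref_count;
} object_t;

static void object_init(object_t *obj)
{
    atomic_init(&obj->ref_count, 1);
}

/* Plain increment: a single atomic read-modify-write, no critical section. */
static void object_reference(object_t *obj)
{
    atomic_fetch_add_explicit(&obj->ref_count, 1, memory_order_relaxed);
}

/* Returns true when the caller has just dropped the last reference. */
static bool object_release(object_t *obj)
{
    int old = atomic_fetch_sub_explicit(&obj->ref_count, 1,
                                        memory_order_acq_rel);
    assert(old > 0); /* catch a release without a matching reference */
    return old == 1;
}

/* Assumed meaning of "test-and-inc": take a reference only if the count
 * is still nonzero.  This needs a compare-exchange loop. */
static bool object_try_reference(object_t *obj)
{
    int old = atomic_load_explicit(&obj->ref_count, memory_order_relaxed);
    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak_explicit(&obj->ref_count,
                                                  &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}
```

On x86 a compiler will typically turn the fetch-add into a locked
xadd and the loop into a locked cmpxchg.  On load/store-exclusive
machines the weak compare-exchange maps onto the exclusive pair
directly, which is why the weak form is allowed to fail spuriously.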

In any case 90 cycles would *still* be substantially faster than
malloc() or free().  I gave up caring about P4s many years ago...

-- 
------
From: Jonathan Morton
      jonathan.morton at movial.com