Hi Jonathan,

> Okay, that's useful to know. An atomic-test-and-inc routine should be
> roughly the same speed as a plain i++ on most modern hardware, it just
> needs to be implemented properly, which is an utter pain.

In a real-world benchmark, I lost about 90 cycles for a cmpxchg-based
lock+unlock pair. On an old P4 it's around a few hundred cycles ;)

- Clemens
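
For reference, here is a minimal sketch of what a cmpxchg-based lock+unlock pair can look like. It is not the code that was benchmarked. It assumes C11 atomics, and the names spin_t, spin_trylock, spin_lock, spin_unlock and locked_inc are hypothetical, chosen only for illustration. On x86 the compare-exchange call typically compiles to a `lock cmpxchg` instruction, which is where most of the per-operation cost would be expected to come from.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct { atomic_int state; } spin_t;

/* One compare-and-swap attempt: succeeds only if the lock was free. */
static bool spin_trylock(spin_t *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire,   /* ordering on success */
        memory_order_relaxed);  /* ordering on failure */
}

/* Busy-wait until the compare-and-swap succeeds. */
static void spin_lock(spin_t *l) {
    while (!spin_trylock(l)) {
        /* spin; a real implementation would likely back off here */
    }
}

/* Release the lock with a plain release store. */
static void spin_unlock(spin_t *l) {
    /* Debug-only sanity check; compiled out when NDEBUG is defined. */
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->state, 0, memory_order_release);
}

/* The "i++ under a lock" pattern whose overhead is being discussed. */
static int locked_inc(spin_t *l, int *counter) {
    spin_lock(l);
    int v = ++*counter;
    spin_unlock(l);
    return v;
}
```

With `spin_t l = { 0 }; int counter = 0;`, each call to `locked_inc(&l, &counter)` performs one lock+unlock pair around a plain increment, so timing it against a bare `counter++` gives a rough idea of the overhead on a given machine.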