[cairo] Fixing concurrency bugs in cairo's reference counting

Carl Worth cworth at cworth.org
Thu Dec 14 05:35:01 PST 2006


So, Monty went and demonstrated that people really haven't been
hitting cairo hard with multi-threaded programs all that much. Because
as soon as he did, he started crashing cairo left and right.

It turns out that cairo was usually kind enough to crash with an
assertion failure that steered Monty in the right direction. And being
the brilliant programmer that he is, he not only filed the bugs, but
he included patches for them as well. The issue is described here:

	text rendering lacking locking in multithreaded apps [PATCH]
	https://bugs.freedesktop.org/show_bug.cgi?id=8801

[Note: Monty created these patches on October 27, and I've been
wanting some people to discuss them ever since. Remember when I was
recently complaining about bugzilla? This is a great case in
point. What I would have loved was to receive an email with inline
patches on October 27 that I could just hit "reply" on and send that
out to everyone on the cairo mailing list. Or i would have loved it if
others had just seen this bug and maybe done something about it. As
things are now, look how long this bug has languished. It's
embarrassing!]

So, he definitely found real bugs, and his fixes definitely help his
program. There are a couple of reasons I haven't applied the patch
yet, and one of them I'd like some input on from the community.

First, there's the issue that the patch is fixing two separate bugs,
so it should be separated into two patches. (One bug is a piece of
code unprotected by _cairo_scaled_font_freeze_cache and the other bug
is missing protection around reference count manipulation.) There's
also a minor style issue of the patch introducing "refcount" where the
implementation currently has "ref_count". But those are both minor
things that Monty could fix, or I could clean up for him.

What I'd like help on is figuring out the right solution for fixing
the reference counting bug, (which is that the various object
reference counts can be manipulated concurrently and get out of
synch).. Monty's current patch addresses the problem by adding one
mutex per object type, (surface, pattern, etc.), and acquiring a lock
on that mutex around the manipulation of the reference count field in
both _reference() and _destroy().

That seems a correct approach, but it also seems a bit heavy to
me. I'm wondering if we couldn't get by with some sort of new macros,
(say CAIRO_ATOMIC_INCREMENT and CAIRO_ATOMIC_DECREMENT). Would
something like that be easy enough to implement on all relevant
platforms, and would it be a win over using CAIRO_MUTEX?

I _think_ that these would be all we would need to fix the current
bug. We do do other things with the reference count. For example, we
test it for the invalid value (as will occur in nil objects) before
incrementing, but that should be safe without additional locking since
an invalid value is permanent.

Similarly, we test the value for 0 after decrementing. It's harder to
guarantee that that's safe, but I think it is. The thing to look out
for is a race condition between acquiring a pointer to an object and
incrementing the reference count. If another thread can decrement the
reference counter to 0 during that window we lose. (And note that
Monty's mutex locks wouldn't even help here.) So I see three different
ways to acquire the pointer:

1. create()

	Obviously, there's no race possible here, as no other thread
	can obtain a pointer to the same object before we get ours,
	(and we don't even get it before the reference count is
	incremented).

2. from an "unshared" source, (another object that's referencing it
   already)

	Here, I'm thinking about something like the sequence:

		cairo_pattern_get_surface (pattern, &surface);
		cairo_surface_reference (surface);

	In this situation, there's definitely a window during which
	other threads might decrement the reference count of
	surface. But we know the count cannot go to 0 since the
	pattern we used to obtain the surface pointer must necessarily
	be holding one reference to the surface.

3. from a "shared" source, (a cache say)

	So this is the riskiest situation, and it does require locking
	to prevent a problem. And that's why we have locks protecting
	all lookups into caches.

So, like I said, I think all we need is atomic increment and
decrement. If anybody sees holes in what I described above, or can
recommend implementation strategies for atomic increment and decrement
on various platforms, that would be helpful.

-Carl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/cairo/attachments/20061214/e5eb7f00/attachment.pgp


More information about the cairo mailing list