[cairo] Cairo and layered application
behdad at behdad.org
Tue Mar 6 14:16:32 PST 2007
On Tue, 2007-03-06 at 14:43 -0600, Boris Zbarsky wrote:
> Behdad Esfahbod wrote:
> >> Are you generally interested in data of this sort? If there's something I can
> >> do to help improve cairo performance, I'd love to help.
> > Definitely. Pass 'em on. Specifically, I'm really interested in a call
> > graph to malloc. There's a lot we can do to drop those.
> I'll try to get together a more exhaustive list, but in the meantime
> https://bugzilla.mozilla.org/show_bug.cgi?id=368018 is a decent example
> (16% of total pageload time is under malloc/free, most of it called from
> cairo). Most of the other profiles I've seen look much like this, so it
> may be worth addressing this one and then reprofiling.
> I don't have a good way to generate pictorial call graphs, but as far as
> cairo calls to malloc in that testcase go (note that some of the
> callstacks might have stack frames skipped due to the profiler being
Thanks. The list really helps. Fortunately all look very easy to
optimize as I expected. I just need a test case that gives
malloc/realloc numbers so I can measure my progress.
One way would be to write a small .so that overrides malloc/realloc, and
run perf suite against it. If we were using glib, such stats were
free... Anyway, anyone taking this?
Per-item plans follow.
Explanation: currently _cairo_path_fixed has pointers to two lists: one
for operations, another for points. Each list is a linked list of
chunks each holding 64 items. This means, any path (even with a single
move-to) will cause at least three malloc()s. One for the path struct,
two for the list nodes.
Easy-fix: make _cairo_path_fixed keep a node of those two lists, instead
More work can be done still: for example, the current approach of const
number of items per chunk has a constant malloc() overhead (two mallocs
per 64 operations). One can make the size of chunks grow exponentially
to reduce the overall overhead.
Both realloc an array to fit more items, but oops! they just enlarge it
enough to fit the items at hand (I may be wrong, but I don't think I
am.) That means, at some point, they will realloc on any operation!
Easy-fix: exponentially enlarge them.
Allocates an array at the beginning of function and frees at the end.
Easy-fix: use an stack buffer of size CAIRO_STACK_BUFFER_SIZE bytes and
use it when data fits.
Does a malloc() every time you cairo_set_source_rgba?().
Medium-fix: instead of solid-surface cache, cache solid patterns, like
we do for scaled-fonts.
Solid-surface caching still should be done the hard way because of the
xlib threading problems we recently found. In fact the code for
solid-surface caching may be adapted for solid-pattern caching without
any problems. It doesn't even have to check for backend compatibility.
This one's harder. It already uses a free-list. One possible solution
may be 1) allocate in chunks, 2) be more relaxed about using available
chunks that are slightly larger than needed.
Covered with cairo_path_fixed_add() already.
Easy-fix: Use a stack buffer.
Another freelist implementation, oops! Again, make it allocate in
chunks. That's not a problem at all, since the memory is very
Is allocating small structures using malloc(). Most uses can do with a
local one on stack.
Easy-fix: Change pixman to expose pixman_region_initialize() and
allocate region on stack when possible.
Inlined version of _cairo_pattern_create_solid() covered above.
> In all cases, callers are listed in order from most common to least common.
If people want to beat me and submit patches for these, I'M ALL FOR IT.
But we first need a quantitative test case.
"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin, 1759
More information about the cairo