[cairo] Cairo and layered application

Behdad Esfahbod behdad at behdad.org
Tue Mar 6 14:16:32 PST 2007


On Tue, 2007-03-06 at 14:43 -0600, Boris Zbarsky wrote:
> Behdad Esfahbod wrote:
> >> Are you generally interested in data of this sort?  If there's something I can 
> >> do to help improve cairo performance, I'd love to help.
> > 
> > Definitely.  Pass 'em on.  Specifically, I'm really interested in a call
> > graph to malloc.  There's a lot we can do to drop those.
> 
> I'll try to get together a more exhaustive list, but in the meantime 
> https://bugzilla.mozilla.org/show_bug.cgi?id=368018 is a decent example 
> (16% of total pageload time is under malloc/free, most of it called from 
> cairo).  Most of the other profiles I've seen look much like this, so it 
> may be worth addressing this one and then reprofiling.
>
> I don't have a good way to generate pictorial call graphs, but as far as 
> cairo calls to malloc in that testcase go (note that some of the 
> callstacks might have stack frames skipped due to the profiler being 
> used...):

Thanks.  The list really helps.  Fortunately all look very easy to
optimize as I expected.  I just need a test case that gives
malloc/realloc numbers so I can measure my progress.

One way would be to write a small .so that overrides malloc/realloc, and
run perf suite against it.  If we were using glib, such stats were
free...  Anyway, anyone taking this?

Per-item plans follow.

> malloc
>    
>      cairo_path_fixed_add
>        cairo_path_fixed_move_to
>          cairo_move_to
>            cairo_rectangle
>          cairo_path_fixed_close_path
>            cairo_close_path
>              cairo_rectangle
>        cairo_path_fixed_line_to
>          cairo_path_fixed_rel_line_to
>            cairo_rectangle

Explanation: currently _cairo_path_fixed has pointers to two lists: one
for operations, another for points.  Each list is a linked list of
chunks each holding 64 items.  This means, any path (even with a single
move-to) will cause at least three malloc()s.  One for the path struct,
two for the list nodes.

Easy-fix: make _cairo_path_fixed keep a node of those two lists, instead
of pointers.

More work can be done still: for example, the current approach of const
number of items per chunk has a constant malloc() overhead (two mallocs
per 64 operations).  One can make the size of chunks grow exponentially
to reduce the overall overhead.


>    realloc
>      cairo_polygon_add_edge
>        cairo_polygon_line_to
>          cairo_filler_line_to
>            cairo_path_fixed_interpret
>              cairo_path_fixed_fill_to_traps
>                cairo_surface_fallback_fill
>      cairo_traps_grow_by
>        cairo_traps_add_trap
>          cairo_traps_add_trap_from_points
>            cairo_bentley_ottmann_tessellate_polygon
>            cairo_bo_edge_end_trap
>              cairo_bentley_ottmann_tessellate_polygon

Both realloc an array to fit more items, but oops!  they just enlarge it
enough to fit the items at hand (I may be wrong, but I don't think I
am.)  That means, at some point, they will realloc on any operation!

Easy-fix: exponentially enlarge them.


>    cairo_bentley_ottmann_tessellate_polygon
>      cairo_path_fixed_fill_to_traps
>        cairo_surface_fallback_fill

Allocates an array at the beginning of function and frees at the end.

Easy-fix: use an stack buffer of size CAIRO_STACK_BUFFER_SIZE bytes and
use it when data fits.


>    cairo_pattern_create_solid
>      cairo_pattern_create_rgba
>        cairo_set_source_rgba

Does a malloc() every time you cairo_set_source_rgba?().

Medium-fix: instead of solid-surface cache, cache solid patterns, like
we do for scaled-fonts.

Solid-surface caching still should be done the hard way because of the
xlib threading problems we recently found.  In fact the code for
solid-surface caching may be adapted for solid-pattern caching without
any problems.  It doesn't even have to check for backend compatibility.


>    skip_list_insert
>      cairo_bentley_ottmann_tessellate_polygon

This one's harder.  It already uses a free-list.  One possible solution
may be 1) allocate in chunks, 2) be more relaxed about using available
chunks that are slightly larger than needed.


>    cairo_path_op_buf_create
>      cairo_path_fixed_add
>        cairo_path_fixed_move_to
>          cairo_move_to
>            cairo_rectangle
>          cairo_path_fixed_close_path
>            cairo_close_path
>              cairo_rectangle
>        cairo_path_fixed_line_to
>          cairo_path_fixed_rel_line_to
>            cairo_rectangle

Covered with cairo_path_fixed_add() already.


>    cairo_surface_fill_region
>      clip_and_composite_trapezoids
>        cairo_surface_fallback_fill

Easy-fix: Use a stack buffer.


>    cairo_freelist_alloc
>      cairo_bentley_ottmann_tessellate_polygon

Another freelist implementation, oops!  Again, make it allocate in
chunks.  That's not a problem at all, since the memory is very
shortlived.


>    pixman_region_create_simple
>      pixman_region_create
>        cairo_traps_extract_region
>        clip_and_composite_trapezoids
>          cairo_surface_fallback_fill
>      cairo_traps_extract_region
>        clip_and_composite_trapezoids
>          cairo_surface_fallback_fill

Is allocating small structures using malloc().  Most uses can do with a
local one on stack.

Easy-fix: Change pixman to expose pixman_region_initialize() and
allocate region on stack when possible.


>    cairo_pattern_create_rgba
>      cairo_set_source_rgba

Inlined version of _cairo_pattern_create_solid() covered above.


> In all cases, callers are listed in order from most common to least common.
> 
> -Boris

If people want to beat me and submit patches for these, I'M ALL FOR IT.
But we first need a quantitative test case.

Cheers,
-- 
behdad
http://behdad.org/

"Those who would give up Essential Liberty to purchase a little
 Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759





More information about the cairo mailing list