[cairo] Cairo and layered application
Behdad Esfahbod
behdad at behdad.org
Tue Mar 6 14:16:32 PST 2007
On Tue, 2007-03-06 at 14:43 -0600, Boris Zbarsky wrote:
> Behdad Esfahbod wrote:
> >> Are you generally interested in data of this sort? If there's something I can
> >> do to help improve cairo performance, I'd love to help.
> >
> > Definitely. Pass 'em on. Specifically, I'm really interested in a call
> > graph to malloc. There's a lot we can do to drop those.
>
> I'll try to get together a more exhaustive list, but in the meantime
> https://bugzilla.mozilla.org/show_bug.cgi?id=368018 is a decent example
> (16% of total pageload time is under malloc/free, most of it called from
> cairo). Most of the other profiles I've seen look much like this, so it
> may be worth addressing this one and then reprofiling.
>
> I don't have a good way to generate pictorial call graphs, but as far as
> cairo calls to malloc in that testcase go (note that some of the
> callstacks might have stack frames skipped due to the profiler being
> used...):
Thanks. The list really helps. Fortunately all look very easy to
optimize as I expected. I just need a test case that gives
malloc/realloc numbers so I can measure my progress.
One way would be to write a small .so that overrides malloc/realloc, and
run perf suite against it. If we were using glib, such stats were
free... Anyway, anyone taking this?
Per-item plans follow.
> malloc
>
> cairo_path_fixed_add
> cairo_path_fixed_move_to
> cairo_move_to
> cairo_rectangle
> cairo_path_fixed_close_path
> cairo_close_path
> cairo_rectangle
> cairo_path_fixed_line_to
> cairo_path_fixed_rel_line_to
> cairo_rectangle
Explanation: currently _cairo_path_fixed has pointers to two lists: one
for operations, another for points. Each list is a linked list of
chunks each holding 64 items. This means, any path (even with a single
move-to) will cause at least three malloc()s. One for the path struct,
two for the list nodes.
Easy-fix: make _cairo_path_fixed keep a node of those two lists, instead
of pointers.
More work can be done still: for example, the current approach of const
number of items per chunk has a constant malloc() overhead (two mallocs
per 64 operations). One can make the size of chunks grow exponentially
to reduce the overall overhead.
> realloc
> cairo_polygon_add_edge
> cairo_polygon_line_to
> cairo_filler_line_to
> cairo_path_fixed_interpret
> cairo_path_fixed_fill_to_traps
> cairo_surface_fallback_fill
> cairo_traps_grow_by
> cairo_traps_add_trap
> cairo_traps_add_trap_from_points
> cairo_bentley_ottmann_tessellate_polygon
> cairo_bo_edge_end_trap
> cairo_bentley_ottmann_tessellate_polygon
Both realloc an array to fit more items, but oops! they just enlarge it
enough to fit the items at hand (I may be wrong, but I don't think I
am.) That means, at some point, they will realloc on any operation!
Easy-fix: exponentially enlarge them.
> cairo_bentley_ottmann_tessellate_polygon
> cairo_path_fixed_fill_to_traps
> cairo_surface_fallback_fill
Allocates an array at the beginning of function and frees at the end.
Easy-fix: use an stack buffer of size CAIRO_STACK_BUFFER_SIZE bytes and
use it when data fits.
> cairo_pattern_create_solid
> cairo_pattern_create_rgba
> cairo_set_source_rgba
Does a malloc() every time you cairo_set_source_rgba?().
Medium-fix: instead of solid-surface cache, cache solid patterns, like
we do for scaled-fonts.
Solid-surface caching still should be done the hard way because of the
xlib threading problems we recently found. In fact the code for
solid-surface caching may be adapted for solid-pattern caching without
any problems. It doesn't even have to check for backend compatibility.
> skip_list_insert
> cairo_bentley_ottmann_tessellate_polygon
This one's harder. It already uses a free-list. One possible solution
may be 1) allocate in chunks, 2) be more relaxed about using available
chunks that are slightly larger than needed.
> cairo_path_op_buf_create
> cairo_path_fixed_add
> cairo_path_fixed_move_to
> cairo_move_to
> cairo_rectangle
> cairo_path_fixed_close_path
> cairo_close_path
> cairo_rectangle
> cairo_path_fixed_line_to
> cairo_path_fixed_rel_line_to
> cairo_rectangle
Covered with cairo_path_fixed_add() already.
> cairo_surface_fill_region
> clip_and_composite_trapezoids
> cairo_surface_fallback_fill
Easy-fix: Use a stack buffer.
> cairo_freelist_alloc
> cairo_bentley_ottmann_tessellate_polygon
Another freelist implementation, oops! Again, make it allocate in
chunks. That's not a problem at all, since the memory is very
shortlived.
> pixman_region_create_simple
> pixman_region_create
> cairo_traps_extract_region
> clip_and_composite_trapezoids
> cairo_surface_fallback_fill
> cairo_traps_extract_region
> clip_and_composite_trapezoids
> cairo_surface_fallback_fill
Is allocating small structures using malloc(). Most uses can do with a
local one on stack.
Easy-fix: Change pixman to expose pixman_region_initialize() and
allocate region on stack when possible.
> cairo_pattern_create_rgba
> cairo_set_source_rgba
Inlined version of _cairo_pattern_create_solid() covered above.
> In all cases, callers are listed in order from most common to least common.
>
> -Boris
If people want to beat me and submit patches for these, I'M ALL FOR IT.
But we first need a quantitative test case.
Cheers,
--
behdad
http://behdad.org/
"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin, 1759
More information about the cairo
mailing list