[cairo] Malloc profiler/callgraph
chris at chris-wilson.co.uk
Sun Mar 11 18:09:27 PDT 2007
Recently, Behdad has turned his attention to reducing the number of
allocations Cairo makes. In order to measure his progress, he wrote a
tool to hook into malloc and record the callers. Unfortunately, in order
to get the best results, he needed to modify the source. As an
alternative, I present this valgrind skin. It is mostly based on the
massif skin, in that it overloads the malloc/free functions, records
the entire stack trace and accumulates statistics for each unique trace.
At the end it prints a table of the allocators (the function that
called malloc, or rather the first function not listed among --alloc-fn,
à la massif) and dumps the unique stack traces to a file. At
the moment, I have not translated this output into any common format (I
was thinking of writing it in the callgrind.out format so as to use it in
kcachegrind); instead I include a very simple mp-gui.py to read in the
stack traces and provide a means of reviewing the results.
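
To make the bookkeeping described above concrete, here is a minimal
Python sketch of the idea, not the code from the patch. It assumes each
allocation arrives as a (stack trace, size) pair with the innermost
frame first. The names ALLOC_FNS, allocator_of and accumulate are
hypothetical, and the contents of ALLOC_FNS merely stand in for whatever
would be passed via --alloc-fn.

```python
from collections import defaultdict

# Hypothetical stand-in for the functions named via --alloc-fn;
# "xmalloc" is an invented wrapper used only for illustration.
ALLOC_FNS = frozenset({"malloc", "calloc", "realloc", "xmalloc"})


def allocator_of(trace, alloc_fns=ALLOC_FNS):
    """Return the first frame (innermost first) that is not an allocation function."""
    for frame in trace:
        if frame not in alloc_fns:
            return frame
    return "(unknown)"


def accumulate(events, alloc_fns=ALLOC_FNS):
    """Accumulate [nBlocks, nBytes] per unique trace and per allocator.

    events is an iterable of (trace, nbytes) pairs, where trace is a
    sequence of function names with the innermost frame first.
    """
    by_trace = defaultdict(lambda: [0, 0])
    by_allocator = defaultdict(lambda: [0, 0])
    for trace, nbytes in events:
        trace = tuple(trace)
        for table, key in ((by_trace, trace),
                           (by_allocator, allocator_of(trace, alloc_fns))):
            table[key][0] += 1       # one more block
            table[key][1] += nbytes  # and its size
    return dict(by_trace), dict(by_allocator)
```

Two traces that reach the same allocator through different callers stay
separate in the per-trace table but are merged in the per-allocator
table, which is the distinction between the dumped trace file and the
summary printed at exit.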
The patch is relative to valgrind's svn trunk. Apply, reconfigure and
make install. Usage is similar to other valgrind skins:
$ valgrind --tool=memprof --help
$ valgrind --tool=memprof ./cairo-perf
And the output is:
==18877== 216 distinct allocators.
==18877==    nBlocks         nBytes  nReallocs  Lifespan (ms)
==18877==    484,619  1,030,781,440          0              1  _cairo_traps_add_trap_from_points [cairo-traps.c::193]
==18877==    528,888     21,155,520          0              0  _cairo_pixman_format_create_masks [icformat.c::102]
==18877==    528,916     69,816,912          0             62  pixman_image_createForPixels [icimage.c::76]
==18877==    598,300      4,786,400          0              2  _cairo_freelist_alloc [cairo-freelist.c::52]
==18877==    967,584    290,275,200          0              2  _cairo_path_fixed_move_to [cairo-path-fixed.c::199]
==18877==  1,408,396    361,163,776          0              0  _cairo_spline_add_point [cairo-spline.c::110]
==18877==  1,763,825     32,374,496          0              2  skip_list_insert [cairo-skiplist.c::293]
==18877== 10,943,515  4,330,076,529          0            145  (total)
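
One quick way to read this table is to divide nBytes by nBlocks, which
separates "many tiny allocations" from "fewer large ones". The following
is a rough Python sketch for doing that. It assumes only the column
order shown in the header above; the names ROW, parse_row and
avg_block_size are hypothetical and not part of the patch or mp-gui.py.

```python
import re

# Matches a data row: the ==pid== prefix, four comma-grouped integers
# (nBlocks, nBytes, nReallocs, Lifespan), then the allocator description.
ROW = re.compile(
    r"^==\d+==\s+([\d,]+)\s+([\d,]+)\s+([\d,]+)\s+([\d,]+)\s+(.+)$"
)


def parse_row(line):
    """Parse one table row into a dict, or return None for non-data lines."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    nblocks, nbytes, nreallocs, lifespan = (
        int(field.replace(",", "")) for field in m.groups()[:4]
    )
    return {
        "nblocks": nblocks,
        "nbytes": nbytes,
        "nreallocs": nreallocs,
        "lifespan_ms": lifespan,
        "allocator": m.group(5),
    }


def avg_block_size(row):
    """Average bytes per allocated block for a parsed row."""
    return row["nbytes"] / row["nblocks"]


if __name__ == "__main__":
    sample = [
        "==18877== 598,300 4,786,400 0 2 _cairo_freelist_alloc [cairo-freelist.c::52]",
        "==18877== 967,584 290,275,200 0 2 _cairo_path_fixed_move_to [cairo-path-fixed.c::199]",
    ]
    for line in sample:
        row = parse_row(line)
        print(f"{avg_block_size(row):10.1f}  {row['allocator']}")
```

For the two sample rows this works out to 8 bytes per block for
_cairo_freelist_alloc and 300 bytes per block for
_cairo_path_fixed_move_to.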
The downside of this tool is that it incurs an order-of-magnitude
performance overhead. That is a nuisance, because before it extracted
the stack for each unique call site it was only about 3-4 times slower.
I hope you find this a useful little tool.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 9028 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/cairo/attachments/20070312/5bb0fafd/vg-memprof.patch.obj