[cairo] cairo-perf-trace may be stopping the timer too early for xlib backend?

Sun Jan 27 20:33:33 PST 2013

Hello,

I tried to benchmark some Xorg drivers on an ARM embedded system with
cairo-perf-trace using xlib backend and got somewhat weird results.

First using the original "firefox-chalkboard" trace as a reference:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]     xlib           firefox-chalkboard 1527.617 1527.617   0.00%    1/1

real	25m28.359s
user	2m15.020s
sys	0m41.710s

There is nothing wrong with the result above, except that the
time needed to run this benchmark is ridiculously high. It is
one of the reasons why I'm actually using trimmed traces (done
with "./csi-trace --trim") from the following repository:

    http://cgit.freedesktop.org/~siamashka/trimmed-cairo-traces/

And looking at the result for the trimmed trace we have a problem:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]     xlib         t-firefox-chalkboard    7.313    7.313   0.00%    1/1

real	0m21.237s
user	0m0.200s
sys	0m0.060s

Here cairo-perf-trace reports the time, which does not match reality by
a huge margin. Digging a bit in the sources reveals that the extra time
is spent doing "target->cleanup (args.closure)" here:

    http://cgit.freedesktop.org/cairo/tree/perf/cairo-perf-trace.c?id=1.12.8#n767

This cleanup is done after the timer has been stopped already. If we
start profiling approximately after 10-th second, it shows that there
are some real compositing operations still being performed in Xorg
process, and not just pixmaps destruction:

    33.09%     Xorg  libpixman-1.so.0.28.2  [.] bits_image_fetch_general                               
    17.41%     Xorg  libpixman-1.so.0.28.2  [.] __divdi3                                               
    10.06%     Xorg  libpixman-1.so.0.28.2  [.] __udivsi3                                              
     8.96%     Xorg  libpixman-1.so.0.28.2  [.] __gnu_ldivmod_helper                                   
     7.54%     Xorg  libpixman-1.so.0.28.2  [.] __aeabi_uidivmod                                       
     4.61%     Xorg  libpixman-1.so.0.28.2  [.] __aeabi_ldivmod                                        
     3.30%     Xorg  libpixman-1.so.0.28.2  [.] pixman_composite_src_8888_8888_asm_neon                
     3.07%     Xorg  libpixman-1.so.0.28.2  [.] pixman_composite_scanline_over_asm_neon                
     1.83%     Xorg  libpixman-1.so.0.28.2  [.] fetch_pixel_a8r8g8b8                                   
     1.66%     Xorg  [kernel.kallsyms]      [k] __memzero                                              
     1.36%     Xorg  [kernel.kallsyms]      [k] v7_dma_inv_range                                       
     1.27%     Xorg  [kernel.kallsyms]      [k] v7_dma_clean_range                                     
     1.02%     Xorg  libpixman-1.so.0.28.2  [.] pixman_composite_src_n_8888_asm_neon                   
     0.40%     Xorg  libpixman-1.so.0.28.2  [.] linear_get_scanline_narrow                             
     0.33%     Xorg  [kernel.kallsyms]      [k] __do_softirq                                           

Modifying cairo-perf-trace to stop the timer only after the cleanup
seems to fix the problem:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]     xlib         t-firefox-chalkboard   21.413   21.413   0.00%    1/1

real	0m21.618s
user	0m0.180s
sys	0m0.160s

The same also happens on my desktop system with open source radeon
driver. Running the original cairo-perf trace:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]     xlib         t-firefox-chalkboard    0.342    0.347   3.44%    9/10

real	0m9.334s
user	0m0.260s
sys	0m0.030s

And now the patched cairo-perf-trace:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]     xlib         t-firefox-chalkboard    0.900    0.916   0.87%    9/10

real	0m9.225s
user	0m0.240s
sys	0m0.040s

Seems like X server can simply buffer a lot of requests, causing slight
distortion of the benchmark results for long traces and totally bogus
results for short ones. Something like the following fix might be
needed (that is if the time for "target->cleanup (args.closure)" is
really not supposed to be measured):

    http://cgit.freedesktop.org/~aplattner/xrenderbenchmark/commit/?id=8a18efed83ba

-- 
Best regards,
Siarhei Siamashka