[cairo] Gtk performance issues from a user's point of view

Sun Oct 8 21:56:27 PDT 2006

Grr, I didn't realize the sysprof traces got so big. Sorry for that.
Resending to both lists with external links.

---------- Forwarded message ----------
From: Kalle Vahlman <kalle.vahlman at gmail.com>
Date: 8.10.2006 23:41
Subject: Re: Gtk performance issues from a user's point of view
To: cairo at cairographics.org, performance-list at gnome.org

(adding cairo list as the first issue is highly relevant there too)
(second issue is GTK+, about half email down)
(goodness, this got a bit more verbose than I intended :)

2006/9/29, Federico Mena Quintero <federico at ximian.com>:
> On Thu, 2006-09-28 at 20:13 +0200, Adalbert Dawid wrote:
>
> > 1. GtkTreeView's repaints are slow. This gets especially obvious, when you
> > perform one of the actions below:
> >  * Resize a column (i.e. change its width). You will notice that the
> >    header is badly lagging behind the mouse pointer, which gets worse as
> >    one enlarges the viewport.
> >  * Drag an icon from the Desktop over a big nautilus window with many
> >    files and directories in the list view mode. When you keep moving the
> >    icon over nautilus, the CPU goes up to 100% and the icon leaves an ugly
> >    white trail.
>
> What theme engine are you running?

This can indeed be a major factor when considering performance.

For example, here are some figures from a test program of mine with many
buttons and a big widget (which expands when resizing).

Here is a "plain" run with the builtin theme:

_gtk_marshal_BOOLEAN__BOXED        0,03  43,98
  scw_view_expose                  0,00  21,59
  gtk_label_expose                 0,00   6,77
  meta_frames_expose_event         0,00   5,99
  gtk_container_expose             0,08   2,92
  gtk_button_expose                0,05   2,45
...

So that looks about right I guess, ScwView renders a TreeModel with
lots of text and expands both horizontally and vertically where the
buttons only expand their width. So it's only natural for it to take
considerably more effort. The resizing was lagging a bit, but that's
also not news on my low-end laptop.

But, what happens with clearlooks? This:

_gtk_marshal_BOOLEAN__BOXED         0,03  46,28
  gtk_button_expose                 0,03  24,88
  scw_view_expose                   0,00   8,96
...

Ok, the buttons are sweet, but not _that_ sweet :)
The resizing lags noticeably. Looking at what takes the largest share
of the time, I found the new cairo tessellator there:

_cairo_bentley_ottmann_tessellate_polygon                     0,58  20,68
  _cairo_bo_event_queue_insert_if_intersect_below_current_y   0,56  14,81
    _cairo_int128_divrem                                      0,19  13,22
      _cairo_uint128_divrem                                   4,79  11,75
        _cairo_uint128_rsl                                    1,92   1,92
        _cairo_uint128_lt                                     1,73   1,73
        _cairo_uint128_lsl                                    1,64   1,64
        _cairo_uint128_eq                                     0,86   0,86
        _cairo_uint128_add                                    0,39   0,39
        _cairo_uint128_sub                                    0,36   0,36
...

It seemed to matter very little even if I commented out the fancy
shadows and only drew a rounded rectangle (lines&arcs), the button
expose maintained its position as the major (application-side) CPU
eater.
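
As a rough illustration of why a software 128-bit division can weigh so
much in a trace like the one above, here is a minimal sketch of
shift-and-subtract division. It is not cairo's code: the function name
and the step counter are made up for illustration, and Python's
arbitrary-precision integers stand in for a two-word 128-bit value.

```python
def divrem_shift_subtract(num, den, bits=128):
    """Unsigned division, one quotient bit at a time.

    Returns (quotient, remainder, steps). `steps` counts loop
    iterations; on a two-word integer type each iteration would cost
    several wide shifts, compares and adds or subtracts.
    """
    if den == 0:
        raise ZeroDivisionError("division by zero")
    top_bit = 1 << (bits - 1)
    bit = 1
    steps = 0
    # Normalize: shift den left until it reaches num or would overflow.
    while den < num and not (den & top_bit):
        den <<= 1
        bit <<= 1
        steps += 1
    quo = 0
    # Produce the quotient from the highest candidate bit downwards.
    while bit:
        if den <= num:
            num -= den
            quo += bit
        den >>= 1
        bit >>= 1
        steps += 1
    return quo, num, steps


quo, rem, steps = divrem_shift_subtract((1 << 127) + 12345, 3)
print(quo, rem, steps)
```

With a large numerator and a small denominator the loops run a couple
of hundred times for a single division, so a caller that divides once
per candidate intersection could plausibly end up dominated by these
helpers.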

The new tessellator was supposed to be up to four times faster than the
old one, but running the same test with the old one yields a different
result:

_gtk_marshal_BOOLEAN__BOXED         0,05  33,43
  gtk_button_expose                 0,00  10,91
  scw_view_expose                   0,02  10,13
  gtk_label_expose                  0,00   3,54
  meta_frames_expose_event          0,00   3,16

Also, the path to tessellation seems very different, but that's probably
to be expected...

So, now I toss the ball to Carl's corner: am I misinterpreting, or is it
a regression in the tessellator? Do the traces look plausible?

The actual code that Clearlooks uses to draw is at

http://cvs.gnome.org/viewcvs/gtk-engines/engines/clearlooks/src/clearlooks_draw.c?view=markup

(clearlooks_draw_button)

but as I said, it seemed to matter only little what was drawn. The
checkouts from the new-tesselator2 and master branches for the tests
should be up-to-date as far as I know.

Files:

http://kalle.vahlman.googlepages.com/scw-row-test.png
 * A screenshot of the test app (available from the sources of Scw)

http://kalle.vahlman.googlepages.com/scw-row-test-clearlooks-profile.xml
 * Profile of resizing the window with clearlooks and new tessellator

http://kalle.vahlman.googlepages.com/scw-row-test-clearlooks-oldtesselator-profile.xml
 * Profile of resizing the window with clearlooks and old tessellator

[...]
> With my setup, an empty window resizes quite quickly.  It is a bit slower
> when the window gets close to full-screen (1400x1050), but I can't get
> it to lag or anything.
>
> It's interesting to note that if I add a single button to the window,
> then it gets noticeably slower (but it doesn't lag, either).  Then,
> Sysprof says that 69% of the time is spent in libfb/libxaa in the X
> server, not GTK+ itself.

My setup is slow enough to get real lag ;) but the fbComposite
functions are a big portion of my X percentage too. I'm just curious
whether that is actually expected (and present with Qt etc.). Can anyone
explain what they actually do, and whether they are supposed to be
doing it a lot?-)

I found an issue with GTK+ and the expose events generated in response
to the configure event when I wrote a simple svg-loading app, disabled
double-buffering and loaded a complex-enough svg. It struck me as odd
how many times the svg was actually rendered in response to expose
events (ok, it could be rendered to a backbuffer but that's hardly the
point with resizing).

This is what happens with the two signals when I stretch the window
vertically in one swift go (the original size is 400x300):

configure(400x301+5+24)
expose(400x1+0+300)
expose(400x301+0+0)
configure(400x571+5+24)
expose(400x270+0+301)
expose(400x571+0+0)

The first expose is 400x1 pixels at y=300, i.e. the first new row. I
guess it means that GTK+ reacts first and the configure-event collapsing
kicks in only after that.

But the second expose pair shows something which feels like a mistake
to me. The first expose is for the new area of the window. This
causes a full redraw in my test case anyway, even if the clipping mask
is set. The second is for the full window area, which of course means
we'll draw it again, making the first draw totally unnecessary.

I'm guessing this is so (assuming it's intentional) in order to make
the empty space fill up faster and only after that fill the rest. But
then it would make sense to get the later expose only to the part that
wasn't already updated, not to the whole widget.
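
To make the "only the part that wasn't already updated" idea concrete,
here is a small sketch that subtracts the already-exposed rectangle from
the full one. It is plain rectangle arithmetic, not GDK's region API;
Rect, intersect and subtract are names invented for this example.

```python
from collections import namedtuple

# Same fields as the WxH+X+Y notation in the trace, reordered to x, y, w, h.
Rect = namedtuple("Rect", "x y w h")


def intersect(a, b):
    """Return the overlap of two rectangles, or None if they are disjoint."""
    x1 = max(a.x, b.x)
    y1 = max(a.y, b.y)
    x2 = min(a.x + a.w, b.x + b.w)
    y2 = min(a.y + a.h, b.y + b.h)
    if x2 <= x1 or y2 <= y1:
        return None
    return Rect(x1, y1, x2 - x1, y2 - y1)


def subtract(full, done):
    """Return rectangles covering `full` minus the part already in `done`."""
    hit = intersect(full, done)
    if hit is None:
        return [full]
    parts = []
    if hit.y > full.y:  # band above the overlap
        parts.append(Rect(full.x, full.y, full.w, hit.y - full.y))
    bottom = hit.y + hit.h
    if bottom < full.y + full.h:  # band below the overlap
        parts.append(Rect(full.x, bottom, full.w, full.y + full.h - bottom))
    if hit.x > full.x:  # strip to the left of the overlap
        parts.append(Rect(full.x, hit.y, hit.x - full.x, hit.h))
    right = hit.x + hit.w
    if right < full.x + full.w:  # strip to the right of the overlap
        parts.append(Rect(right, hit.y, full.x + full.w - right, hit.h))
    return parts


# Second expose pair from the trace: 400x270+0+301, then 400x571+0+0.
print(subtract(Rect(0, 0, 400, 571), Rect(0, 301, 400, 270)))
```

For the second pair in the trace this leaves a single 400x301 rectangle
at the origin, i.e. exactly the old window contents, which is what the
later expose would need to cover if the new strip had already been
drawn.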

And if you think about it, how many widgets are actually capable of
drawing like that without drawing the whole thing twice?

FWIW, the test case I did was naturally considerably faster in
resizing after I rendered only in response to full exposes (but this
is of course not a solution as you need to refresh partly obscured
areas of the widgets).
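
The experiment can be mimicked on the trace above with a few lines. This
is only a sketch of the counting, not the test app itself, and, as said,
not a fix; is_full_expose, count_renders and TRACE are hypothetical
names, and each "render" stands for one complete redraw of the svg.

```python
def is_full_expose(w, h, x, y, width, height):
    """True if the exposed WxH+X+Y area covers the whole width x height widget."""
    return x <= 0 and y <= 0 and x + w >= width and y + h >= height


def count_renders(events, full_only):
    """Count redraws, optionally ignoring exposes that are not full-size."""
    width = height = 0
    renders = 0
    for kind, (w, h, x, y) in events:
        if kind == "configure":
            # Remember the widget size announced by the last configure.
            width, height = w, h
        elif not full_only or is_full_expose(w, h, x, y, width, height):
            renders += 1
    return renders


# The trace from above, as (kind, (w, h, x, y)) in WxH+X+Y order.
TRACE = [
    ("configure", (400, 301, 5, 24)),
    ("expose", (400, 1, 0, 300)),
    ("expose", (400, 301, 0, 0)),
    ("configure", (400, 571, 5, 24)),
    ("expose", (400, 270, 0, 301)),
    ("expose", (400, 571, 0, 0)),
]

print(count_renders(TRACE, full_only=False))
print(count_renders(TRACE, full_only=True))
```

On this trace the redraw count drops from four to two when partial
exposes are ignored, which matches the speedup I saw, at the cost of
never repainting partly obscured areas.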

IMO sending only a full expose would be the way to go here, but first
I need to find the code that is causing this and test without the
"extra" expose to see if it really makes a difference in real apps...
Does anyone know where it is already?

-- 
Kalle Vahlman, zuh at iki.fi
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi