[cairo] Performance stroking lines

Thu Nov 15 13:18:18 PST 2007

On Thu, 2007-11-15 at 12:12 -0800, Dan Amelang wrote:
> On Nov 14, 2007 2:35 PM, Peter Clifton <pcjc2 at cam.ac.uk> wrote:
> > Hi,
> > ... (lots of description and data about a performance problem) ...
> 
> Hi Peter, thanks for gathering all this data for us to work with.
> 
> Unfortunately, from your profile, I don't see any obvious, low-hanging
> fruit for improvement. At the top of your profile, you have basically
> the cairo/X rasterization and compositing/fill functions, which is
> good in the sense that your program is spending time doing "real work"
> and not doing "needless overhead".

Unfortunately, this seemed the case to me too.

> Speeding up rasterization and/or compositing any large amount would
> entail a significant amount of work (there has been talk about this
> recently on the list). And from what I get from your email, you
> probably aren't very interested in hardware acceleration, since you
> mention that your users often have not-so-powerful hardware.

Yes, indeed... although I've realised one possible workaround. Damon
Chaplin tells me performance will be near GDK levels if I disable
anti-aliasing (shame, as this was why I wanted Cairo)...

But it does give a route which might allow a cairo port to go upstream
in gEDA, so long as users who are hit by performance issues can switch
off the anti-aliasing. (Or, as a middle ground, switch it off
automatically whilst dragging components about.)

> So, the fall-back answer is to find some way to do less work (avoid
> unnecessary work, cache the results of previous work, etc). It might
> help to share with us some of your drawing code. Maybe we can spot
> areas where this can be done. One can often cache rendering results on
> a separate image surface and just re-composite that surface each time
> you draw (e.g. the rendering for a whole "component" could be rendered
> and cached on an off-screen surface).

It's all pretty vile, as I'm still in the process of trying to rip out /
comment out the old (complex) drawing code, and change to a more
Cairo-like model.

I'm taking this to mean 1: "Invalidate, then draw in the expose handler"
as opposed to 2: "draw when we feel like it", or 3: "draw in a
backbuffer when we feel like it, then invalidate / blit the screen".

(The old code does 2, I was in the process of changing it to 3 as a way
to avoid horrendous performance hits when using a composited desktop,
then I started looking at 1, and cairo).

The work in progress might be seen here:
http://repo.or.cz/w/geda-gaf/pcjc2.git?a=shortlog;h=cairo_experiment

(Although the conversion to attempting cairo is neither pretty nor
direct!)

If anyone gets to looking.. the redraw function called at expose doesn't
actually get to hunting down objects which are in the exposed region,
all objects are redrawn. (I haven't got to writing the spatial
data-structures for bounds-testing objects yet). I'm testing with only a
"titleblock" (some boxes, lines, and text), and one component - which
I'm dragging around - so in theory, the gain shouldn't be too much.
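
For reference, the bounds test itself is small even without a proper
spatial data structure. A rough sketch of a linear scan over object
bounding boxes; the box_t type and both function names are hypothetical,
not taken from the gschem code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical axis-aligned bounding box, half-open: [x0,x1) x [y0,y1). */
typedef struct { int x0, y0, x1, y1; } box_t;

/* True if the two boxes overlap; boxes that only share an edge do not. */
bool
box_intersects (const box_t *a, const box_t *b)
{
    return a->x0 < b->x1 && b->x0 < a->x1 &&
           a->y0 < b->y1 && b->y0 < a->y1;
}

/* Linear scan: store the indices of all objects whose bounds overlap
 * the exposed region in `out` (room for n entries), return the count. */
size_t
objects_in_region (const box_t *bounds, size_t n,
                   const box_t *exposed, size_t *out)
{
    size_t hits = 0;

    assert (exposed != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        if (box_intersects (&bounds[i], exposed))
            out[hits++] = i;
    return hits;
}
```

The expose handler would then redraw only the returned objects. This is
still O(n) per expose, so it is a stop-gap until a real spatial index
exists.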

> Some strange things I saw in your oprofile reports (which are likely
> just my own confusion...oprofile reports have confused me in the
> past):

oprofile confuses the heck out of me. I'm not sure for certain, but it
may be that this aggregates CPU time burnt. It looks plausible from the
numbers below - although I haven't yet tried adding them all up to see.
I tried asking on the oprofile IRC, but didn't get a reply in the time
I was able to hang around there.

> > (GDK Double buffering on, cairo image surface)
> >
> > 90623    21.7340  processor                processor                (no symbols)
> > 24089     5.7772  libfb.so                 Xorg                     fbCopyAreammx
> > 16343     3.9195  libfb.so                 Xorg                     fbRasterizeEdges
> > 16316     3.9130  libfb.so                 Xorg                     fbCompositeSolidMask_nx8x8888mmx
> 
> Why would your machine be spending all that time rasterizing and
> compositing in the X server if you are using an image surface for
> rendering?

Pass. I'll try again, making sure it really has deleted the old
profiles, and that all old GDK calls are commented out. (I think I did
that already though).

> In addition, on all of your profiles, the total % for cairo/X lies
> between ~5-12%. Again, it's probably my own oprofile ignorance, but
> might that point to something else being the bottleneck? That 40%
> samples in "processor" makes me a little nervous...

I'll try sysprof again. It didn't seem that it'd be so useful to post
info from, as it's a GUI. I'll investigate if it dumps log files too. The
main problem was that it didn't seem to profile within the kernel - and
that showed up a reasonable amount in the profiling. Using oprofile, it
didn't show up as much (presumably since it's not one single function
taking all the time).

> It also might be useful to generate one of the other types of oprofile
> reports, I forgot what all are available and what they do, but a
> different view/angle on the data might help.

If someone familiar with oprofile can direct me, I can make tests.

I guess (when I get some time), the next useful steps will be to distill
a rendering only test, comparable with the workload gschem is giving it
- and to drive for some kind of frames per second metric. (I hope this
ought to give more repeatable numbers than my "wiggling mouse" test).
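
A rough sketch of what such a frames-per-second harness might look
like. All names here are hypothetical, and count_frame() is only a
stand-in for the real drawing code; timing uses C11 timespec_get(), so
it measures wall-clock time rather than CPU time:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: draws one frame of the workload. */
typedef void (*render_frame_fn) (void *user_data);

/* Wall-clock time in seconds, via C11 timespec_get(). */
double
now_seconds (void)
{
    struct timespec ts;

    timespec_get (&ts, TIME_UTC);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Pure arithmetic, kept separate so it can be checked without timing. */
double
frames_per_second (unsigned frames, double elapsed_seconds)
{
    assert (elapsed_seconds > 0.0);
    return (double) frames / elapsed_seconds;
}

/* Render `frames` frames back to back; return the elapsed seconds. */
double
time_frames (render_frame_fn render, void *user_data, unsigned frames)
{
    double start = now_seconds ();

    for (unsigned i = 0; i < frames; i++)
        render (user_data);
    return now_seconds () - start;
}

/* Stand-in workload: counts calls instead of drawing anything. */
void
count_frame (void *user_data)
{
    ++*(unsigned *) user_data;
}
```

A real test would replace count_frame() with a function that redraws
the schematic into a cairo surface, then pass the result of
time_frames() to frames_per_second().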

Thanks for looking at this,

Best wishes,

-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)