[cairo] PATCH: GL: Create glyph mask surface only once per context, enlarge on demand

Thu Apr 25 00:46:41 PDT 2013

On Wed, Apr 24, 2013 at 01:42:04PM -0700, Martin Robinson wrote:
> On 02/16/2013 02:12 AM, Chris Wilson wrote:
> > Because I want to promote better drivers. Working around issues that can
> > be easily fixed in the driver for the betterment of all, just leads to
> > stagnation as nobody feels the impetus to do anything.
> 
> It's always good to have numbers in discussions like these. I wrote a
> tiny HTML5 canvas micro test [1] that just renders text (with
> overlapping glyphs which trigger the mask path) as fast as possible over
> a period of 3 seconds. The test measures number of renders per second
> and takes 10 samples. I repeated this test on two systems, one with
> Intel hardware and one with NVidia hardware [2]. I'm willing to
> collect more data (AMD and perhaps a Mali device) if it's interesting.
> 
> (units are text paints per millisecond)
> 
> image-nvidia: 55.1969049523432 (stddev: 0.878697193173796)
> image-intel: 64.1527714098772 (stddev: 0.234850204049565)
> gl-nvidia (before patch): 0.233834194694391 (stddev: 0.002959739753409)
> gl-nvidia (after patch): 17.2066024223006 (stddev: 0.424291496997496)
> gl-intel (before patch): 13.9265171859729 (stddev: 0.175320467059827)
> gl-intel (after patch): 17.1163946835002 (stddev: 0.157778076619366)

Hmm, can you remeasure using perf/cairo-perf-micro glyphs so that we
have a common baseline? At that point, you'll probably want to write a
language to specify which micro benchmarks you want to run...
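
For anyone wanting to reproduce numbers of this shape, the quoted
methodology (count renders over a fixed period, repeat for a number of
samples, report the mean and standard deviation) can be sketched roughly
as below. This is neither the canvas test [1] nor cairo-perf-micro; the
function names are hypothetical, render() is a placeholder for whatever
draws the text, and the use of the sample (rather than population)
standard deviation is an assumption.

```python
import statistics
import time


def paints_per_ms(render, period_s):
    """Call render() repeatedly for about period_s seconds.

    Returns the observed rate in calls per millisecond.
    """
    count = 0
    start = time.perf_counter()
    deadline = start + period_s
    while time.perf_counter() < deadline:
        render()
        count += 1
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return count / elapsed_ms


def run_benchmark(render, period_s=3.0, samples=10):
    """Collect `samples` rates and return (mean, sample stddev).

    Needs at least two samples, since statistics.stdev() requires them.
    """
    rates = [paints_per_ms(render, period_s) for _ in range(samples)]
    return statistics.mean(rates), statistics.stdev(rates)
```

The defaults mirror the 3 second period and 10 samples described in the
quoted message; a much shorter period is enough for a smoke test.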

> So while on Intel hardware it is nowhere near as dramatic as on NVidia,
> there still seems to be some improvement.

Not really, as far as I am concerned. For Intel, creating a buffer is
roughly the same cost as an mmap, so we are over 3x slower than what is
demonstrably possible.
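
To make the ratios explicit, here is a small sketch that only redoes
the arithmetic on the figures quoted above (text paints per
millisecond). The dictionary keys and the ratio() helper are names made
up for illustration; nothing here is measured anew.

```python
# Figures copied verbatim from the quoted results (paints per ms).
results = {
    "image-nvidia": 55.1969049523432,
    "image-intel": 64.1527714098772,
    "gl-nvidia-before": 0.233834194694391,
    "gl-nvidia-after": 17.2066024223006,
    "gl-intel-before": 13.9265171859729,
    "gl-intel-after": 17.1163946835002,
}


def ratio(numerator, denominator):
    """Return how many times faster `numerator` is than `denominator`."""
    return results[numerator] / results[denominator]


# Improvement from the patch on each GL driver.
nvidia_speedup = ratio("gl-nvidia-after", "gl-nvidia-before")
intel_speedup = ratio("gl-intel-after", "gl-intel-before")

# Remaining gap between the image backend and patched GL.
nvidia_gap = ratio("image-nvidia", "gl-nvidia-after")
intel_gap = ratio("image-intel", "gl-intel-after")

print(f"gl-nvidia speedup: {nvidia_speedup:.2f}x")
print(f"gl-intel speedup:  {intel_speedup:.2f}x")
print(f"image vs gl (nvidia, after patch): {nvidia_gap:.2f}x")
print(f"image vs gl (intel, after patch):  {intel_gap:.2f}x")
```

The last figure is the "over 3x" gap on Intel: even with the patch, the
GL path is well behind the image backend on the same machine.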

> Perhaps a more abstract framebuffer caching mechanism would be less
> ugly? cairo-traces show a lot of calls to cairo_push_group and/or
> cairo_create_similar (and then a quick destruction). This essentially
> makes the GL backend of Cairo unsuitable for toolkit rendering, as the
> overhead for creating and binding a new framebuffer runs the gamut from
> slightly expensive to outlandish.

Again speaking from experience, that level of GPU surface caching is much
better handled in the low-level driver, as it benefits dramatically from
an understanding of GPU state.
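
For reference, the idea named in the subject line (create the glyph
mask surface once per context, enlarge on demand) can be sketched as
follows. This is an illustration only, not the patch and not cairo's
API: Surface, Context and acquire_glyph_mask are hypothetical names,
and Surface merely counts allocations in place of a real GPU surface.

```python
class Surface:
    """Hypothetical stand-in for an expensive GPU-backed surface."""

    created = 0  # number of allocations, for illustration only

    def __init__(self, width, height):
        Surface.created += 1
        self.width = width
        self.height = height


class Context:
    """Hypothetical per-context state holding one cached glyph mask."""

    def __init__(self):
        self._glyph_mask = None

    def acquire_glyph_mask(self, width, height):
        """Return a mask at least width x height, reusing the cached one."""
        mask = self._glyph_mask
        if mask is None or mask.width < width or mask.height < height:
            # Too small (or missing): allocate a replacement that covers
            # both the old extents and the new request, so the cached
            # surface never shrinks.
            new_width = max(width, mask.width if mask else 0)
            new_height = max(height, mask.height if mask else 0)
            mask = Surface(new_width, new_height)
            self._glyph_mask = mask
        return mask


ctx = Context()
first = ctx.acquire_glyph_mask(64, 16)   # allocates
second = ctx.acquire_glyph_mask(32, 8)   # fits, reuses the same surface
third = ctx.acquire_glyph_mask(128, 8)   # wider, so a larger one is made
```

A caller would draw into only the width x height corner it asked for,
since the returned surface may be larger than the request.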

> As an aside, no matter what, avoiding the mask path is always much, much
> faster (~30 r/ms for Intel and ~40 r/ms for NVidia), so we should
> probably be trying harder to avoid it. I don't think the pixels of Times
> New Roman in this string actually overlap.

One thing to note there is that if the font is encoded as a fontconfig
pattern rather than as an embedded truetype font, there may be
discrepancies between the metrics used to generate the glyph positions
during record and during replay, so we may be shooting ourselves in the
foot. Hmm, wish I'd thought of that when recording the traces.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre