[cairo] Using cairo/pixman for raw video in GStreamer

Wed Sep 30 07:06:04 PDT 2009

So, here is an update on what has happened with this over the last 3
weeks. (Warning: this gets quite deep into both Cairo and GStreamer
terminology, so if you don't know about something I'm talking about,
don't hesitate to ask in a reply or on IRC. I'll assume more than
rudimentary GStreamer and Cairo knowledge in this mail, because I want
to keep it short and to the point.)
All test results below were measured on a Macbook 2.2 with an Intel
945 GPU running Karmic (Ubuntu 9.10). Don't expect the same performance
on older hardware or software. I would, however, expect similar
performance on recent X servers with Intel and Radeon GPUs. But you
have been warned. :)

What have I done so far?

I've written code to implement my ideas. The code lives in public git
branches and is expected to work, should you want to test it. It
should compile fine on any somewhat recent distro. That is: if you can
compile git master of GStreamer and Cairo, you can compile this code.
Of course, it's alpha quality, so expect it to change quickly. But it
should definitely compile and run.

pixman:
http://cgit.freedesktop.org/~company/pixman/log/?h=yuv
I added support for most YUV formats that GStreamer supports today.
The missing ones weren't added because they weren't necessary to prove
my point. I also enhanced the API to allow creating planar images. The
code is not yet optimized in any way, but I intend to hook in David's
ORC code to accelerate common YUV operations.
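To make "planar images" concrete: unlike packed RGB, an I420 frame is
three separate planes, each with its own offset, stride and height.
The sketch below computes that layout for a tightly packed frame. The
plane_t type and i420_layout() are hypothetical names for illustration,
not the API in the yuv branch, and real GStreamer buffers may pad rows,
so treat this as a sketch under those assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one plane of a planar image. */
typedef struct {
    size_t offset; /* byte offset of the plane from the start of the buffer */
    int    stride; /* bytes per row */
    int    height; /* number of rows */
} plane_t;

/* Fill in the three planes of a tightly packed I420 frame and return
 * the total buffer size in bytes. Assumes even dimensions and no row
 * padding (an assumption; real buffers may round strides up). */
size_t
i420_layout (int width, int height, plane_t planes[3])
{
    assert (width % 2 == 0 && height % 2 == 0);

    /* Y: full resolution, one byte per pixel */
    planes[0].offset = 0;
    planes[0].stride = width;
    planes[0].height = height;

    /* U: half width and half height (4:2:0 subsampling) */
    planes[1].offset = (size_t) width * height;
    planes[1].stride = width / 2;
    planes[1].height = height / 2;

    /* V follows U; YV12 is the same layout with U and V swapped */
    planes[2].offset = planes[1].offset
                       + (size_t) planes[1].stride * planes[1].height;
    planes[2].stride = width / 2;
    planes[2].height = height / 2;

    return planes[2].offset + (size_t) planes[2].stride * planes[2].height;
}
```

For an 800x600 frame this gives a 480000-byte Y plane followed by two
120000-byte chroma planes, 720000 bytes in total (1.5 bytes per pixel).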

cairo:
http://cgit.freedesktop.org/~company/cairo/log/?h=yuv
I exported one function that allows using any pixman format and
creating planar image surfaces. A bunch of bugfixes were necessary,
too, but they have all landed in git master.

gst-plugins-cairo:
http://cgit.freedesktop.org/~company/gst-plugins-cairo
This new repository contains a library, libgstcairo, and a bunch of
plugins using that library.
The library does three things. First, it abstracts caps handling,
which allows adding new caps to the library without updating the
elements. It also adds a bunch of support functions that make writing
caps negotiation code a lot simpler. Second, it contains code to
create cairo surfaces from GstBuffers and vice versa. And last but not
least, it introduces a new format, "video/x-cairo", that allows passing
cairo surfaces in buffers. As this is all done transparently, elements
will render to GL or any other surface type the moment it becomes
available to libgstcairo, without the need to recompile them.
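The point of abstracting caps handling is that elements never switch on
formats themselves; they ask the library, so a new format means one new
table entry rather than touching every element. A minimal sketch of
such a table, keyed on the YUV fourcc; format_info_t and
format_lookup() are hypothetical and not libgstcairo's actual
interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-format description that elements query instead of
 * hardcoding format knowledge. */
typedef struct {
    const char *fourcc;          /* as in video/x-raw-yuv,format=(fourcc)... */
    int         n_planes;
    int         bits_per_pixel;  /* averaged over all planes */
} format_info_t;

/* Adding a format means adding a row here; no element needs to change. */
static const format_info_t formats[] = {
    { "I420", 3, 12 },
    { "YV12", 3, 12 },
    { "YUY2", 1, 16 },
    { "UYVY", 1, 16 },
    { "AYUV", 1, 32 },
};

const format_info_t *
format_lookup (const char *fourcc)
{
    for (size_t i = 0; i < sizeof formats / sizeof formats[0]; i++)
        if (strcmp (formats[i].fourcc, fourcc) == 0)
            return &formats[i];
    return NULL; /* unknown: caps negotiation should reject these caps */
}
```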
The elements implement the functionality of the most common GStreamer
raw video elements. So far, there are (in order of creation and with
the elements they're intended to replace):
- cairocolorspace (ffmpegcolorspace)
- puzzle
- cairotestsrc (videotestsrc)
- pangotimeoverlay (timeoverlay)
- cairoxsink (ximagesink/xvimagesink)
- cairoscale (videoscale)
I'll code more elements and implement features for the current ones as
I get around to it. In particular, full videomixer and textoverlay
replacements are on my list.
These elements are parallel-installable with current GStreamer
elements; they will not override any existing elements.

What did I learn so far?

Cairo looks like the perfect match for GStreamer video handling, even
when talking about memory buffers only. I was surprised at how quickly
I made progress and that I didn't have to drop a single feature while
porting elements. In fact, elements gained features because they now
support more video/x-raw-* formats than they did before. Also, the
required code got a LOT smaller. Most current elements duplicate the
code to handle formats (wc -l for elements: ffmpegcolorspace: 7400,
videoscale: 4600, videotestsrc: 3250), while gst-cairo hooks into the
Cairo code with little overhead (libgstcairo: 1100 lines,
cairocolorspace: 150, cairotestsrc: 900, cairoscale: 170). So for some
elements the code shrinks by well over an order of magnitude, without
losing any features.
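A quick check of the line counts above, as plain arithmetic on the
quoted numbers. The helpers below (hypothetical names) divide old by
new: per element, the reduction ranges from about 3.6x (videotestsrc)
to about 49x (ffmpegcolorspace), and it is about 6.6x overall if the
shared 1100 lines of libgstcairo are counted against the three ported
elements.

```c
/* Line counts (wc -l) as quoted above. */
typedef struct {
    const char *element;
    int old_lines; /* existing GStreamer element */
    int new_lines; /* cairo-based replacement */
} line_count_t;

static const line_count_t counts[] = {
    { "colorspace", 7400, 150 },
    { "scale",      4600, 170 },
    { "testsrc",    3250, 900 },
};

#define N_COUNTS ((int) (sizeof counts / sizeof counts[0]))

/* How many times smaller the replacement for counts[i] is. */
double
shrink_factor (int i)
{
    return (double) counts[i].old_lines / counts[i].new_lines;
}

/* Overall factor, charging the shared library to the new side once. */
double
total_shrink_factor (int shared_lib_lines)
{
    int old_total = 0, new_total = shared_lib_lines;
    for (int i = 0; i < N_COUNTS; i++) {
        old_total += counts[i].old_lines;
        new_total += counts[i].new_lines;
    }
    return (double) old_total / new_total;
}
```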
Compared with the default elements, performance ranges from 5x slower
to 3x faster, depending on the operation, and that is without any
optimization work on my side. I expect performance to be at least
equal to the current code, and likely better, once all optimizations
are hooked up. So much for backwards compatibility. (So much for Cairo
being "slow", when it can noticeably beat GStreamer in some cases. ;))

The interesting part is video/x-cairo. It allows running a whole
pipeline on the GPU without ever moving the data into main memory.
When it works, the performance improvements are an order of magnitude
or more. A simple example (real and user time as reported by "time",
followed by the gst-launch pipeline used):
8.631s - 4.712s - videotestsrc num-buffers=1000 !
video/x-raw-yuv,width=800,height=600 ! xvimagesink sync=false
6.581s - 4.564s - videotestsrc num-buffers=1000 !
video/x-raw-rgb,width=800,height=600 ! ximagesink sync=false
0.632s - 0.488s - cairotestsrc num-buffers=1000 !
video/x-cairo,width=800,height=600 ! cairoxsink sync=false

Or a somewhat more demanding example:
18.843s - 15.585s - videotestsrc num-buffers=1000 ! timeoverlay !
video/x-raw-yuv,width=800,height=600 ! xvimagesink sync=false
21.552s - 17.237s - videotestsrc num-buffers=1000 ! timeoverlay !
video/x-raw-rgb,width=800,height=600 ! ximagesink sync=false
1.187s - 0.668s - cairotestsrc num-buffers=1000 ! pangotimeoverlay !
video/x-cairo,width=800,height=600 ! cairoxsink sync=false
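Dividing the real times above gives the speedups directly; this is
plain arithmetic on the quoted numbers, not a new measurement. The
cairo pipelines come out roughly 10x to 18x faster than the
videotestsrc ones. The timing_t table and speedup() are just
illustrative names.

```c
/* Real times in seconds from the runs above. */
typedef struct {
    const char *label;
    double baseline_s; /* videotestsrc pipeline */
    double cairo_s;    /* cairotestsrc pipeline */
} timing_t;

static const timing_t timings[] = {
    { "simple, xvimagesink",   8.631, 0.632 },
    { "simple, ximagesink",    6.581, 0.632 },
    { "overlay, xvimagesink", 18.843, 1.187 },
    { "overlay, ximagesink",  21.552, 1.187 },
};

/* Speedup = baseline real time / cairo real time. */
double
speedup (int i)
{
    return timings[i].baseline_s / timings[i].cairo_s;
}
```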

Getting these performance gains requires access to hardware buffers
throughout the whole pipeline, and the current design of both the
hardware access libraries (GL and X) and GStreamer doesn't make that
easy. Which brings us to the next point:

What are the remaining issues?

First of all: for memory buffers there are no remaining issues. You
can probably use cairocolorspace and cairoscale as drop-in
replacements today.
With that said, the one big remaining issue is getting things reliably
hardware-accelerated. There's a noticeable difference between all
elements being accelerated and all but one element being accelerated.
The code falls back to software rendering whenever something is not
supported, so there are no internal flow errors or crashes, but there
is a performance difference: usually it's the difference between no
CPU usage and one busy core. (On a lighter note, with a CPU meter it's
easy to tell whether the whole pipeline is properly accelerated.)
Here's a list of issues I'm facing (from the higher to the lower layers):

- GStreamer threading
Code that involves GstBuffers can be called by pretty much any thread
at any time. To be exact: buffers can be read by multiple threads, but
only one thread at a time may write to them, and that thread may
change over time. So the challenge is making sure that cairo surfaces
kept inside buffers don't step on each other's toes across multiple
threads. This is not an issue with image surfaces, but it is an issue
with at least GL and X. I'm not sure about DirectFB, DirectX or DRM,
but I'd suspect they have similar issues.
Getting this right is possible today, but it would be nice if
GStreamer told buffers when they cross a thread boundary. I suspect
this is not easy to do before 0.11, though.
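One way to handle this today, sketched under assumptions: wrap the
surface with a lock and remember which thread used it last, so the
code notices when a buffer crosses a thread boundary and can move the
surface there (for example by making a GL context current, or by
falling back to an image copy). tracked_surface_t and its functions
are hypothetical; the migration step itself is only counted here, not
implemented.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical wrapper for a GL- or X-backed surface kept in a buffer. */
typedef struct {
    pthread_mutex_t lock;       /* only one writer at a time */
    pthread_t       owner;      /* thread that used the surface last */
    bool            has_owner;
    int             migrations; /* times the surface changed threads */
} tracked_surface_t;

void
tracked_surface_init (tracked_surface_t *s)
{
    pthread_mutex_init (&s->lock, NULL);
    s->has_owner = false;
    s->migrations = 0;
}

/* Call before drawing. When the calling thread differs from the last
 * user, a real implementation would migrate the surface here; this
 * sketch only counts the thread-boundary crossing. */
void
tracked_surface_acquire (tracked_surface_t *s)
{
    pthread_t self = pthread_self ();

    pthread_mutex_lock (&s->lock);
    if (s->has_owner && !pthread_equal (s->owner, self))
        s->migrations++;
    s->owner = self;
    s->has_owner = true;
}

void
tracked_surface_release (tracked_surface_t *s)
{
    pthread_mutex_unlock (&s->lock);
}

/* Demo only: use the surface once from a freshly created thread. */
static void *
use_surface (void *data)
{
    tracked_surface_t *s = data;
    tracked_surface_acquire (s);
    /* ... draw ... */
    tracked_surface_release (s);
    return NULL;
}

int
use_from_new_thread (tracked_surface_t *s)
{
    pthread_t thread;
    if (pthread_create (&thread, NULL, use_surface, s) != 0)
        return -1;
    return pthread_join (thread, NULL);
}
```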

- GStreamer buffer allocation
Buffer allocation code has quite a few places where it simply returns
a memory buffer when the caps do not match. gstcairo handles this fine
by falling back to software rendering, but that gets slow. So in a
pipeline like
cairotestsrc ! video/x-cairo,width=800,height=600 ! cairoscale !
video/x-cairo,width=600,height=450 ! cairoxsink
it is desirable that cairotestsrc can allocate its 800x600 buffer from
cairoxsink, even though cairoscale changes the size in between. While
there are some ideas on how to make this work, there's currently no
supported way of making it happen.
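A toy model of the problem, not GStreamer's actual allocation code:
the allocation request travels downstream only as long as every link
carries the same caps, and the first element that changes them (here
the scaler) answers with plain system memory. caps_t and
simulate_alloc() are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, much simplified caps: media type plus frame size. */
typedef struct {
    const char *media_type;
    int width, height;
} caps_t;

typedef enum { ALLOC_SYSTEM_MEMORY, ALLOC_FROM_SINK } alloc_result_t;

static bool
caps_equal (const caps_t *a, const caps_t *b)
{
    return strcmp (a->media_type, b->media_type) == 0 &&
           a->width == b->width && a->height == b->height;
}

/* links[i] holds the caps on the i-th link, from source to sink.
 * The request for `wanted` is forwarded only while each link carries
 * exactly those caps; the first mismatch falls back to system memory. */
alloc_result_t
simulate_alloc (const caps_t *wanted, const caps_t *links, int n_links)
{
    for (int i = 0; i < n_links; i++)
        if (!caps_equal (wanted, &links[i]))
            return ALLOC_SYSTEM_MEMORY;
    return ALLOC_FROM_SINK;
}
```

With the pipeline above, the 800x600 request passes the first link
but hits 600x450 on the second, so the source ends up with a memory
buffer instead of one from cairoxsink.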

- Cairo meta surfaces copy whole surfaces
One way of solving the aforementioned issue, and a nice way to handle
the threading issues listed above, is to use meta surfaces and only
replay them in the sink element. Unfortunately, Cairo copies image
buffers when generating snapshots, which kills performance right
there. It would be nice if snapshots could share the surface data
until the surface is actually modified (doing the copy in
begin_modification, maybe). If I comment out the copying code, this
solution is fast today.
It has one drawback, though: as meta surfaces keep references to their
source surfaces, those source surfaces would need to be threadsafe. So
we'd either have to avoid GL/X surfaces entirely (and rely on meta
surfaces only) or find a way to copy the surfaces when they cross
thread boundaries.
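What "copy in begin_modification" could look like, as a minimal
copy-on-write sketch rather than Cairo's actual snapshot code: taking
a snapshot only adds a reference, and the pixels are copied only when
the original is about to be written while a snapshot still shares
them. All names are hypothetical, and the plain int refcount is
deliberately not threadsafe, which is exactly the drawback described
above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pixel storage shared by a surface and its snapshots.
 * The refcount is a plain int, so this is NOT threadsafe. */
typedef struct {
    int            refcount;
    size_t         size;
    unsigned char *pixels;
} pixel_store_t;

typedef struct {
    pixel_store_t *store;
} surface_t;

static pixel_store_t *
store_new (size_t size)
{
    pixel_store_t *st = malloc (sizeof *st);
    if (st == NULL)
        return NULL;
    st->pixels = calloc (size, 1);
    if (st->pixels == NULL) {
        free (st);
        return NULL;
    }
    st->refcount = 1;
    st->size = size;
    return st;
}

static void
store_unref (pixel_store_t *st)
{
    if (--st->refcount == 0) {
        free (st->pixels);
        free (st);
    }
}

int
surface_init (surface_t *s, size_t size)
{
    s->store = store_new (size);
    return s->store != NULL ? 0 : -1;
}

/* O(1): a snapshot only takes a reference, no pixels are copied. */
surface_t
surface_snapshot (surface_t *s)
{
    s->store->refcount++;
    return (surface_t) { s->store };
}

/* Call before writing to s. Copies only if a snapshot still shares the
 * data; the writer gets the copy, so existing snapshots stay unchanged. */
int
surface_begin_modification (surface_t *s)
{
    if (s->store->refcount == 1)
        return 0;
    pixel_store_t *copy = store_new (s->store->size);
    if (copy == NULL)
        return -1;
    memcpy (copy->pixels, s->store->pixels, s->store->size);
    store_unref (s->store);
    s->store = copy;
    return 0;
}

void
surface_fini (surface_t *s)
{
    store_unref (s->store);
    s->store = NULL;
}
```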

- Rendering to subsampled images
I didn't spend a lot of time on this after finding an OK solution, but
supporting rendering to vertically subsampled images is a challenging
task with pixman's scanline-based approach. On one hand, this is a
somewhat futile effort, as the subsampling will result in artifacts no
matter what one does; on the other hand, it'd be nice if rendering
worked, so one could support subtitles or even overlays as seen on TV.
And the most-used YUV formats (I420/YV12) are subsampled vertically as
well as horizontally.
It's not terribly important, as in most cases conversion can be done as
the last step, but if somebody has a solution to the problem, I'm all
ears.
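To show why vertical subsampling hurts a scanline renderer: in 4:2:0,
each chroma sample covers a 2x2 block of pixels, so producing one
chroma row needs two source scanlines. The sketch uses a simple box
filter (averaging the 2x2 block with rounding); that filter choice and
chroma_downsample_420() are assumptions for illustration, not what
pixman does.

```c
#include <assert.h>
#include <stdint.h>

/* Downsample a full-resolution chroma plane (width x height) to 4:2:0
 * (width/2 x height/2) by averaging each 2x2 block, with rounding.
 * Every output sample reads two source rows, which is what makes this
 * awkward for a renderer that produces one scanline at a time. */
void
chroma_downsample_420 (const uint8_t *src, int src_stride,
                       uint8_t *dst, int dst_stride,
                       int width, int height)
{
    assert (width % 2 == 0 && height % 2 == 0);

    for (int y = 0; y < height / 2; y++) {
        const uint8_t *row0 = src + (2 * y) * src_stride;
        const uint8_t *row1 = row0 + src_stride;

        for (int x = 0; x < width / 2; x++) {
            int sum = row0[2 * x] + row0[2 * x + 1]
                    + row1[2 * x] + row1[2 * x + 1];
            dst[y * dst_stride + x] = (uint8_t) ((sum + 2) / 4);
        }
    }
}
```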

- Ways to upload YUV data
There is currently no good way to get an accelerated upload of YUV
data to X. The only ways I'm aware of are GL (see gst-plugins-gl for
code) and Xv. So we'll either need to keep converting to RGB in the X
case and focus on using cairo-gl, make cairo use Xv, or convince the X
people that YUV uploads would be a welcome addition to Xrender.
It seems the X people are currently at XDC getting all excited about
Wayland, so I don't have very high hopes.
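For reference, "converting to RGB in the X case" means doing per-pixel
math like the following on the CPU before the upload. This assumes
BT.601 limited-range input and uses the widely used 8-bit fixed-point
approximation of the coefficients; yuv_to_rgb_bt601() is a
hypothetical helper, and an optimized version would use Orc or SIMD.

```c
#include <stdint.h>

/* Drop 8 fractional bits and clamp to 0..255. Negative values are
 * clamped before shifting to avoid right-shifting a negative int. */
static uint8_t
clamp_shift8 (int v)
{
    if (v < 0)
        return 0;
    v >>= 8;
    return (uint8_t) (v > 255 ? 255 : v);
}

/* Convert one BT.601 limited-range YUV sample to full-range RGB. */
void
yuv_to_rgb_bt601 (uint8_t y, uint8_t u, uint8_t v,
                  uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = y - 16, d = u - 128, e = v - 128;

    *r = clamp_shift8 (298 * c + 409 * e + 128);
    *g = clamp_shift8 (298 * c - 100 * d - 208 * e + 128);
    *b = clamp_shift8 (298 * c + 516 * d + 128);
}
```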

- Using the GPU's video decoding abilities
As GPUs can decode videos in all the recent formats, it would be nice
to have elements that take encoded MPEG, H.264 or whatever frames and
put the result into a cairo buffer, preferably on the GPU. This would
also get around the issue above for most videos people watch today.
Unfortunately, it seems the people involved in these projects haven't
yet figured out whether they want to name it vaapi, vdpau, xvba or xvmc
(alphabetical order here, I have no preferences), what capitalization
scheme they want to use, or whether they even want to make it open
source. So it seems this is going nowhere in the foreseeable future,
too.

What now?

When talking about this on IRC, I realized that there is quite a lack
of knowledge on all sides: my knowledge often isn't deep enough,
GStreamer people don't know enough about state-of-the-art video
handling and its gains and pitfalls, and Cairo and X people lack
knowledge about the requirements for video playback. This lack of
knowledge often results in preconceptions that lead to wrong decisions
and make life harder on all sides.
It would be nice if there was a way to get you all together to
actually educate each other about this. I'd suggest a hackfest, but
I'm not sure what others think or whom to approach for funding and
locations.

There's also the question of what the maintainers think about this
code. Quite a few projects are involved (GStreamer, Cairo, pixman, and
possibly X), and I'm not really interested in spending a lot of work
on polishing code that ends up in some demo repository or gets
rejected.
It'd also be nice to get a review sooner rather than later, so I can
fix design issues while not much code depends on them yet. And of
course, I like people reviewing and complimenting me on my code. :)

Then there's gst-plugins-gl. The GL plugins and my work touch on some
of the same issues (most notably hardware acceleration), and I'd like
to make sure this code can work with their approach and make use of
cairo-gl buffers internally. The best possible outcome from my point
of view would be if we could port the GL plugins to use gstcairo and
make gstcairo provide the required functionality. That way we'd get
rid of the need for explicit upload and download elements and gain the
ability to do all the GL stuff.
Unfortunately, I lack knowledge about GL, so it'd be nice if someone
else could look at that.

I think that's all for now,

Benjamin