[cairo] Size of PDF with lots of images

Simon Sapin simon.sapin at exyr.org
Thu Jan 16 05:54:26 PST 2014


On 16/01/2014 03:44, Behdad Esfahbod wrote:
> Back in 2007 Carl and I developed Slippy [1] at GUADEC to do our cairo slides.  I
> have used it since for many presentations.  It's a pycairo-based tool where
> you express slides as Python functions.  It's very handy, especially if you
> want to use cairo drawing in your slides.
>
> Back in the day, if I had a huge background image, it was replicated in each
> slide, so I was getting, like, 240MB PDFs for a simple presentation.
> Fortunately that has long been fixed.
>
> Now, for my GLyphy talk [2], the source images are 14MB [3], but the generated
> PDF [4] is 18MB.  Does anyone feel like taking a look?
>
> [1] http://github.com/behdad/slippy
> [2] https://vimeo.com/83732058
> [3] https://github.com/behdad/slippy/tree/master/glyphy
> [4] http://behdad.org/glyphy_slides.pdf


Hi Behdad,

Cairo’s default way of storing raster images in PDF is as raw pixel
data compressed with deflate, at zlib’s default compression level [1].

Even though PNG also uses deflate, the image streams cairo writes to PDF
are not PNG, so the images are decompressed and re-compressed. I’m not
too surprised to see the size increase. You could try a build of cairo
that uses zlib’s maximum compression level and see what happens. Of
course, this trades compression speed for output size. Maybe it’s worth
adding an API to change this level.
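To get a feel for the size/speed trade-off before rebuilding cairo, one
can push similar data through Python’s zlib module at the default level
(Z_DEFAULT_COMPRESSION, which zlib maps to 6) and at the maximum, 9.
This is only a sketch: it calls zlib directly rather than cairo’s own
deflate stream, and the synthetic gradient buffer (a hypothetical
stand-in built by synthetic_rgb()) will compress differently from real
slide images.

```python
import time
import zlib


def synthetic_rgb(width: int, height: int) -> bytes:
    """Build a raw 8-bit RGB buffer (3 bytes per pixel). This is a
    hypothetical stand-in for decoded image pixels, not real slide data."""
    rows = []
    for y in range(height):
        row = bytearray()
        for x in range(width):
            # A smooth gradient with some structure; purely illustrative.
            row += bytes((x % 256, y % 256, (x * y) % 256))
        rows.append(bytes(row))
    return b"".join(rows)


def compare_levels(data: bytes, levels=(zlib.Z_DEFAULT_COMPRESSION, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Deflate is lossless, so every level must round-trip exactly.
        assert zlib.decompress(compressed) == data
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    raw = synthetic_rgb(320, 240)
    print(f"raw pixel data: {len(raw)} bytes")
    for level, (size, secs) in compare_levels(raw).items():
        label = "default" if level == zlib.Z_DEFAULT_COMPRESSION else str(level)
        print(f"level {label}: {size} bytes in {secs * 1000:.1f} ms")
```

Running the same comparison on the actual decoded slide images would
give a rough idea of how much a maximum-level build could save; the
numbers depend entirely on the image content.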

If your images were in a format that the PDF backend supports [2] (which
includes JPEG but not PNG), you could use cairo_surface_set_mime_data()
to have cairo store the original image data (almost) as-is in the PDF,
without re-compressing it. That said, I suspect lossy JPEG may not look
good for these particular images.

pycairo does not support Surface.set_mime_data(), but cairocffi does 
[3]. It also includes some glue code to load images (including JPEG) 
into an ImageSurface, using GDK-PixBuf [4].
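A minimal sketch of the mime-data route with cairocffi, assuming its
cairocffi.pixbuf.decode_to_image_surface() helper (which returns a
(surface, format_name) pair) and Surface.set_mime_data(mime_type, data).
The function names is_jpeg, attach_original_jpeg and jpeg_to_pdf are
hypothetical. The pixels are still decoded so cairo has something to
draw with on backends that ignore the mime data, while the PDF backend
can embed the original JPEG bytes. cairocffi is imported lazily, so the
helpers work with any object that has a cairocffi-style set_mime_data().

```python
JPEG_SOI = b"\xff\xd8\xff"  # JPEG data starts with the SOI marker (FF D8) and another marker


def is_jpeg(data: bytes) -> bool:
    """Cheap check for JPEG data by its leading bytes."""
    return data.startswith(JPEG_SOI)


def attach_original_jpeg(surface, data: bytes) -> bool:
    """Attach the original JPEG bytes to `surface` so that a backend that
    supports them (such as PDF) can embed them without re-compressing.

    `surface` is assumed to have a cairocffi-style
    set_mime_data(mime_type, data) method. Returns True if data was attached.
    """
    if not is_jpeg(data):
        return False  # e.g. PNG: the PDF backend re-compresses the pixels anyway
    surface.set_mime_data("image/jpeg", data)
    return True


def jpeg_to_pdf(jpeg_path: str, pdf_path: str) -> None:
    """Draw one image on a PDF page of the same size (1 pixel = 1 point)."""
    import cairocffi
    from cairocffi.pixbuf import decode_to_image_surface

    with open(jpeg_path, "rb") as f:
        data = f.read()
    # Decode to pixels (via GDK-PixBuf) so non-PDF backends still work.
    image, _format_name = decode_to_image_surface(data)
    attach_original_jpeg(image, data)

    pdf = cairocffi.PDFSurface(pdf_path, image.get_width(), image.get_height())
    context = cairocffi.Context(pdf)
    context.set_source_surface(image, 0, 0)
    context.paint()
    pdf.finish()
```

Calling jpeg_to_pdf() needs cairocffi and GDK-PixBuf installed; the file
paths passed to it are up to the caller. The mime data is attached
before the image is used as a source, which is when the PDF backend
would look for it.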


[1] http://cgit.freedesktop.org/cairo/tree/src/cairo-deflate-stream.c?id=b56b971141bf22ee3452b7f6f5e2dfd373b99e13#n143

[2] http://cgit.freedesktop.org/cairo/tree/src/cairo-pdf-surface.c?id=b56b971141bf22ee3452b7f6f5e2dfd373b99e13#n179
[3] http://pythonhosted.org/cairocffi/

[4] http://pythonhosted.org/cairocffi/pixbuf.html

Cheers,
-- 
Simon Sapin

