[cairo] Size of PDF with lots of images
Simon Sapin
simon.sapin at exyr.org
Thu Jan 16 05:54:26 PST 2014
On 16/01/2014 03:44, Behdad Esfahbod wrote:
> Back in 2007 Carl and I developed Slippy [1] at GUADEC to do our cairo
> slides. I have used it since for many presentations. It's a pycairo-based
> tool where you express slides as Python functions. It's very handy,
> especially if you want to use cairo drawing in your slides.
>
> Back in the day, if I had a huge background image, it was replicated in each
> slide, so I was getting, like, 240MB PDFs for a simple presentation.
> Fortunately that has long been fixed.
>
> Now, for my GLyphy talk [2], the source images are 14MB [3], but the generated
> PDF [4] is 18MB. Does anyone feel like taking a look?
>
> [1] http://github.com/behdad/slippy
> [2] https://vimeo.com/83732058
> [3] https://github.com/behdad/slippy/tree/master/glyphy
> [4] http://behdad.org/glyphy_slides.pdf
Hi Behdad,

Cairo’s default way of storing raster images in PDF is raw pixel data
compressed with deflate at zlib’s default compression level [1].
Even though PNG also uses deflate, PDF’s image encoding is not PNG, so the
images are decompressed and re-compressed. I’m not too surprised to see
the size increase. You could try a build of cairo that uses zlib’s
maximum compression level and see what happens. Of course, this is a
trade-off against compression speed. Maybe it’s worth adding an API to
change this level.
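
To get a rough idea of what a higher level would buy before rebuilding
cairo, something like this Python sketch compares the two zlib levels on
raw pixel data. The pixel buffer here is synthetic and the helper name
deflate_sizes is made up for illustration; real numbers depend entirely
on the images, so feed it the actual pixel bytes to learn anything.

```python
import zlib


def deflate_sizes(raw, levels=(zlib.Z_DEFAULT_COMPRESSION,
                               zlib.Z_BEST_COMPRESSION)):
    """Return {level: compressed size in bytes} for each zlib level."""
    return {level: len(zlib.compress(raw, level)) for level in levels}


# Synthetic stand-in for raw pixel data: 256x256 pixels, 4 bytes each.
# This is an assumption for the sketch, not one of the actual slide images.
WIDTH, HEIGHT = 256, 256
pixels = bytes(
    (x ^ y) & 0xFF
    for y in range(HEIGHT)
    for x in range(WIDTH)
    for _ in range(4)
)

if __name__ == '__main__':
    print('raw size:', len(pixels))
    for level, size in deflate_sizes(pixels).items():
        print('zlib level', level, '->', size, 'bytes')
```

Timing the two zlib.compress() calls on the same data would show the
speed side of the trade-off.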
If your images were in a format that the PDF backend supports [2] (which
includes JPEG but not PNG), you could use cairo_surface_set_mime_data()
to have cairo store the original image data (almost) as-is in PDF,
without re-compressing. That said, I expect that lossy JPEG may not look
nice for these specific images.

pycairo does not support Surface.set_mime_data(), but cairocffi does
[3]. It also includes some glue code to load images (including JPEG)
into an ImageSurface, using GDK-PixBuf [4].
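
With cairocffi, attaching the JPEG data could look roughly like the
sketch below. The function name jpeg_page_to_pdf is made up, and it
assumes you already know the pixel dimensions of the JPEG. The image
surface is left blank to keep the example short; in real code its pixels
should hold the decoded image (for instance via the GDK-PixBuf glue
above), since they are what gets used whenever the JPEG data cannot be
embedded.

```python
import io

import cairocffi

# 'image/jpeg' is the value of cairo's CAIRO_MIME_TYPE_JPEG constant.
MIME_TYPE_JPEG = 'image/jpeg'


def jpeg_page_to_pdf(jpeg_bytes, width, height, target):
    """Write a one-page PDF showing an image with its JPEG data attached.

    width and height are the pixel dimensions of the JPEG (assumed known).
    target is a filename or a writable binary file object.
    """
    # Placeholder surface of the right size. Its pixels are blank here,
    # which is an illustration shortcut: fill them with the decoded image
    # in real code, as they are the fallback if the JPEG is not embedded.
    image = cairocffi.ImageSurface(cairocffi.FORMAT_RGB24, width, height)
    image.set_mime_data(MIME_TYPE_JPEG, jpeg_bytes)

    # One PDF point per image pixel, to keep the sketch simple.
    pdf = cairocffi.PDFSurface(target, width, height)
    context = cairocffi.Context(pdf)
    context.set_source_surface(image)
    context.paint()
    context.show_page()
    pdf.finish()


if __name__ == '__main__':
    # 'slide.jpg' and its 1024x768 size are hypothetical.
    with open('slide.jpg', 'rb') as jpeg_file:
        jpeg_page_to_pdf(jpeg_file.read(), 1024, 768, 'slide.pdf')
```

Comparing the size of the resulting PDF with the size of the JPEG file
is a quick way to check that the data was stored as-is.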
[1]
http://cgit.freedesktop.org/cairo/tree/src/cairo-deflate-stream.c?id=b56b971141bf22ee3452b7f6f5e2dfd373b99e13#n143
[2]
http://cgit.freedesktop.org/cairo/tree/src/cairo-pdf-surface.c?id=b56b971141bf22ee3452b7f6f5e2dfd373b99e13#n179
[3] http://pythonhosted.org/cairocffi/
[4] http://pythonhosted.org/cairocffi/pixbuf.html
Cheers,
--
Simon Sapin