[cairo] Size of PDF with lots of images

Adrian Johnson ajohnson at redneon.com
Thu Jan 16 11:57:41 PST 2014


On 17/01/14 00:24, Simon Sapin wrote:
> On 16/01/2014 03:44, Behdad Esfahbod wrote:
>> Back in 2007 Carl and I developed Slippy [1] at GUADEC to do our cairo
>> slides.  I have used it since for many presentations.  It's a
>> pycairo-based tool where you express slides as Python functions.  It's
>> very handy, especially if you want to use cairo drawing in your slides.
>>
>> Back in the day, if I had a huge background image, it was replicated in
>> each slide, so I was getting, like, 240MB PDFs for a simple presentation.
>> Fortunately that has long been fixed.
>>
>> Now, for my GLyphy talk [2], the source images are 14MB [3], but the
>> generated PDF [4] is 18MB.  Does anyone feel like taking a look?
>>
>> [1] http://github.com/behdad/slippy
>> [2] https://vimeo.com/83732058
>> [3] https://github.com/behdad/slippy/tree/master/glyphy
>> [4] http://behdad.org/glyphy_slides.pdf
> 
> 
> Hi Behdad,
> 
> Cairo’s default way of storing raster images in PDF is raw pixel data
> compressed as deflate with zlib’s default compression level [1].
> 
> Even though PNG also uses deflate, PDF’s encoding is not PNG so the
> images are decompressed and re-compressed. I’m not too surprised to see
> the size increase. You could try a build of cairo that uses zlib’s
> maximum compression level and see what happens. Of course, this is a
> compromise with compression speed. Maybe it’s worth adding API to change
> this level.
> 
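
To get a rough feel for the speed/size trade-off mentioned above, here is a
minimal sketch using only Python's zlib module. It does not touch cairo at
all: the pixel buffer is a synthetic gradient standing in for raw image data,
and deflate_sizes() is a hypothetical helper name. zlib's default level
corresponds to 6, and 9 is the maximum. Real photos will compress very
differently from this gradient, so treat the numbers as illustrative only.

```python
import zlib


def deflate_sizes(raw, levels=(1, 6, 9)):
    """Return {level: deflate-compressed size in bytes} for a raw buffer."""
    return {level: len(zlib.compress(raw, level)) for level in levels}


# Synthetic stand-in for raw pixel data: a 256x256 RGB horizontal gradient.
# This is an assumption for illustration, not data from the slides.
width, height = 256, 256
row = bytes(c for x in range(width) for c in (x, 255 - x, (x * 3) % 256))
pixels = row * height

sizes = deflate_sizes(pixels)
for level, size in sorted(sizes.items()):
    print(f"level {level}: {size} bytes (raw {len(pixels)} bytes)")
```

Running the same function over the decoded pixels of the actual slide images
would show whether a higher level is worth the extra compression time.
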
> If your images were in a format that the PDF backend supports [2] (which
> includes JPEG but not PNG), you could use cairo_surface_set_mime_data()
> to have cairo store the original image data (almost) as-is in PDF,
> without re-compressing. Although I expect that lossy JPEG may not look
> nice for these specific images.

Yes, JPEG images are the most likely reason for the increase in size.

I'm not sure what you mean by "lossy JPEG may not look nice for these
specific images". The JPEG data is stored exactly as provided to
cairo_surface_set_mime_data(), so there will be no loss of image quality.
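
For illustration, a minimal sketch of attaching the original JPEG bytes with
cairocffi's Surface.set_mime_data(). It assumes cairocffi is installed;
jpeg_size() and write_pdf_with_jpeg() are hypothetical helper names, not part
of any library. The size parser only reads the JPEG frame header so that the
image surface can be created with matching dimensions.

```python
import struct


def jpeg_size(data):
    """Return (width, height) read from the first SOF segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt JPEG marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length
    raise ValueError("no SOF marker found")


def write_pdf_with_jpeg(jpeg_bytes, pdf_path):
    """Write a one-page PDF that embeds jpeg_bytes without re-compression."""
    import cairocffi as cairo  # imported lazily; assumed to be installed

    width, height = jpeg_size(jpeg_bytes)
    # The pixel contents are left blank in this sketch. The PDF backend uses
    # the attached MIME data instead; other backends would draw the blank
    # pixels, so real code should also decode the JPEG into the surface.
    image = cairo.ImageSurface(cairo.FORMAT_RGB24, width, height)
    image.set_mime_data("image/jpeg", jpeg_bytes)

    pdf = cairo.PDFSurface(pdf_path, width, height)
    ctx = cairo.Context(pdf)
    ctx.set_source_surface(image, 0, 0)
    ctx.paint()
    pdf.finish()
```

Usage would be along the lines of
write_pdf_with_jpeg(open("slide.jpg", "rb").read(), "out.pdf"), where both
file names are placeholders.
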


> 
> pycairo does not support Surface.set_mime_data(), but cairocffi does
> [3]. It also includes some glue code to load images (including JPEG)
> into an ImageSurface, using GDK-PixBuf [4].
> 
> 
> [1] http://cgit.freedesktop.org/cairo/tree/src/cairo-deflate-stream.c?id=b56b971141bf22ee3452b7f6f5e2dfd373b99e13#n143
> [2] http://cgit.freedesktop.org/cairo/tree/src/cairo-pdf-surface.c?id=b56b971141bf22ee3452b7f6f5e2dfd373b99e13#n179
> [3] http://pythonhosted.org/cairocffi/
> [4] http://pythonhosted.org/cairocffi/pixbuf.html
> 
> Cheers,


