[cairo] Size of PDF when splitting a PDF surface

Thomas Petazzoni thomas.petazzoni at enix.org
Wed Aug 18 01:29:52 PDT 2010


I am one of the developers of MapOSMatic (http://www.maposmatic.org), a
Web service that generates printable maps and street indexes using
OpenStreetMap data. We heavily use the Cairo backend of Mapnik to
generate our PDF, SVG and PNG maps, and we also use Cairo to draw and
render the street index.

In the new version of MapOSMatic we're developing, we are implementing
a "booklet" rendering mode: instead of rendering the city map as a
single, large PDF page that is hard to print on common printers, we
will split the map across several smaller (A5, A4, etc.) pages. To do
this, we ask Mapnik to render the full map to a single large Cairo
PDFSurface, and then create another Cairo PDFSurface of the destination
size (A5, A4, etc.), on which we use Context.set_source_surface(),
Context.rectangle() and Context.fill() to render a part of the original
Cairo PDFSurface on each page.
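
As a rough sketch of the approach described above (this is not the
attached script; the helper names tile_origins and split_to_pages are
hypothetical, and the page order is an assumption), the splitting step
could look like this with Python Cairo:

```python
def tile_origins(full_w, full_h, page_w, page_h):
    """Yield the top-left corner (x, y) of each page-sized tile, row by row."""
    y = 0
    while y < full_h:
        x = 0
        while x < full_w:
            yield (x, y)
            x += page_w
        y += page_h


def split_to_pages(source, full_w, full_h, page_w, page_h, out_path):
    """Paint each tile of `source` onto its own page of a new PDF."""
    # pycairo is imported here so that tile_origins() stays usable without it.
    import cairo

    target = cairo.PDFSurface(out_path, page_w, page_h)
    ctx = cairo.Context(target)
    for x, y in tile_origins(full_w, full_h, page_w, page_h):
        # Shift the source so that the tile at (x, y) lands on the page origin.
        ctx.set_source_surface(source, -x, -y)
        ctx.rectangle(0, 0, page_w, page_h)
        ctx.fill()
        ctx.show_page()
    target.finish()
```

Here `source` stands for the large PDFSurface that Mapnik rendered to.
Each page sets the whole large surface as its source and fills only a
page-sized rectangle of it.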

It works well. However, we are facing a size problem with the resulting
PDF file. If the original large map takes, say, 3 MB, and we split it
across 16 pages, the final PDF takes 3 * 16 = 48 MB. If the original
large map takes 5 MB, and we split it across 48 pages, the final PDF
takes 240 MB. It appears that the full contents of the original PDF get
replicated on every page of the resulting PDF, even though each page
only displays a small part of the original PDF.

To highlight this problem, I've created a simple test case in which
I've replaced the complicated Mapnik rendering by some simple Cairo
drawings. The test case (attached, using Python Cairo) does the
following:

 * creates an 8*72 x 8*72 PDF surface, draws some stuff on it, writes
   it out as a PDF and then displays its size

 * creates the same 8*72 x 8*72 PDF surface with the same contents, and
   then a 2*72 x 2*72 PDF surface on which each page renders 1/16th of
   the original surface. It then displays the size of this final PDF
   file. This file is always about 16 times bigger than the original
   PDF.

A "complexity" argument makes the initial drawing more complex, which
increases the size of the original PDF surface. Whatever the
"complexity" is, the final PDF is always about 16 times bigger than
the original PDF surface. Some numbers:

 Complexity 20, original PDF  65209 bytes, final PDF 1050035 bytes
 Complexity 30, original PDF  97072 bytes, final PDF 1559843 bytes
 Complexity 50, original PDF 160927 bytes, final PDF 2581507 bytes
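
As a quick check of these figures (a small sketch that uses only the
three measurements above; the function name summarize is made up for
illustration), the ratio and the bytes left over after 16 full copies
can be computed as follows:

```python
# (complexity, original PDF size in bytes, final PDF size in bytes)
measurements = [
    (20, 65209, 1050035),
    (30, 97072, 1559843),
    (50, 160927, 2581507),
]
PAGES = 16


def summarize(rows, pages):
    """Return (complexity, final/original ratio, final - pages * original)."""
    result = []
    for complexity, original, final in rows:
        ratio = final / original
        overhead = final - pages * original
        result.append((complexity, ratio, overhead))
    return result


for complexity, ratio, overhead in summarize(measurements, PAGES):
    print(f"complexity {complexity}: ratio {ratio:.2f}, "
          f"extra beyond {PAGES} copies: {overhead} bytes")
```

The ratios come out at about 16.10, 16.07 and 16.04, and the remainder
stays near 6.7 KB in all three cases. In other words, the final size is
very close to 16 times the original plus a small, roughly constant
amount.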

Of course, those final PDF sizes are acceptable, but in our real
application (MapOSMatic), we get PDFs of up to 200 MB.

Is our way of extracting parts of a surface into another surface
incorrect? Is there a way to make sure that Cairo includes the contents
of the original surface only once in the final PDF?

Thanks for your feedback!

Thomas Petazzoni                         http://thomas.enix.org
MapOSMatic                               http://www.maposmatic.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cairo-pdf-size-problem.py
Type: text/x-python
Size: 2231 bytes
Desc: not available
URL: <http://lists.cairographics.org/archives/cairo/attachments/20100818/73d1e4d1/attachment.py>