[cairo] PDF memory usage?

Ralph Giles giles at ghostscript.com
Wed Jun 10 10:35:34 PDT 2009


On Wed, Jun 10, 2009 at 9:26 AM, Ian Britten<britten at caris.com> wrote:

> Is there any conceptual problem with something like flush_to_file()
> that writes the current contents to disk and frees them, but doesn't
> advance to a new page?  If it's theoretically possible, I might be
> interested in exploring that option...

In PDF, resources, like an image to be drawn, and the actual drawing
commands which place it on the page, are stored separately. One could, I
suppose, write out an enormous image to disk while buffering the
vector part. The trouble is distinguishing that case from the one
where three hundred-odd 60k logos are sprinkled through a
billion vector coordinates, where one wants to spool things in the
opposite order.

In either case, this is a pretty sophisticated optimization.

> One thing I forgot to ask about:
> I believe I recall seeing discussion here about somehow putting
> JPEG data into a PDF.  Would that approach be of any interest
> to me, I wonder?  Wouldn't writing out my image as a compressed
> JPEG be [much] smaller than the in-memory 32-bit pixels?

It would help for some data, although JPEG is not lossless. You can
also give cairo pre-compressed PNG data, which is lossless, or JPEG
2000, which can offer slightly better compression than either. Note
that this is a very new feature; cairo_surface_get/set_mime_data()
aren't mentioned in the website's version of the API documentation yet.
But you could hand it an mmapped file, which should save considerable
physical memory. (I didn't verify that the PDF surface avoids copying
the entire compressed buffer in memory before it writes it out, but it
looks like it tries.) Since it still requires a monolithic buffer, it
won't help much with 32-bit limitations, beyond the factor of 2-10 you
would get from the compression.
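
For what it's worth, here's roughly what that might look like; a
minimal, untested sketch, assuming a cairo snapshot recent enough to
have the mime-data calls, with a made-up input file and made-up
image/page sizes:

#include <cairo.h>
#include <cairo-pdf.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* mmap the compressed file so no second heap copy of it is needed. */
    int fd = open("big_raster.jpg", O_RDONLY);     /* hypothetical input */
    struct stat st;
    fstat(fd, &st);
    unsigned char *jpeg = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* Placeholder surface with the same pixel dimensions as the JPEG.
     * cairo still allocates (zeroed) fallback pixels for it; the point is
     * that the compressed bytes can be embedded in the PDF as-is. */
    cairo_surface_t *image =
        cairo_image_surface_create(CAIRO_FORMAT_RGB24, 6000, 4000);
    cairo_surface_set_mime_data(image, CAIRO_MIME_TYPE_JPEG,
                                jpeg, st.st_size,
                                NULL, NULL);       /* we munmap below */

    cairo_surface_t *pdf = cairo_pdf_surface_create("out.pdf", 595.0, 842.0);
    cairo_t *cr = cairo_create(pdf);

    /* Scale image pixels onto the A4-sized page and paint. */
    cairo_scale(cr, 595.0 / 6000.0, 842.0 / 4000.0);
    cairo_set_source_surface(cr, image, 0, 0);
    cairo_paint(cr);
    cairo_show_page(cr);

    cairo_destroy(cr);
    cairo_surface_destroy(pdf);        /* finishes and writes out the PDF */
    cairo_surface_destroy(image);
    munmap(jpeg, st.st_size);
    close(fd);
    return 0;
}

Error checking is omitted; you could also pass a destroy callback to
set_mime_data that does the munmap, if you'd rather hand ownership of
the mapping to cairo.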

>>> The typical approach for dealing with large output like this
>>> seems to be to try and chunk/tile the data.  However, with
>>> the target being PDF, I'm not sure if this is possible

You can chunk the data yourself, as you draw it with cairo. That will
make the resulting PDF easier for readers to handle, but won't
necessarily help with cairo's memory footprint when writing the page
out. It can also be difficult to avoid artefacts at the image seams.
Turning off interpolation helps, but can be a problem for uses other
than printing/viewing at the target resolution. I don't think cairo
exposes that switch. Putting the seams under existing grid lines also
helps.
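
If you do go that route, the drawing loop might look something like
this untested sketch, where load_tile() stands in for whatever actually
decodes one block of your raster, and the tile and page sizes are
made-up numbers chosen so the image divides evenly into tiles:

#include <cairo.h>
#include <cairo-pdf.h>
#include <string.h>

#define TILE      2048                 /* tile edge, in pixels */
#define IMG_SIZE  61440                /* full raster edge (30 * 2048) */
#define PAGE_PTS  842.0                /* square page, in points */

/* Stand-in for the application's real decoder: fill the TILE x TILE block
 * whose top-left source pixel is (tx, ty).  Here it just paints grey so
 * the example is self-contained. */
static void load_tile(int tx, int ty, unsigned char *pixels, int stride)
{
    (void)tx; (void)ty;
    for (int y = 0; y < TILE; y++)
        memset(pixels + (size_t)y * stride, 0x80, (size_t)TILE * 4);
}

int main(void)
{
    cairo_surface_t *pdf =
        cairo_pdf_surface_create("tiled.pdf", PAGE_PTS, PAGE_PTS);
    cairo_t *cr = cairo_create(pdf);

    /* Map source pixel coordinates onto the page. */
    cairo_scale(cr, PAGE_PTS / IMG_SIZE, PAGE_PTS / IMG_SIZE);

    for (int ty = 0; ty < IMG_SIZE; ty += TILE) {
        for (int tx = 0; tx < IMG_SIZE; tx += TILE) {
            cairo_surface_t *tile =
                cairo_image_surface_create(CAIRO_FORMAT_RGB24, TILE, TILE);
            load_tile(tx, ty,
                      cairo_image_surface_get_data(tile),
                      cairo_image_surface_get_stride(tile));
            cairo_surface_mark_dirty(tile);  /* pixels were written directly */

            cairo_save(cr);
            cairo_set_source_surface(cr, tile, tx, ty);
            /* Clip the fill to the tile so filtering can't pull in pixels
             * from neighbouring tiles; seams may still show at some zooms. */
            cairo_rectangle(cr, tx, ty, TILE, TILE);
            cairo_fill(cr);
            cairo_restore(cr);
            cairo_surface_destroy(tile);
        }
    }

    cairo_show_page(cr);
    cairo_destroy(cr);
    cairo_surface_destroy(pdf);
    return 0;
}

Only one tile's worth of uncompressed pixels is live in your own code
at a time, though as noted above cairo may still hold the page contents
until cairo_show_page().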

BTW, 10 GB of (uncompressed) image data is about 60k pixels square (at
3 bytes per pixel, that's roughly 58,000 x 58,000 pixels). I couldn't
find any precision guidelines in the spec, but I suspect you'd start
having precision problems around 30k square in a lot of
implementations. Which is just to say that file size isn't the only
limitation with PDF.

Hope that helps,
 -r

