[cairo] Embedding jpeg in pdf

Sun Jul 13 00:40:25 PDT 2008

On Jul 12, 2008, at 9:23 PM, Adrian Johnson wrote:

> One of the items on the ROADMAP [1] for 1.8 is jpeg embedding in the
> PDF, PS, and win32-printing backends.

Awesome, I'm hoping that we can do Save-as-PDF natively in Firefox for  
the next release and this will be a big help in keeping the PDF size  
sane.

> There are a couple of ways this could be implemented. The minimal
> approach would be to add a function to supply the jpeg data to a
> surface. The jpeg data would be used instead of the image data when
> embedding the surface image in backends that support jpeg. The advantage
> of this approach is that no dependency on a jpeg library is required.
> The disadvantage is that the user would need to be careful to ensure
> that the surface has the same content as the jpeg image otherwise
> fallback images are going to be wrong.
>
> The other approach is to provide a cairo_image_surface_create_from_jpeg
> function. The advantage is a simpler API for applications to use. The
> disadvantage is the libjpeg dependency.

These aren't really two separate approaches, are they?  The second
depends on the functionality from the first; the create_from_jpeg part
is an additional API (which I strongly dislike, see below).

> I have created three patches to implement these two ideas on the jpeg
> branch of my git repository [2]. The first patch adds the API function
>
> void
> cairo_surface_set_jpeg_data (cairo_surface_t    *surface,
>                              unsigned char      *data,
>                              long                length)
>
> to supply the jpeg data to a surface.

This makes sense, though if you're already adding the ability to  
attach data to a surface, why not make this more generic?  That way we  
won't have to add another function if we add support for, say, TIFF or  
somesuch.  Maybe a function that takes a mime type along with
data/length?
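
As a rough sketch of what such a generic call could look like: the
surface stores one blob of encoded data tagged with a mime type, and a
backend asks for the type it knows how to embed.  All names below
(sketch_surface_t, sketch_surface_set_mime_data,
sketch_surface_get_mime_data) are hypothetical and only illustrate the
idea; they are not existing cairo API, and the copy-on-set ownership is
an assumption made to keep the example small.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a surface; not cairo's real structure. */
typedef struct {
    char          *mime_type;  /* owned copy, e.g. "image/jpeg" */
    unsigned char *data;       /* owned copy of the encoded bytes */
    size_t         length;
} sketch_surface_t;

/* Hypothetical generic form of cairo_surface_set_jpeg_data().
 * Copies the data (an assumption).  Passing data == NULL clears any
 * attached data.  Returns 0 on success, -1 on error. */
static int
sketch_surface_set_mime_data (sketch_surface_t    *surface,
                              const char          *mime_type,
                              const unsigned char *data,
                              size_t               length)
{
    char          *type_copy = NULL;
    unsigned char *data_copy = NULL;

    if (surface == NULL)
        return -1;

    if (data != NULL) {
        if (mime_type == NULL || length == 0)
            return -1;
        type_copy = malloc (strlen (mime_type) + 1);
        data_copy = malloc (length);
        if (type_copy == NULL || data_copy == NULL) {
            free (type_copy);
            free (data_copy);
            return -1;
        }
        strcpy (type_copy, mime_type);
        memcpy (data_copy, data, length);
    }

    /* Replace whatever was attached before. */
    free (surface->mime_type);
    free (surface->data);
    surface->mime_type = type_copy;
    surface->data = data_copy;
    surface->length = (data != NULL) ? length : 0;
    return 0;
}

/* Backend side: returns the attached bytes only if they are tagged
 * with the requested type; NULL means "fall back to the pixel data". */
static const unsigned char *
sketch_surface_get_mime_data (const sketch_surface_t *surface,
                              const char             *mime_type,
                              size_t                 *length)
{
    assert (surface != NULL && mime_type != NULL && length != NULL);

    if (surface->mime_type == NULL ||
        strcmp (surface->mime_type, mime_type) != 0)
        return NULL;

    *length = surface->length;
    return surface->data;
}
```

With this shape, a PDF backend would ask for "image/jpeg" and embed the
bytes if it gets them, and a later TIFF-capable backend would only need
a new mime type string rather than a new entry point.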

> The second patch implements PDF embedding of jpeg images. The third
> patch adds the API function:
>
> cairo_surface_t *
> cairo_image_surface_create_from_jpeg (const char   *filename)

I dislike this part -- the png stuff is in cairo because of testing,  
but I don't think we need to do jpeg decoding in cairo and especially  
don't need to add a dependency on libjpeg.  If an application wants to  
embed jpeg images, it's probably already decoding jpeg images for its  
own purpose -- there's no reason for cairo to duplicate that  
functionality.  (For testing purposes, it's probably ok to add a  
dependency on libjpeg for the test itself, though.)

> The current limitations of these patches that I am aware of are:
>
> - I am not sure who should own the data supplied by
> cairo_surface_set_jpeg_data() and whether the entire data or only the
> pointer should be copied when snapshotting surfaces.

I'd say that the data should be shared until it is set again; that
is, I'd create a new internal object that's a refcounted data chunk  
which would be shared amongst all snapshots.  If set_data is called on  
a surface that has an existing shared object, it just decrements the  
refcount of that and creates a new one for itself.  That also seems to  
solve the data ownership problem -- if the PDF surface needs to hold  
on to the data for potentially longer than the surface lifetime, it  
can do so.  It might also be nice for an application to be able to
pass in a pointer to its own buffer and avoid a cairo-controlled copy,
but that complicates the lifetime considerably.
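
The refcounted chunk described above could be sketched roughly as
follows.  The types and functions (shared_chunk_t, snap_surface_t and
friends) are hypothetical names for illustration, not cairo internals,
and the sketch assumes cairo copies the caller's data once when it is
set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted blob shared between a surface and its snapshots. */
typedef struct {
    int            refcount;
    unsigned char *data;
    size_t         length;
} shared_chunk_t;

/* Hypothetical surface that holds at most one chunk (NULL if none). */
typedef struct {
    shared_chunk_t *chunk;
} snap_surface_t;

/* Copy the caller's bytes into a new chunk with a single reference. */
static shared_chunk_t *
shared_chunk_create (const unsigned char *data, size_t length)
{
    shared_chunk_t *chunk;

    assert (data != NULL && length > 0);
    chunk = malloc (sizeof *chunk);
    if (chunk == NULL)
        return NULL;
    chunk->data = malloc (length);
    if (chunk->data == NULL) {
        free (chunk);
        return NULL;
    }
    memcpy (chunk->data, data, length);
    chunk->length = length;
    chunk->refcount = 1;
    return chunk;
}

/* Drop one reference; free the chunk when the last holder lets go. */
static void
shared_chunk_destroy (shared_chunk_t *chunk)
{
    if (chunk == NULL)
        return;
    assert (chunk->refcount > 0);
    if (--chunk->refcount == 0) {
        free (chunk->data);
        free (chunk);
    }
}

/* Setting data drops the surface's reference to the old chunk and
 * starts a fresh one; existing snapshots keep the old chunk alive. */
static int
snap_surface_set_data (snap_surface_t      *surface,
                       const unsigned char *data,
                       size_t               length)
{
    shared_chunk_t *fresh = shared_chunk_create (data, length);

    if (fresh == NULL)
        return -1;
    shared_chunk_destroy (surface->chunk);
    surface->chunk = fresh;
    return 0;
}

/* Snapshotting shares the chunk by taking a reference, not a copy. */
static void
snap_surface_snapshot (const snap_surface_t *source,
                       snap_surface_t       *snapshot)
{
    shared_chunk_t *shared = source->chunk;

    if (shared != NULL)
        shared->refcount++;
    shared_chunk_destroy (snapshot->chunk);
    snapshot->chunk = shared;
}
```

A PDF surface that needs the bytes past the source surface's lifetime
would simply take its own reference in the same way a snapshot does.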

     - Vlad