[cairo] Re: Embeding JPG in PDF

Tue Jan 9 17:17:28 PST 2007

On 1/9/07, Bill Spitzak <spitzak at d2.com> wrote:

> To avoid having cairo link every image library known the solution seems
> to be "register" a constructor function. This involves calling some
> cairo api with a pointer to a function that takes a filename (used for
> pattern-matching filenames, it is not opened), a block of data that is
> the first 512 or so bytes of the file, and whatever arguments are passed
> to the base class constructor such as the read+close callbacks.
> The function figures out from the block of data (or the filename if it is
> really stupid) if it can read the data, and if so constructs the proper
> subclass and returns it. It returns null if the test fails. It helps a
> lot if platform-specific tricks are done so that, for instance, the jpeg
> subclass can be in the cairo library, but if the register for it is not
> done it does not matter if libjpeg exists on the machine, the program
> will work. Add a "register_common_images" function that gets you jpeg
> and png.
>
> A huge annoyance is that for most image file libraries it is quite
> inefficient to measure the image without reading it at the same time, if
> you assume there is any chance that after measuring the image the caller
> will want the decompressed data. The result has usually been that it is
> impossible to avoid the decompression when the object is constructed, or
> at least when it is first used as a source. So the pdf example will
> likely result in the image being decompressed into memory even if that
> data is not used. Some file types can get the width/height from the
> block of data used to id the file so they don't have this problem.
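
To make the quoted proposal concrete, here is a minimal sketch of such a
registry in C. None of these names exist in cairo: image_source_t,
image_register_probe, image_open and image_register_common are all
hypothetical, and the read+close callbacks are left out to keep it short.
Each probe sees only the filename and the leading bytes of the file, and
returns NULL when the data is not in its format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types and functions; nothing here is cairo API. */
typedef struct image_source {
    const char *format;            /* e.g. "png" or "jpeg" */
} image_source_t;

/* A probe receives the filename (for pattern matching only, it is not
 * opened) and the first bytes of the file. It returns a source object,
 * or NULL if it cannot handle the data. */
typedef image_source_t *(*image_probe_fn)(const char *filename,
                                          const unsigned char *head,
                                          size_t head_len);

#define MAX_PROBES 16
static image_probe_fn probes[MAX_PROBES];
static size_t n_probes;

/* Register a probe; returns 0 on success, -1 if the table is full. */
int image_register_probe(image_probe_fn fn)
{
    assert(fn != NULL);
    if (n_probes >= MAX_PROBES)
        return -1;
    probes[n_probes++] = fn;
    return 0;
}

/* Try each registered probe in order; the first non-NULL result wins. */
image_source_t *image_open(const char *filename,
                           const unsigned char *head, size_t head_len)
{
    for (size_t i = 0; i < n_probes; i++) {
        image_source_t *src = probes[i](filename, head, head_len);
        if (src != NULL)
            return src;
    }
    return NULL;
}

static image_source_t png_source  = { "png" };
static image_source_t jpeg_source = { "jpeg" };

static image_source_t *probe_png(const char *filename,
                                 const unsigned char *head, size_t head_len)
{
    static const unsigned char sig[8] =
        { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };
    (void)filename;                /* the signature alone is enough */
    if (head_len >= 8 && memcmp(head, sig, 8) == 0)
        return &png_source;
    return NULL;
}

static image_source_t *probe_jpeg(const char *filename,
                                  const unsigned char *head, size_t head_len)
{
    (void)filename;
    /* JPEG files start with the SOI marker FF D8, then another FF. */
    if (head_len >= 3 && head[0] == 0xFF && head[1] == 0xD8 && head[2] == 0xFF)
        return &jpeg_source;
    return NULL;
}

/* Counterpart of the "register_common_images" idea from the quote. */
void image_register_common(void)
{
    image_register_probe(probe_png);
    image_register_probe(probe_jpeg);
}
```

A program that never calls image_register_common() never references the
probes, which is the property the proposal relies on to avoid a hard
dependency on the image libraries.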

I'm trying to provide a patch to do something very close to what you
propose here. As I described in my previous post, to embed an image
into a PDF document, you need to know a couple of details about the
image file which are all present in the file header (for PNG and JPEG
and most other image formats).

My patch will do:
1. Detect the image type (max 8 bytes read)
2. Fetch the image information (dimension, resolution, bpp, color
scheme, channels..)
3. Fetch the full image file into a buffer (the complete file, no
decompression or processing here)
4. Insert an XObject/image
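
Steps 1 and 2 can be sketched as follows for the PNG case. This is an
illustration only, not the code from the patch: img_detect, png_read_info
and png_info_t are hypothetical names. It relies on the PNG layout, where
the 8-byte signature is followed directly by the IHDR chunk holding the
width, height, bit depth and color type, so nothing has to be decompressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not the patch itself. */
typedef enum { IMG_UNKNOWN, IMG_PNG, IMG_JPEG } img_type_t;

typedef struct {
    uint32_t width, height;
    unsigned bit_depth, color_type;
} png_info_t;

/* Step 1: identify the format from at most the first 8 bytes. */
img_type_t img_detect(const unsigned char *head, size_t len)
{
    static const unsigned char png_sig[8] =
        { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };
    assert(head != NULL);
    if (len >= 8 && memcmp(head, png_sig, 8) == 0)
        return IMG_PNG;
    if (len >= 3 && head[0] == 0xFF && head[1] == 0xD8 && head[2] == 0xFF)
        return IMG_JPEG;
    return IMG_UNKNOWN;
}

/* Read a 32-bit big-endian integer. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Step 2: pull the image information out of the IHDR chunk.
 * Layout: signature (8), chunk length (4), "IHDR" (4), width (4),
 * height (4), bit depth (1), color type (1), ...
 * Returns 0 on success, -1 if the buffer is not a usable PNG header. */
int png_read_info(const unsigned char *buf, size_t len, png_info_t *out)
{
    if (len < 26 || img_detect(buf, len) != IMG_PNG)
        return -1;
    if (be32(buf + 8) != 13 || memcmp(buf + 12, "IHDR", 4) != 0)
        return -1;
    out->width      = be32(buf + 16);
    out->height     = be32(buf + 20);
    out->bit_depth  = buf[24];
    out->color_type = buf[25];
    return 0;
}
```

JPEG needs a little more work, since the dimensions sit in a start-of-frame
segment that has to be located by walking the marker segments, but it is
still header parsing and no image data is decoded.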

I'm done with 1, 2 and 3, and I will try to finish 4 as soon as possible.
Having a working patch may help in reaching a decision.

Like Stuart, I would also prefer to keep cairo doing what it knows:
rendering. Image loading, saving and decoding are not required to embed
images in a PDF.

--Pierre