[cairo] PDF Text Extraction: Past and Present

Fri Feb 2 22:05:07 PST 2007

Behdad Esfahbod wrote:
> Recently I spent some time reading relevant parts of the PDF
> Reference, version 1.6 ("the reference") to come up with plans

Which version of the PDF specification are we using? Cairo currently
puts 1.4 in the PDF header.

> To summarize, I suggest that we generate ToUnicode mappings for
> all fonts embedded in cairo's PDF output.  This should be done by
> calling into the font backends, passing in the scaled-font and an
> array of glyph indices, and getting back an array of Unicode
> character codes.  It helps the backend if input glyphs are sorted
> numerically. The PDF backend will then build and add the
> ToUnicode CMap.
> 
> Anyone taking this?

I am working on this.
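The PDF-backend half of the proposal quoted above takes one Unicode value per glyph, returned by the font backend, and writes a ToUnicode CMap. A rough C sketch of that half follows. It is not cairo code. `emit_to_unicode_cmap`, `buf_t` and `buf_printf` are hypothetical names. The sketch makes four assumptions:

- The embedded font uses 2-byte codes equal to the glyph indices in the content stream.
- The font backend reports 0 for glyphs it cannot map.
- Code points above U+FFFF are written as UTF-16BE surrogate pairs.
- Mappings are split into `beginbfchar` sections of at most 100 entries, the per-section limit in Adobe's CMap format.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed-size output buffer (not a cairo type). */
typedef struct {
    char  *data;
    size_t size;
    size_t len;
} buf_t;

/* Appends printf-style text; returns 0 on success, -1 if it does not fit. */
static int buf_printf(buf_t *b, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (b->len >= b->size)
        return -1;
    va_start(ap, fmt);
    n = vsnprintf(b->data + b->len, b->size - b->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= b->size - b->len)
        return -1;
    b->len += (size_t) n;
    return 0;
}

/* Writes one Unicode scalar value as UTF-16BE hex, using a surrogate
 * pair for code points above U+FFFF. */
static int put_utf16_hex(buf_t *b, uint32_t cp)
{
    if (cp <= 0xFFFF)
        return buf_printf(b, "<%04X>", (unsigned) cp);
    cp -= 0x10000;
    return buf_printf(b, "<%04X%04X>",
                      (unsigned) (0xD800 + (cp >> 10)),
                      (unsigned) (0xDC00 + (cp & 0x3FF)));
}

/* Hypothetical: emits a ToUnicode CMap for n glyphs. glyphs[] is assumed
 * sorted ascending; unicode[i] == 0 means "no mapping" and is skipped.
 * Returns 0 on success, -1 if the buffer overflowed (contents then
 * unspecified). */
int emit_to_unicode_cmap(buf_t *b, const uint16_t *glyphs,
                         const uint32_t *unicode, size_t n)
{
    size_t i = 0;
    int rc = 0;

    rc |= buf_printf(b,
        "/CIDInit /ProcSet findresource begin\n"
        "12 dict begin\n"
        "begincmap\n"
        "/CIDSystemInfo\n"
        "<< /Registry (Adobe)\n"
        "   /Ordering (UCS)\n"
        "   /Supplement 0\n"
        ">> def\n"
        "/CMapName /Adobe-Identity-UCS def\n"
        "/CMapType 2 def\n"
        "1 begincodespacerange\n"
        "<0000> <FFFF>\n"
        "endcodespacerange\n");

    while (i < n) {
        size_t j = i, count = 0;

        /* Find the next run holding up to 100 mapped glyphs. */
        while (j < n && count < 100) {
            if (unicode[j] != 0)
                count++;
            j++;
        }
        if (count > 0) {
            rc |= buf_printf(b, "%zu beginbfchar\n", count);
            for (; i < j; i++) {
                if (unicode[i] == 0)
                    continue;
                rc |= buf_printf(b, "<%04X> ", (unsigned) glyphs[i]);
                rc |= put_utf16_hex(b, unicode[i]);
                rc |= buf_printf(b, "\n");
            }
            rc |= buf_printf(b, "endbfchar\n");
        }
        i = j;
    }

    rc |= buf_printf(b,
        "endcmap\n"
        "CMapName currentdict /CMap defineresource pop\n"
        "end\n"
        "end\n");
    return rc;
}
```

Because the input is sorted, the entries come out in code order. That would also make it straightforward to merge runs of consecutive glyphs with consecutive code points into `beginbfrange` entries later, which this sketch does not attempt.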