[cairo] PDF Text Extraction: Past and Present
Adrian Johnson
ajohnson at redneon.com
Fri Feb 2 22:05:07 PST 2007
Behdad Esfahbod wrote:
> Recently I spent some time reading relevant parts of the PDF
> Reference, version 1.6 ("the reference") to come up with plans
Which version of the PDF specification are we using? Cairo currently
puts 1.4 in the PDF header.
> To summarize, I suggest that we generate ToUnicode mappings for
> all fonts embedded in cairo's PDF output. This should be done by
> calling into the font backends, passing in the scaled-font and an
> array of glyph indices, and get back an array of Unicode
> character codes. It helps the backend if input glyphs are sorted
> numerically. The PDF backend then will build and add the
> ToUnicode CMap.
>
> Anyone taking this?
I am working on this.
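The last step of the proposal is the PDF backend building the ToUnicode CMap. That step can be sketched in C as follows. This is a minimal illustration, not cairo's code. The function name `build_to_unicode_cmap` is hypothetical, and the two parallel arrays stand in for whatever the font backend would return. The sketch assumes 2-byte glyph codes (codespace `<0000>`-`<ffff>`) and one Unicode character per glyph. Code points above U+FFFF are written as UTF-16BE surrogate pairs, and entries are split into `beginbfchar` sections of at most 100, which is the limit the CMap format imposes on each section.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The CMap format allows at most 100 entries per beginbfchar section. */
#define BFCHAR_MAX 100

/* Append printf-style text at buf[*len]; fail if it would not fit. */
static int append(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = vsnprintf(buf + *len, size - *len, fmt, ap);
    va_end(ap);
    if (written < 0 || (size_t) written >= size - *len)
        return -1;
    *len += (size_t) written;
    return 0;
}

/* Write one code point as UTF-16BE hex digits; code points above
 * U+FFFF become a surrogate pair (8 hex digits). */
static void utf16be_hex(uint32_t cp, char out[9])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        snprintf(out, 9, "%04x", (unsigned) cp);
    } else {
        cp -= 0x10000;
        snprintf(out, 9, "%04x%04x",
                 (unsigned) (0xD800 + (cp >> 10)),
                 (unsigned) (0xDC00 + (cp & 0x3FF)));
    }
}

/* Hypothetical helper (not a cairo function): glyphs[i] maps to
 * unicode[i], as a font backend might return them. Assumes glyph codes
 * below 0x10000. Returns the CMap length, or -1 if buf is too small. */
int build_to_unicode_cmap(char *buf, size_t size, const unsigned int *glyphs,
                          const uint32_t *unicode, int n)
{
    size_t len = 0;
    int i, j;

    if (append(buf, size, &len, "%s",
               "/CIDInit /ProcSet findresource begin\n"
               "12 dict begin\n"
               "begincmap\n"
               "/CIDSystemInfo\n"
               "<< /Registry (Adobe)\n"
               "   /Ordering (UCS)\n"
               "   /Supplement 0\n"
               ">> def\n"
               "/CMapName /Adobe-Identity-UCS def\n"
               "/CMapType 2 def\n"
               "1 begincodespacerange\n"
               "<0000> <ffff>\n"
               "endcodespacerange\n") < 0)
        return -1;

    /* Emit the glyph -> Unicode pairs in chunks of at most BFCHAR_MAX. */
    for (i = 0; i < n; i += BFCHAR_MAX) {
        int count = (n - i < BFCHAR_MAX) ? n - i : BFCHAR_MAX;

        if (append(buf, size, &len, "%d beginbfchar\n", count) < 0)
            return -1;
        for (j = i; j < i + count; j++) {
            char dst[9];

            utf16be_hex(unicode[j], dst);
            if (append(buf, size, &len, "<%04x> <%s>\n", glyphs[j], dst) < 0)
                return -1;
        }
        if (append(buf, size, &len, "endbfchar\n") < 0)
            return -1;
    }

    if (append(buf, size, &len, "%s",
               "endcmap\n"
               "CMapName currentdict /CMap defineresource pop\n"
               "end\n"
               "end\n") < 0)
        return -1;
    return (int) len;
}
```

For example, glyphs 3, 4 and 5 mapped to U+0041, U+20AC and U+1F600 produce the entries `<0003> <0041>`, `<0004> <20ac>` and `<0005> <d83dde00>`. The sketch does not itself need the glyphs sorted. Sorting only matters for how the font backend looks them up, though a sorted list would also make it easy to merge consecutive entries into `beginbfrange` sections for smaller output.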