[cairo] PDF Text Extraction: Past and Present

Baz brian.ewins at gmail.com
Sat Feb 3 13:07:25 PST 2007

On 03/02/07, Behdad Esfahbod <behdad at behdad.org> wrote:
> On Fri, 2007-02-02 at 21:43 +0000, Baz wrote:
> > BTW one thing missing from your
> > excellent summary was the zapf table:
> Yeah, I didn't mention zapf tables because there's no mention of them in
> the PDF reference (as far as I found).  So they are yet another
> non-standard way to do text extraction from PDF.  They are kinda parallel
> to the ToUnicode mechanism.

That wasn't quite what I meant. I'm not suggesting that we generate
zapf tables in subsetted fonts for PDF to use. I meant that the zapf
table is where I'd look in the original font for the glyph->codepoint
mappings for _cairo_truetype_map_glyphs_to_unicode (instead of
reversing cmap). It's pretty irrelevant though: zapf seems to be unused
in the wild, so Adrian's approach is the right one.
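
To illustrate what "reversing cmap" amounts to, here is a minimal sketch
in Python. It is not cairo's code: the function name invert_cmap and the
sample cmap contents are made up for illustration, and keeping the lowest
codepoint when several codepoints share a glyph is just one possible
tie-break that I'm assuming here. The point is that cmap is many-to-one,
so inverting it has to pick a winner, whereas a table keyed by glyph
would not need to.

```python
def invert_cmap(cmap):
    """Build a glyph -> codepoint map from a codepoint -> glyph map.

    cmap is many-to-one, so several codepoints can share a glyph.
    Assumption for this sketch: keep the lowest codepoint per glyph.
    """
    glyph_to_unicode = {}
    # Visit codepoints in ascending order so the first one seen for a
    # glyph is the lowest; setdefault leaves later duplicates ignored.
    for codepoint in sorted(cmap):
        glyph_to_unicode.setdefault(cmap[codepoint], codepoint)
    return glyph_to_unicode


# Hypothetical cmap: U+0020 and U+00A0 both point at glyph 3.
cmap = {0x0041: 36, 0x00A0: 3, 0x0020: 3, 0x0042: 37}
reverse = invert_cmap(cmap)
```

With this sample, glyph 3 ends up mapped to U+0020 and the U+00A0
mapping is lost, which is the kind of ambiguity a per-glyph table would
avoid.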

