[cairo] PDF Text Extraction: Past and Present
Eugeniy Meshcheryakov
eugen at debian.org
Fri Feb 2 14:28:59 PST 2007
On 2 February 2007 at 15:32 -0500, Behdad Esfahbod wrote:
> As I mentioned already, the only standard way to allow text
> extraction with custom fonts is to add ToUnicode mappings to
> embedded fonts.
It is also possible to use an ActualText entry, but CMaps are better.
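
For comparison, here is a rough sketch of what the ActualText route looks
like in a content stream: the glyphs are wrapped in a marked-content
sequence whose property list carries the replacement text. The function
name is hypothetical (it is not cairo API), and the sketch assumes 2-byte
glyph codes and that the result is placed inside an existing BT ... ET
text object with the font already selected.

```python
def actual_text_span(text, glyph_ids):
    """Wrap a glyph run in a /Span marked-content sequence with ActualText.

    Hypothetical helper for illustration only; assumes 2-byte glyph codes.
    """
    # PDF text strings may be UTF-16BE with a leading byte-order mark (FEFF).
    payload = ("\ufeff" + text).encode("utf-16-be").hex().upper()
    # Glyph codes written as one hex string operand for the Tj operator.
    glyphs = "".join(f"{g:04X}" for g in glyph_ids)
    return (
        f"/Span << /ActualText <{payload}> >> BDC\n"
        f"<{glyphs}> Tj\n"
        "EMC\n"
    )


if __name__ == "__main__":
    # One ligature glyph (index 5, chosen arbitrarily) standing for "fi".
    print(actual_text_span("fi", [5]), end="")
```

The text has to be repeated at every place the glyphs are shown, whereas a
ToUnicode CMap is written once per font.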
> To summarize, I suggest that we generate ToUnicode mappings for
> all fonts embedded in cairo's PDF output. This should be done by
> calling into the font backends, passing in the scaled-font and an
> array of glyph indices, and get back an array of Unicode
> character codes. It helps the backend if input glyphs are sorted
> numerically. The PDF backend then will build and add the
> ToUnicode CMap.
While this will work for simple writing systems, I don't think it will
be very useful for complex scripts, where unencoded glyphs (or glyphs
in the Private Use Area) will be used most of the time.
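
To make the limitation concrete, here is a minimal sketch of building a
ToUnicode CMap from a glyph-to-text table. It assumes the font backend has
already returned the text for each glyph index (an empty string when the
glyph is unencoded) and that glyph codes are 2 bytes wide. The function
name and the sample glyph indices are made up for illustration; this is
not cairo's implementation.

```python
def build_tounicode_cmap(glyph_to_text):
    """Return ToUnicode CMap text for a {glyph index: Unicode string} table.

    Hypothetical helper for illustration only; assumes 2-byte glyph codes.
    """
    entries = []
    for glyph in sorted(glyph_to_text):  # glyphs in numerical order
        text = glyph_to_text[glyph]
        if not text:
            continue  # unencoded glyph: there is nothing to map it to
        # Destination strings are UTF-16BE, so one glyph may map to
        # several characters (a ligature, for example).
        dst = text.encode("utf-16-be").hex().upper()
        entries.append(f"<{glyph:04X}> <{dst}>")

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # Each bfchar section may hold at most 100 entries.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Glyph 36 is "A", glyph 5 is an "fi" ligature, glyph 900 is unencoded.
    print(build_tounicode_cmap({36: "A", 5: "fi", 900: ""}), end="")
```

In this sketch the unencoded glyph simply drops out of the map, so text
shown with it cannot be extracted. With complex scripts most glyphs would
end up in that situation unless the backend can recover the original text
by some other means.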
--
Eugeniy Meshcheryakov