[cairo-bugs] [Bug 19487] Not all the relevant documents are followed when making a ToUnicode map for PDF

Sat Jan 10 00:16:29 PST 2009

http://bugs.freedesktop.org/show_bug.cgi?id=19487

--- Comment #2 from Barry Schwartz <chemoelectric at chemoelectric.org>  2009-01-10 00:16:27 PST ---
I am asking for (b). The PDF reader is Adobe Reader 8 (or 9 if I run it under
Wine), and I am using OpenType Latin fonts. There is no specific application: I
was trying out Cairo to decide whether to use it myself. I make e-books and want
searches to work. I also occasionally make fonts, and I take the trouble to name
all the glyphs according to the rules at
http://www.adobe.com/devnet/opentype/archives/glyph.html; with Cairo, all
that care goes to waste.

If the algorithm on that web page is followed, then regardless of OpenType
substitutions, no matter how intricate, Reader and every other application I
have used (okular, evince, pdftotext, etc.) can search and extract text. It's
like magic.
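For readers unfamiliar with those naming rules, here is a minimal Python sketch
of the glyph-name-to-Unicode mapping, assuming the rules as stated in Adobe's
Glyph List Specification: drop everything from the first period, split on
underscores, and map each component via the Adobe Glyph List, a "uni" sequence
of 4-digit uppercase hex groups, or a "u" followed by 4 to 6 uppercase hex
digits; anything else maps to nothing. AGL_EXCERPT is a hypothetical handful of
entries standing in for the full list, and the ZapfDingbats special case is
omitted. This is not Cairo code.

```python
import re

# Hypothetical excerpt of the Adobe Glyph List; a real tool would load the
# complete list. The ZapfDingbats-specific list is not handled here.
AGL_EXCERPT = {
    "A": "A",
    "a": "a",
    "f": "f",
    "i": "i",
    "l": "l",
    "space": " ",
    "fi": "\uFB01",
}

# "uni" + one or more groups of exactly four uppercase hex digits.
_UNI_RE = re.compile(r"uni((?:[0-9A-F]{4})+)")
# "u" + four to six uppercase hex digits.
_U_RE = re.compile(r"u([0-9A-F]{4,6})")


def _is_scalar(value: int) -> bool:
    """True for Unicode scalar values (code points excluding surrogates)."""
    return 0 <= value <= 0xD7FF or 0xE000 <= value <= 0x10FFFF


def _map_component(component: str) -> str:
    """Map one underscore-separated component of a glyph name."""
    if component in AGL_EXCERPT:
        return AGL_EXCERPT[component]
    m = _UNI_RE.fullmatch(component)
    if m:
        digits = m.group(1)
        values = [int(digits[k:k + 4], 16) for k in range(0, len(digits), 4)]
        # A "uni" name cannot also match the "u" rule, so failure here is final.
        if all(_is_scalar(v) for v in values):
            return "".join(chr(v) for v in values)
        return ""
    m = _U_RE.fullmatch(component)
    if m:
        value = int(m.group(1), 16)
        if _is_scalar(value):
            return chr(value)
    return ""


def glyph_name_to_unicode(name: str) -> str:
    """Map a glyph name such as 'f_f_i.alt' to its Unicode string."""
    base = name.split(".", 1)[0]  # strip suffixes like ".alt" or ".sc"
    return "".join(_map_component(c) for c in base.split("_"))


if __name__ == "__main__":
    print(glyph_name_to_unicode("f_f_i.alt"))
```

The point of the scheme is that a ligature or alternate glyph reached through
any chain of OpenType substitutions still carries its text in its name, so
"f_f_i.alt" recovers "ffi" without consulting the GSUB tables.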

Looking over http://www.adobe.com/devnet/acrobat/pdfs/5411.ToUnicode.pdf, it
appears to me that this is, implicitly, a recommendation for ToUnicode maps of
CJK fonts.
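To make the PDF side concrete: once each glyph has a Unicode string (for
example, one derived from its glyph name), those strings go into the font's
ToUnicode CMap. The Python sketch below assumes 2-byte glyph codes (as with an
Identity-H encoded CID font), writes one bfchar entry per glyph with the
destination as UTF-16BE hex, skips glyphs with no Unicode value, and splits
entries into blocks of at most 100, the limit commonly applied to bfchar
sections. to_unicode_cmap is a hypothetical helper, not part of Cairo's API.

```python
def to_unicode_cmap(mapping: dict[int, str]) -> str:
    """Build a ToUnicode CMap from glyph codes to Unicode strings.

    Assumes 2-byte codes (codespace <0000>..<FFFF>). Glyphs that map to an
    empty string are omitted rather than given a bogus destination.
    """
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    entries = sorted((code, text) for code, text in mapping.items() if text)
    for start in range(0, len(entries), 100):  # at most 100 entries per block
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # Destinations are UTF-16BE, so ligatures become multi-unit strings
            # and non-BMP characters become surrogate pairs.
            dst = text.encode("utf-16-be").hex().upper()
            lines.append(f"<{code:04X}> <{dst}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_unicode_cmap({3: "A", 7: "fi", 9: ""}))
```

Mapping a ligature glyph to "fi" rather than to U+FB01 is what lets a viewer
find the word "find" when the PDF actually draws an fi ligature.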

See also the first bullet point in the "Extraction of Text Content" section
(sect. 5.9) of the PDF Reference. No ActualText is needed, at least for recent
Latin-Greek-Cyrillic OpenType fonts.
