[cairo-bugs] [Bug 19487] Not all the relevant documents are followed when make a ToUnicode map for PDF

bugzilla-daemon at freedesktop.org
Fri Jan 9 23:25:39 PST 2009


--- Comment #1 from Adrian Johnson <ajohnson at redneon.com>  2009-01-09 23:25:37 PST ---
It was not clear to me from your bug report whether you want the PDF backend
to:

a) embed the Adobe glyph names in the PDF file to facilitate text searching, or

b) parse the Adobe glyph names in the fonts to obtain the Unicode-to-glyph
mapping and use this information to create the ToUnicode map.

In case a), we don't use glyph names in the PDF file. The ToUnicode CMap maps
each glyph to one or more Unicode characters, using hexadecimal numbers for
the glyph index and the Unicode character(s).
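As an illustration of that hexadecimal form, here is a minimal sketch that formats one ToUnicode "bfchar" entry: the glyph code, then the destination text as UTF-16BE hex. It assumes two-byte glyph codes, and the function name format_bfchar is hypothetical; this is not cairo's actual CMap writer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format one ToUnicode "bfchar" entry, "<glyph> <UTF-16BE text>",
 * assuming two-byte (4 hex digit) glyph codes. Code points above
 * U+FFFF are written as UTF-16 surrogate pairs.
 * Returns the length written, or -1 if buf is too small. */
static int format_bfchar(char *buf, size_t size, uint16_t glyph,
                         const uint32_t *cps, size_t ncps)
{
    assert(buf != NULL && size > 0);
    int len = snprintf(buf, size, "<%04X> <", (unsigned)glyph);

    for (size_t i = 0; i < ncps && len >= 0 && (size_t)len < size; i++) {
        uint32_t cp = cps[i];
        assert(cp <= 0x10FFFF); /* must be a Unicode code point */
        if (cp > 0xFFFF) {
            /* Split a supplementary-plane code point into a surrogate pair. */
            cp -= 0x10000;
            len += snprintf(buf + len, size - (size_t)len, "%04X%04X",
                            (unsigned)(0xD800 + (cp >> 10)),
                            (unsigned)(0xDC00 + (cp & 0x3FF)));
        } else {
            len += snprintf(buf + len, size - (size_t)len, "%04X",
                            (unsigned)cp);
        }
    }

    /* Close the destination string only if everything fit so far. */
    if (len >= 0 && (size_t)len < size)
        len += snprintf(buf + len, size - (size_t)len, ">");

    return (len >= 0 && (size_t)len < size) ? len : -1;
}
```

For an "ffi" ligature at glyph 3 mapped to U+0066 U+0066 U+0069, this produces "<0003> <006600660069>", which is the one-glyph-to-several-characters case. In a real CMap such lines sit between "beginbfchar" and "endbfchar" operators.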

In case b), the cairo_show_text_glyphs() API is the preferred way of providing
the Unicode-to-glyph mapping to the PDF backend. The backend uses this
information to generate ToUnicode entries where one glyph maps to one or more
Unicode characters, and uses ActualText for the many-to-many mappings.
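cairo_show_text_glyphs() takes the UTF-8 text, the glyph array, and an array of cairo_text_cluster_t, where each cluster gives the number of text bytes (num_bytes) and glyphs (num_glyphs) that belong together. The sketch below is a hypothetical, dependency-free illustration: text_cluster is a stand-in with the same two fields, and the split between ToUnicode and ActualText follows the rule described above, not cairo's exact internal logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same two fields as cairo's
 * cairo_text_cluster_t, so the sketch compiles without cairo. */
typedef struct {
    int num_bytes;   /* UTF-8 bytes of text in the cluster */
    int num_glyphs;  /* glyphs in the cluster */
} text_cluster;

typedef enum { MAP_TOUNICODE, MAP_ACTUALTEXT } cluster_mapping;

/* Check that the clusters account for exactly utf8_len bytes and
 * num_glyphs glyphs, with no negative or empty (0, 0) clusters. */
static int clusters_cover(const text_cluster *c, size_t n,
                          int utf8_len, int num_glyphs)
{
    int bytes = 0, glyphs = 0;
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (c[i].num_bytes < 0 || c[i].num_glyphs < 0)
            return 0;
        if (c[i].num_bytes == 0 && c[i].num_glyphs == 0)
            return 0;
        bytes += c[i].num_bytes;
        glyphs += c[i].num_glyphs;
    }
    return bytes == utf8_len && glyphs == num_glyphs;
}

/* One glyph standing for one or more characters fits a ToUnicode entry;
 * every other shape is left to ActualText in this sketch. */
static cluster_mapping classify_cluster(text_cluster c)
{
    return (c.num_glyphs == 1 && c.num_bytes > 0) ? MAP_TOUNICODE
                                                  : MAP_ACTUALTEXT;
}
```

For "office" drawn as o + ffi-ligature + c + e, the clusters would be {1,1}, {3,1}, {1,1}, {1,1}. Every cluster has a single glyph, so each one, including the ligature, can become a ToUnicode entry. A character drawn with two glyphs (base plus accent) would need ActualText.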

As a fallback for when cairo_show_glyphs() is used, cairo does a reverse lookup
of the cmap in the font. However, this only works for 1-to-1 mappings. To
extract n-to-1 ligature mappings, the fallback would have to parse the Adobe
glyph names, carry a complete list of glyph names in cairo, and deal with the
various names that do not comply with the Adobe Glyph Naming convention. I
don't think that would be worth the effort required.
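The limitation of the reverse cmap lookup can be seen in a small sketch. The cmap_entry table and reverse_cmap_lookup() are hypothetical (cairo reads the real table from the font). A ligature glyph normally has no cmap entry because no single character maps to it, so the lookup finds nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened cmap: each entry maps one Unicode code point
 * to one glyph index, the direction a font's cmap table goes. */
typedef struct {
    uint32_t codepoint;
    uint32_t glyph;
} cmap_entry;

/* Reverse lookup: find a code point whose cmap entry points at `glyph`.
 * Returns 1 and stores the first match in *out, or 0 if no single
 * character maps to the glyph (as with a typical ligature glyph).
 * When several characters share a glyph, only the first is reported. */
static int reverse_cmap_lookup(const cmap_entry *cmap, size_t n,
                               uint32_t glyph, uint32_t *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cmap[i].glyph == glyph) {
            *out = cmap[i].codepoint;
            return 1;
        }
    }
    return 0;
}
```

The linear scan is only for clarity; a real implementation would more likely build the reverse table once per font. Either way, the result is at most one character per glyph, which is why this path cannot recover n-to-1 ligature mappings.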

I do want cairo to be able to generate searchable PDF files easily. If you can
provide more information, such as which application and which PDF viewer you
are using, along with a sample PDF file, we can try to help you resolve this
issue.
