[cairo] Lots of text API pushed

Fri Aug 8 18:15:46 PDT 2008

Behdad Esfahbod wrote:
>> Could Pango print a space glyph in each zero-glyph cluster and adjust 
>> the position of the next glyph? This would use a lot less space in the 
>> PDF file than changing the font twice and would potentially be more 
>> efficient for viewers as well.
> 
> I'll document that zero-glyph clusters don't work great then.
> 
> Humm, space doesn't work as the zero-glyph clusters have varying width.
> We need a new glyph for each width.  Right?

I don't understand what the issue is. cairo_show_text_glyphs() takes an 
explicit position for each glyph, so you can give the zero-glyph cluster 
whatever width you want by setting the position of the glyph that 
follows it.
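
As a rough sketch of the cluster bookkeeping (Python purely for 
illustration; Glyph and Cluster are stand-ins that mirror the fields of 
cairo_glyph_t and cairo_text_cluster_t, while walk_clusters(), the glyph 
indices and the 10-unit advances are made up):

```python
from dataclasses import dataclass

@dataclass
class Glyph:            # stand-in for cairo_glyph_t: index plus explicit position
    index: int
    x: float
    y: float

@dataclass
class Cluster:          # stand-in for cairo_text_cluster_t
    num_bytes: int
    num_glyphs: int

def walk_clusters(utf8, glyphs, clusters):
    """Pair each cluster's text with the glyphs it covers (hypothetical helper)."""
    pairs = []
    b = g = 0
    for c in clusters:
        text = utf8[b:b + c.num_bytes].decode("utf-8")
        pairs.append((text, glyphs[g:g + c.num_glyphs]))
        b += c.num_bytes
        g += c.num_glyphs
    if b != len(utf8) or g != len(glyphs):
        raise ValueError("clusters do not cover the text and glyphs exactly")
    return pairs

# Text is "abcde" but only the glyphs for "abde" are drawn.
utf8 = "abcde".encode("utf-8")
glyphs = [Glyph(1, 0, 0), Glyph(2, 10, 0),   # a, b
          Glyph(4, 20, 0), Glyph(5, 30, 0)]  # d, e (x is chosen by the caller)
clusters = [Cluster(1, 1), Cluster(1, 1),
            Cluster(1, 0),                   # "c": one byte of text, zero glyphs
            Cluster(1, 1), Cluster(1, 1)]
pairs = walk_clusters(utf8, glyphs, clusters)
```

The zero-glyph cluster contributes text but no glyph, and nothing in it 
fixes a width. Moving the "d" glyph from x=20 to x=20+w is all it takes 
to make the "c" cluster w units wide.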

For example, if we are displaying the glyphs "abde" but want the 
extracted text to be "abcde", using a zero-glyph cluster to insert the 
"c" in the extracted text, the PDF would be:

(ab) Tj /Span << /ActualText (c) >> BDC EMC (de) Tj

If we instead use a cluster with one space glyph that maps to the "c", 
and adjust the position of the "d" glyph so that "abde" is still 
displayed correctly, the PDF would be:

(ab) Tj /Span << /ActualText (c) >> BDC ( ) Tj EMC [250(de)] TJ
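
To make the difference between the two forms concrete, here is a small 
sketch that builds both content-stream fragments (Python for 
illustration only; the function names are made up, and the 250 assumes a 
space glyph that is 250/1000 of the font size wide). A positive number 
in a TJ array is subtracted from the current position, so it pulls the 
"d" back over the space that was just shown. The sketch handles ASCII 
only; a non-ASCII ActualText would need a different string encoding.

```python
def pdf_string(s):
    """Write s as a PDF literal string, escaping backslashes and parentheses."""
    escaped = s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"

def zero_glyph_stream(before, actual, after):
    """ActualText span with no glyphs inside it."""
    return (f"{pdf_string(before)} Tj "
            f"/Span << /ActualText {pdf_string(actual)} >> BDC EMC "
            f"{pdf_string(after)} Tj")

def space_glyph_stream(before, actual, after, space_width=250):
    """ActualText span around one space glyph, then back up by its width.

    space_width is in thousandths of a text space unit (assumed value).
    """
    return (f"{pdf_string(before)} Tj "
            f"/Span << /ActualText {pdf_string(actual)} >> BDC "
            f"{pdf_string(' ')} Tj EMC "
            f"[{space_width}{pdf_string(after)}] TJ")

print(zero_glyph_stream("ab", "c", "de"))
print(space_glyph_stream("ab", "c", "de"))
```

The two printed lines reproduce the two fragments above.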

I tested this and it works perfectly in acroread. Poppler does not 
extract this correctly (it drops the "c") but Poppler bugs can be fixed. 
This is probably the same bug Poppler has with accented characters 
created from two glyphs [1].

> Also, does this commit look right to you:
> 
> 
> http://cgit.freedesktop.org/cairo/commit/?id=38c5f0d49b2ce1a6146cbea5ec3376a52cac8e68

The second part, which fixes the "subset_glyph->utf8_is_mapped = ..." 
assignment, is correct and fixes the problem where ActualText was being 
used for everything.

The first part, which calls _cairo_sub_font_glyph_lookup_unicode() only 
if utf8_len < 0, does not look right.

What I intended the code to do is to always use index_to_ucs4 for the 
ToUnicode mapping when it is available. This ensures that the scenario 
you describe in the commit message does not occur.
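
Roughly, the preference order I have in mind looks like this (a Python 
sketch, not the cairo code; choose_to_unicode() and its arguments are 
hypothetical, and treating "mapped" as "the cluster text equals the 
ToUnicode entry" is my assumption about how the flag is meant to be 
used):

```python
def choose_to_unicode(font_ucs4, cluster_utf8):
    """Pick the ToUnicode entry for a glyph (hypothetical helper).

    font_ucs4:    code point reported by the font backend's index_to_ucs4,
                  or None if the backend has no mapping for this glyph.
    cluster_utf8: text the caller supplied for the glyph's cluster, or None.

    Returns (to_unicode, utf8_is_mapped).
    """
    if font_ucs4 is not None:
        to_unicode = chr(font_ucs4)   # the font's own mapping always wins
    else:
        to_unicode = cluster_utf8     # fall back to the caller's text
    # Assumption: ActualText is only needed when the caller's text differs
    # from what ToUnicode alone would give a text extractor.
    utf8_is_mapped = cluster_utf8 is None or cluster_utf8 == to_unicode
    return to_unicode, utf8_is_mapped
```

With this ordering, the caller's cluster text never overrides a mapping 
that the font already provides. It only fills in when the font has none.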

[1] http://lists.freedesktop.org/archives/poppler/2008-June/003877.html