[cairo] Lots of text API pushed

Adrian Johnson ajohnson at redneon.com
Fri Aug 8 07:35:45 PDT 2008

Behdad Esfahbod wrote:
> Things that need a fix before releasing 1.7.2 (hopefully tomorrow):
>   - test/user-font text is coming out with the wrong color in PDF.  Only
> happens with this test.  So, should be some bug with Type3 fonts.

This was recently fixed in poppler. I am using the latest git poppler 
and it is working for me.

>   - test/user-font-proxy text is coming out as bitmap glyphs in PDF.

I made text in user-font glyphs use the fallback path, as cairo
currently cannot add glyphs to subsets while the subsets are being
embedded in the PDF. Some refactoring is needed to make this possible.

>   - Decide what to do with zero-glyph clusters.  PDF tries to generate
> ActualText for them, but that can't really work without something inside
> the ActualText.  We can disallow zero-glyph clusters and then I will
> handle them in Pango using a user-font with no drawings.

I thought I had zero-glyph clusters working, but on further
investigation it appears they only work when extracting text with
pdftotext (which I did most of my testing with). When copying and
pasting text, evince and acroread ignore ActualText spans that contain
no glyphs.

I have a sample PDF file created by Adobe InDesign that uses ActualText
to insert tabs and newlines into the extracted text so that tables are
extracted correctly. Looking at it more closely, the ActualText for each
tab is inline with the content (as we do in cairo) and shows a space
glyph. The ActualText for the newline is part of a tagged text
structure and has no glyphs.

So it looks like acroread only supports zero-glyph ActualText entries
that are part of the tagged text structure. With some work, the PDF
backend could be changed to use tagged text for zero-glyph ActualText
clusters. Supporting tagged text in cairo and poppler has other
benefits too, such as allowing text reflow in PDF viewers. However, I
have no plans to do this work in time for 1.8.0.

Could Pango print a space glyph in each zero-glyph cluster and adjust
the position of the next glyph instead? This would use a lot less space
in the PDF file than switching to a glyphless user font and back for
each cluster, and would potentially be more efficient for viewers as
well.
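A rough sketch of that transformation in C, assuming the clusters are in forward (not backward) order. The structs are local mirrors of the fields of cairo's cairo_glyph_t and cairo_text_cluster_t so that the sketch compiles without cairo headers; the function fill_zero_glyph_clusters and its space_index, end_x and end_y parameters are hypothetical, and the actual space glyph id depends on the font. Since cairo glyph positions are absolute, the inserted space simply takes the position of the glyph that follows it (or the end position, for a trailing cluster), so the visible layout does not change.

```c
#include <assert.h>

/* Local mirror of cairo_glyph_t (hypothetical copy, not the cairo type). */
typedef struct {
    unsigned long index; /* glyph id in the font */
    double x, y;         /* absolute position in user space */
} glyph_t;

/* Local mirror of cairo_text_cluster_t (hypothetical copy). */
typedef struct {
    int num_bytes;  /* UTF-8 bytes covered by the cluster */
    int num_glyphs; /* glyphs covered by the cluster */
} cluster_t;

/* Give every zero-glyph cluster one space glyph (hypothetical helper).
 * Assumes forward cluster order. The space is placed at the position of
 * the next input glyph, or at (end_x, end_y) if no glyph follows.
 * out_glyphs needs room for num_glyphs + num_clusters entries.
 * Clusters are updated in place; returns the new glyph count. */
static int fill_zero_glyph_clusters(const glyph_t *glyphs, int num_glyphs,
                                    cluster_t *clusters, int num_clusters,
                                    unsigned long space_index,
                                    double end_x, double end_y,
                                    glyph_t *out_glyphs, int out_capacity)
{
    int in = 0, out = 0;

    for (int c = 0; c < num_clusters; c++) {
        if (clusters[c].num_glyphs == 0) {
            glyph_t space;
            space.index = space_index;
            if (in < num_glyphs) {
                /* Reuse the next glyph's absolute position. */
                space.x = glyphs[in].x;
                space.y = glyphs[in].y;
            } else {
                /* Trailing cluster: use the end-of-run position. */
                space.x = end_x;
                space.y = end_y;
            }
            assert(out < out_capacity);
            out_glyphs[out++] = space;
            clusters[c].num_glyphs = 1;
        } else {
            /* Copy this cluster's glyphs unchanged. */
            for (int g = 0; g < clusters[c].num_glyphs; g++) {
                assert(in < num_glyphs && out < out_capacity);
                out_glyphs[out++] = glyphs[in++];
            }
        }
    }
    return out;
}
```

After this pass every cluster carries at least one (blank) glyph, so the PDF backend could emit its ActualText inline in the same way it does for the tab case above.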
