[cairo] Lots of text API pushed

Behdad Esfahbod behdad at behdad.org
Fri Aug 8 09:04:50 PDT 2008

On Sat, 2008-08-09 at 00:05 +0930, Adrian Johnson wrote:
> Behdad Esfahbod wrote:
> > Things that need a fix before releasing 1.7.2 (hopefully tomorrow):
> > 
> >   - test/user-font text is coming out with the wrong color in PDF.  Only
> > happens with this test.  So, should be some bug with Type3 fonts.
> This was recently fixed in poppler. I am using the latest git poppler 
> and it is working for me.

Right.  Now I remember.

> >   - test/user-font-proxy text is coming out as bitmap glyphs in PDF.
> I made text in user-font glyphs use the fallback path, as cairo currently
> cannot add glyphs to subsets while the subsets are being
> embedded in the PDF. Some refactoring is needed to make this possible.

Ah, I see.

> >   - Decide what to do with zero-glyph clusters.  PDF tries to generate
> > ActualText for them, but that can't really work without something inside
> > the ActualText.  We can disallow zero-glyph clusters and then I will
> > handle them in Pango using a user-font with no drawings.
> I thought I had zero-glyph clusters working but on further investigation 
> it appears it only works for extracting text with pdftotext (which I did 
> most of my testing with). Copying and pasting the text from evince or 
> acroread ignores the ActualText spans with no glyphs.
> I have a sample PDF file created by Adobe InDesign that uses ActualText 
> to insert tabs and newlines in the extracted text to ensure tables are 
> correctly extracted. Looking at it more closely the ActualText for each 
> tab is inline with the content (like we do in cairo) and prints a space 
> glyph. The ActualText for the newline is part of a tagged text structure 
> and has no glyphs.
> So it looks like acroread only supports zero-glyph ActualText entries 
> that are part of the tagged text structure. With some work the PDF 
> backend could be changed to use tagged text for zero-glyph ActualText 
> clusters. There are benefits to supporting tagged text in cairo and 
> poppler, such as allowing text reflow in PDF viewers. However, I do not
> have any plans to do this work in time for 1.8.0.
> Could Pango print a space glyph in each zero-glyph cluster and adjust 
> the position of the next glyph? This would use a lot less space in the 
> PDF file than changing the font twice and would potentially be more 
> efficient for viewers as well.

I'll document that zero-glyph clusters don't work great then.
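
For illustration only (this is not code from cairo or Pango): a cluster
maps a run of UTF-8 bytes to a run of glyphs, as cairo_text_cluster_t does
with its num_bytes and num_glyphs fields. A zero-glyph cluster is one with
text but no glyphs. The minimal Python sketch below finds such clusters so
a caller could flag or rewrite them before they reach the PDF backend. The
names Cluster and zero_glyph_clusters are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Cluster(NamedTuple):
    """Hypothetical stand-in mirroring the fields of cairo_text_cluster_t."""
    num_bytes: int   # UTF-8 bytes covered by this cluster
    num_glyphs: int  # glyphs covered by this cluster


def zero_glyph_clusters(utf8: bytes,
                        clusters: List[Cluster]) -> List[Tuple[int, bytes]]:
    """Return (byte_offset, text) for each cluster that has text but no glyphs."""
    found = []
    offset = 0
    for cluster in clusters:
        chunk = utf8[offset:offset + cluster.num_bytes]
        if cluster.num_glyphs == 0 and cluster.num_bytes > 0:
            found.append((offset, chunk))
        offset += cluster.num_bytes
    # The clusters are expected to cover the string exactly.
    if offset != len(utf8):
        raise ValueError("clusters do not cover the whole string")
    return found
```

With the text "a<TAB>b" and a tab that produces no glyph, the only cluster
reported is the tab at byte offset 1.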

Hmm, a space glyph doesn't work, as the zero-glyph clusters have varying widths.
We need a new glyph for each width.  Right?
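
One way to picture the "new glyph for each width" idea, as a sketch under
stated assumptions rather than anything Pango or cairo actually does: keep
a table that hands out one invisible glyph index per distinct advance
width, so clusters of equal width share a glyph. The class
BlankGlyphTable, its methods, and the use of integer font units for widths
are all hypothetical.

```python
from typing import Dict


class BlankGlyphTable:
    """Hands out one invisible glyph index per distinct advance width.

    Widths are integers in hypothetical font units, which avoids
    comparing floating-point values for equality.
    """

    def __init__(self, first_index: int = 0) -> None:
        self._next = first_index
        self._by_width: Dict[int, int] = {}

    def glyph_for(self, width: int) -> int:
        """Return the blank glyph for this width, allocating one if needed."""
        if width not in self._by_width:
            self._by_width[width] = self._next
            self._next += 1
        return self._by_width[width]

    def advances(self) -> Dict[int, int]:
        """Map each allocated glyph index to its advance width."""
        return {glyph: width for width, glyph in self._by_width.items()}
```

The number of blank glyphs then grows with the number of distinct widths,
not with the number of clusters.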

Also, does this commit look right to you:



"Those who would give up Essential Liberty to purchase a little
 Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759
