[cairo] Lots of text API pushed
Adrian Johnson
ajohnson at redneon.com
Fri Aug 8 07:35:45 PDT 2008
Behdad Esfahbod wrote:
> Things that need a fix before releasing 1.7.2 (hopefully tomorrow):
>
> - test/user-font text is coming out with the wrong color in PDF. Only
> happens with this test. So, should be some bug with Type3 fonts.
This was recently fixed in poppler. I am using the latest git poppler
and it is working for me.
> - test/user-font-proxy text is coming out as bitmap glyphs in PDF.
I made text in user-font glyphs use the fallback path, as cairo currently
cannot add glyphs to subsets at the same time as the subsets are being
embedded in the PDF. Some refactoring is needed to make this possible.
> - Decide what to do with zero-glyph clusters. PDF tries to generate
> ActualText for them, but that can't really work without something inside
> the ActualText. We can disallow zero-glyph clusters and then I will
> handle them in Pango using a user-font with no drawings.
I thought I had zero-glyph clusters working, but on further investigation
it appears they only work for extracting text with pdftotext (which I did
most of my testing with). Copying and pasting the text from evince or
acroread ignores the ActualText spans with no glyphs.
I have a sample PDF file created by Adobe InDesign that uses ActualText
to insert tabs and newlines in the extracted text to ensure tables are
correctly extracted. Looking at it more closely, the ActualText for each
tab is inline with the content (like we do in cairo) and prints a space
glyph. The ActualText for the newline is part of a tagged text structure
and has no glyphs.
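
For illustration, here is a rough sketch of what the two kinds of inline
ActualText span look like in a PDF content stream. It is not taken from the
InDesign file or from the cairo PDF backend. The helper names are made up,
it only handles ASCII text, and the "( ) Tj" operator assumes a simple font
where the space glyph is at character code 0x20.

```python
def pdf_literal(text: str) -> str:
    """Return text as a PDF literal string. ASCII only (an assumption of
    this sketch; other text would need a UTF-16BE encoded string)."""
    if any(ord(ch) > 126 for ch in text):
        raise ValueError("this sketch only handles ASCII text")
    escapes = {
        "\\": "\\\\", "(": "\\(", ")": "\\)",
        "\t": "\\t", "\n": "\\n", "\r": "\\r",
    }
    return "(" + "".join(escapes.get(ch, ch) for ch in text) + ")"


def actual_text_span(actual_text: str, show_ops: str = "") -> str:
    """Wrap text-showing operators in a marked-content span that carries
    an ActualText replacement string (hypothetical helper)."""
    lines = ["/Span << /ActualText %s >> BDC" % pdf_literal(actual_text)]
    if show_ops:
        # The glyphs the ActualText stands in for.
        lines.append(show_ops)
    lines.append("EMC")
    return "\n".join(lines)


# A tab that prints a space glyph, similar to the inline case above.
print(actual_text_span("\t", "( ) Tj"))

# A zero-glyph span: nothing between BDC and EMC. This is the case that
# copy and paste appears to ignore.
print(actual_text_span("\n"))
```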
So it looks like acroread only supports zero-glyph ActualText entries
that are part of the tagged text structure. With some work the PDF
backend could be changed to use tagged text for zero-glyph ActualText
clusters. There are benefits to supporting tagged text in cairo and
poppler, such as allowing text reflow in PDF viewers. However, I do not
have any plans to do this work in time for 1.8.0.
Could Pango print a space glyph in each zero-glyph cluster and adjust
the position of the next glyph? This would use a lot less space in the
PDF file than changing the font twice and would potentially be more
efficient for viewers as well.
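
A minimal sketch of the idea, to make the suggestion concrete. This is not
Pango or cairo code. Glyphs are modelled as (index, x, y) tuples with
absolute positions and clusters as (num_bytes, num_glyphs) pairs, which is
only loosely based on the cairo glyph and cluster arrays. The function name
and the space glyph index are hypothetical. Because positions are absolute,
"adjusting the next glyph" here just means leaving it where it already was,
so the advance of the inserted space does not shift it.

```python
from typing import List, Tuple

Glyph = Tuple[int, float, float]  # (glyph index, x, y), absolute position
Cluster = Tuple[int, int]         # (num_bytes, num_glyphs)


def pad_zero_glyph_clusters(
    glyphs: List[Glyph], clusters: List[Cluster], space_glyph: int
) -> Tuple[List[Glyph], List[Cluster]]:
    """Give every zero-glyph cluster a single space glyph."""
    out_glyphs: List[Glyph] = []
    out_clusters: List[Cluster] = []
    next_glyph = 0        # index of the first glyph not yet consumed
    last_pos = (0.0, 0.0)  # origin of the last real glyph seen

    for num_bytes, num_glyphs in clusters:
        if num_glyphs == 0:
            # Put the space at the origin of the following glyph, or at
            # the origin of the last real glyph when nothing follows
            # (the advance width is unknown in this sketch).
            if next_glyph < len(glyphs):
                _, x, y = glyphs[next_glyph]
            else:
                x, y = last_pos
            out_glyphs.append((space_glyph, x, y))
            out_clusters.append((num_bytes, 1))
        else:
            run = glyphs[next_glyph:next_glyph + num_glyphs]
            out_glyphs.extend(run)  # positions are kept unchanged
            out_clusters.append((num_bytes, num_glyphs))
            next_glyph += num_glyphs
            last_pos = (run[-1][1], run[-1][2])

    return out_glyphs, out_clusters
```

For example, with glyphs [(10, 0.0, 0.0), (11, 8.0, 0.0)], clusters
[(1, 1), (1, 0), (1, 1)] and space glyph 3, the middle cluster gets the
glyph (3, 8.0, 0.0) and the glyph after it stays at x = 8.0.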