[cairo] PDF Text Extraction: Future
otaylor at redhat.com
Tue Oct 23 06:33:38 PDT 2007
On Tue, 2007-10-23 at 11:38 +1300, Robert O'Callahan wrote:
> On Oct 23, 2007 11:21 AM, Behdad Esfahbod <behdad at behdad.org> wrote:
> > > > There is nothing preventing a library from generating
> > > > glyphs that have a negative advance width and so go in the
> > > > logical order for right-to-left text, but it's not common
> > > > practice and most probably not very well supported.
> > > If I understand you correctly, Gecko does this. For RTL runs we're
> > > calling cairo_show_glyphs with a glyph array whose x-offsets decrease
> > > along the array.
> > You may want to revisit this. It adds lots of overhead in both the
> > X and PS/PDF backends, as each glyph needs to be positioned individually.
> I'll keep that in mind, thanks.
> > > I think this is technically necessary for CSS compliance, since CSS
> > > says that, all other things being equal, content later in a document
> > > (i.e. in logical order) is higher in z-order than content earlier in
> > > the document.
> > Hmm, not sure it's necessary. Basically, the order of glyphs in a
> > single show_glyphs() call should be irrelevant to the output. Are there any
> > weird combinations of operators and sources that violate that assumption?
> You may be right. But what about (future) user fonts, where glyphs
> can have different colours? It sounds like a fragile assumption in
> general when you consider all possible backends, etc.
To point out something that may be entirely obvious ... the order invariance of
glyphs in cairo isn't coincidental; it's part of the model. That is, if you
make a single show_glyphs() call with overlapping glyphs and an alpha of 0.5,
you should *expect* the resulting shape to have an alpha of 0.5 everywhere,
including where the glyphs overlap.
There may be backends that don't properly support that, but those backends should
be considered mildly buggy.
And similarly, it's part of the cairo model that when you call show_glyphs()
you get a shape filled with the current source ... a backend can't just start
using colored glyphs because it feels like it.
So, up until new API gets added to cairo and Firefox starts using it, it's
safe and recommended to order glyphs in visual order.