[cairo] PDF Text Extraction: Future

Robert O'Callahan robert at ocallahan.org
Mon Oct 22 15:38:06 PDT 2007


On Oct 23, 2007 11:21 AM, Behdad Esfahbod <behdad at behdad.org> wrote:

> >            There is nothing preventing a library generating
> >            glyphs that have a negative advance width and so go in the
> >            logical order for right-to-left text, but it's not common
> >            practice and most probably not very well supported.
> >
> > If I understand you correctly, Gecko does this. For RTL runs we're
> > calling cairo_show_glyphs with a glyph array whose x-offsets decrease
> > along the array.
>
> You may want to revisit this.  It adds lots of overhead both in X and
> PS/PDF backends as each glyph needs to be positioned individually.
>

I'll keep that in mind, thanks.
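
To make the two glyph orderings concrete, here is a minimal sketch in C. It is not Gecko's or cairo's code. The struct glyph_pos is a local stand-in that mirrors the three fields of cairo_glyph_t (index, x, y) so the example compiles without the cairo headers, and the function names layout_rtl_logical, reverse_glyphs and is_left_to_right are hypothetical. The first function produces an RTL run in logical order, with x-offsets that decrease along the array; reversing the array gives the same glyphs in visual (left-to-right) order.

```c
#include <stddef.h>

/* Local stand-in with the same fields as cairo_glyph_t (assumption:
   used here only so the sketch builds without cairo installed). */
typedef struct {
    unsigned long index;  /* glyph id in the font */
    double x;             /* glyph origin, user space */
    double y;
} glyph_pos;

/* Lay out n glyphs in logical order for an RTL run whose right edge
   is at origin_x. advances[i] is the positive advance of glyph i.
   Each glyph lands to the left of the previous one, so x decreases
   along the array. */
void layout_rtl_logical(const unsigned long *ids, const double *advances,
                        size_t n, double origin_x, double baseline_y,
                        glyph_pos *out)
{
    double pen = origin_x;
    for (size_t i = 0; i < n; i++) {
        pen -= advances[i];
        out[i].index = ids[i];
        out[i].x = pen;
        out[i].y = baseline_y;
    }
}

/* Reverse the array in place: same glyphs and positions, but now in
   visual order with x increasing along the array. */
void reverse_glyphs(glyph_pos *g, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        glyph_pos tmp = g[i];
        g[i] = g[n - 1 - i];
        g[n - 1 - i] = tmp;
    }
}

/* Returns 1 if x never decreases along the array, 0 otherwise. */
int is_left_to_right(const glyph_pos *g, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (g[i].x < g[i - 1].x)
            return 0;
    }
    return 1;
}
```

With cairo available, either array could be copied into cairo_glyph_t entries and passed to cairo_show_glyphs(cr, glyphs, num_glyphs). The positions are identical in both cases; only the order of the entries differs.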

> > I think this is technically necessary for CSS compliance since CSS
> > says that all other things being equal, content later in a document
> > ( i.e. in logical order) is higher in z-order than content earlier in
> > the document.
>
> Humm, not sure if it's necessary.  Basically the order of glyphs in a
> single show_glyphs() call should be irrelevant to the output.  Any weird
> combinations of operators and sources that violate that assumption?


You may be right. But possibly with (future) user fonts where glyphs can
have different colours? Sounds like a fragile assumption in general when you
consider all possible backends etc.
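
A small sketch of why per-glyph colours would make the order matter. It assumes the Porter-Duff OVER operator on premultiplied RGBA and models a single pixel where two glyphs overlap. This is only the per-pixel arithmetic, not how any cairo backend actually composites glyphs, and the names premul_px, composite_over and paint_two are hypothetical.

```c
/* One premultiplied RGBA pixel, components in [0, 1]. */
typedef struct {
    double r, g, b, a;
} premul_px;

/* Porter-Duff OVER: src composited on top of dst. */
premul_px composite_over(premul_px src, premul_px dst)
{
    double k = 1.0 - src.a;
    premul_px out = {
        src.r + dst.r * k,
        src.g + dst.g * k,
        src.b + dst.b * k,
        src.a + dst.a * k
    };
    return out;
}

/* Paint two overlapping glyph samples, in array order, onto a
   transparent pixel. The second sample ends up on top. */
premul_px paint_two(premul_px first, premul_px second)
{
    premul_px canvas = { 0.0, 0.0, 0.0, 0.0 };
    canvas = composite_over(first, canvas);
    canvas = composite_over(second, canvas);
    return canvas;
}
```

With two half-transparent samples, red (0.5, 0, 0, 0.5) and blue (0, 0, 0.5, 0.5), painting red then blue gives (0.25, 0, 0.5, 0.75), while blue then red gives (0.5, 0, 0.25, 0.75). When both samples have the same colour, swapping them changes nothing, which matches the case of a single source for the whole call.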

Rob
-- 
"Two men owed money to a certain moneylender. One owed him five hundred
denarii, and the other fifty. Neither of them had the money to pay him back,
so he canceled the debts of both. Now which of them will love him more?"
Simon replied, "I suppose the one who had the bigger debt canceled." "You
have judged correctly," Jesus said. [Luke 7:41-43]