[cairo] Improving PDF output

Tue Jan 9 12:36:17 PST 2007

On Tue, 2007-01-09 at 14:47 -0500, Owen Taylor wrote:
> On Tue, 2007-01-09 at 13:38 -0500, Behdad Esfahbod wrote:
> >   - Pango apparently has all this information readily available and can
> > make such a call very easily.
> 
> As long as people aren't using pango_show_glyph_string() directly...
> normally, certainly yes.

Ouch!  gdkpango uses that.  OK, we can extend PangoRenderer with a
draw_glyph_item kind of API.

> >   - Make Pango go over the glyphstring for right-to-left runs from end
> > to start (that is, in logical order), and for right-to-left lines from
> > end run to start run, or in the logical order of the runs.
> 
> I think you always want to emit glyphs in visual order and let the 
> backend figure out how to best encode that into a document for both
> compactness (not positioning every glyph) and correctly representing
> the text. Having glyph order not correspond to the X advance of the
> font is going to be awkward.

Agreed.  What about the items in a line?  Do you think emitting them in
logical order makes sense?  I have no idea how Poppler/Evince work here.

> >   - The PDF backend will, for each grapheme, use reverse cmap lookup to
> > get a text string associated with the glyphs.  If this string is
> > identical to the one provided, it will be used directly (like Alp's
> > patch, or lack thereof, does), otherwise, it will start a new text
> > operation for this grapheme and use ActualText around it.
> 
> Hmm, it might make sense to make that determination for the whole 
> string at once, to avoid an encoding for Hindi (say) where you are
> constantly switching between the two representations? Since there
> always are going to be some grapheme/characters that can be mapped
> by the cmap.
> 
> Though I suppose you need to break up the ActualText markings to the
> grapheme (more properly, cluster ... a 'ff' ligature is two graphemes
> but one cluster) level to allow for proper selection boundaries.

Yes, since ActualText needs to be done per cluster, there's not much
grouping to be done, except for grouping non-ActualText clusters
together, which we will certainly do.
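
A rough sketch of that grouping, in Python rather than cairo's C, just to
pin down the idea.  Everything named here is an assumption for
illustration: clusters arrive as (glyph ids, source text) pairs, and the
reverse cmap is modelled as a plain dict from glyph id to string.  None of
this is existing cairo or Pango API.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input shape: (glyph ids, source text) for one cluster.
Cluster = Tuple[Sequence[int], str]
# Output run: ("plain" or "actualtext", glyph ids, text).
Run = Tuple[str, List[int], str]


def recovered_text(glyphs: Sequence[int],
                   reverse_cmap: Dict[int, str]) -> Optional[str]:
    """Text a viewer could recover from the glyphs alone, or None."""
    parts = []
    for glyph in glyphs:
        if glyph not in reverse_cmap:
            return None  # unmapped glyph: the cmap cannot represent it
        parts.append(reverse_cmap[glyph])
    return "".join(parts)


def plan_text_runs(clusters: List[Cluster],
                   reverse_cmap: Dict[int, str]) -> List[Run]:
    """Merge adjacent cmap-recoverable clusters; isolate the rest."""
    runs: List[Run] = []
    for glyphs, text in clusters:
        if recovered_text(glyphs, reverse_cmap) == text:
            if runs and runs[-1][0] == "plain":
                # Extend the previous plain run instead of starting a new one.
                _, prev_glyphs, prev_text = runs[-1]
                runs[-1] = ("plain", prev_glyphs + list(glyphs),
                            prev_text + text)
            else:
                runs.append(("plain", list(glyphs), text))
        else:
            # One ActualText span per cluster, so selection boundaries
            # stay at the cluster level.
            runs.append(("actualtext", list(glyphs), text))
    return runs
```

With glyph 3 standing in for an 'ff' ligature that has no cmap entry,
plan_text_runs([([1], "a"), ([2], "b"), ([3], "ff"), ([4], "c")],
{1: "a", 2: "b", 4: "c"}) gives one plain run for "ab", one ActualText
span for "ff", and one plain run for "c".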

> >   - Poppler/Evince just need to do the logical mapping between the glyph
> > boundaries of the grapheme and the ActualText characters provided.  That
> > is, to break the width into the number of characters, etc.  Some glib
> > Unicode calls can help with which characters are cursor positions and
> > which are not.  Or rather, pango calls.
> > 
> > I just wonder if the cairo API needs to know about right-to-left
> > glyphstrings.  Is there anything that can be encoded in the PDF?
> 
> Yes, there is a ReversedChars annotation that indicates that the
> characters within the enclosed operator are in reverse of logical order.
> The combination of that plus a ToUnicode map would probably allow
> doing most Arabic without the use of ActualText. (Note the restriction
> of ReversedChars to single words without embedded spaces, so you still
> have to break up text to the word level.)

Ok, sounds promising.  Thanks for the info.
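
For the word-level breaking, something like the following Python sketch
might do.  It is only an illustration under stated assumptions: the run is
a purely right-to-left string in logical order, code points stand in for
glyphs, and the function name and the ("reversed" / "space") tags are made
up here.  Each "reversed" segment is what would go inside one
ReversedChars marked-content sequence; spaces are kept outside.

```python
import re
from typing import List, Tuple


def reversed_char_segments(logical_run: str) -> List[Tuple[str, str]]:
    """Split a right-to-left run into visual-order segments.

    Returns ("reversed", word) for each word, with its characters in
    visual (reversed) order, and ("space", spaces) for the gaps.
    """
    # Alternate between runs of spaces and runs of non-space characters.
    pieces = re.findall(r" +|[^ ]+", logical_run)
    segments: List[Tuple[str, str]] = []
    # Walk the pieces backwards: the last logical word is drawn leftmost.
    for piece in reversed(pieces):
        if piece[0] == " ":
            segments.append(("space", piece))
        else:
            segments.append(("reversed", piece[::-1]))
    return segments
```

Using ASCII letters as placeholders for Arabic ones, "abc de" becomes
[("reversed", "ed"), ("space", " "), ("reversed", "cba")].  Reversing by
code point is a simplification; a real backend would reverse glyphs, not
characters.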

> 					 - Owen

-- 
behdad
http://behdad.org/

"Those who would give up Essential Liberty to purchase a little
 Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759