[cairo] Improving PDF output
Behdad Esfahbod
behdad at behdad.org
Tue Jan 9 12:36:17 PST 2007
On Tue, 2007-01-09 at 14:47 -0500, Owen Taylor wrote:
> On Tue, 2007-01-09 at 13:38 -0500, Behdad Esfahbod wrote:
> > - Pango apparently has all this information readily available and can
> > make such a call very easily.
>
> As long as people aren't using pango_show_glyph_string() directly...
> normally, certainly yes.
Ouch! gdkpango uses that. OK, we can extend PangoRenderer to have a
draw_glyph_item kind of API.
> > - Make Pango go over the glyphstring for right-to-left runs from end
> > to start (that is, in logical order), and for right-to-left lines from
> > end run to start run, or in the logical order of the runs.
>
> I think you always want to emit glyphs in visual order and let the
> backend figure out how to best encode that into a document for both
> compactness (not positioning every glyph) and correctly representing
> the text. Having glyph order not correspond to the X advance of the
> font is going to be awkward.
Agreed. What about items in the line? Do you think doing them in the
logical order makes sense? I have no idea how Poppler/Evince work here.
> > - The PDF backend will, for each grapheme, use reverse cmap lookup to
> > get a text string associated with the glyphs. If this string is
> > identical to the one provided, it will be used directly (like Alp's
> > patch, or lack thereof, does), otherwise, it will start a new text
> > operation for this grapheme and use ActualText around it.
>
> Hmm, it might make sense to make that determination for the whole
> string at once, to avoid an encoding for Hindi (say) where you are
> constantly switching between the two representations? Since there
> always are going to be some grapheme/characters that can be mapped
> by the cmap.
>
> Though I suppose you need to break up the ActualText markings to the
> grapheme (more properly, cluster ... a 'ff' ligature is two graphemes
> but one cluster) level to allow for proper selection boundaries.
Yes, since ActualText needs to be done per cluster, there's not much
grouping to be done, except for grouping non-ActualText clusters
together, which we will certainly do.
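
A minimal sketch of that grouping, in Python rather than C for brevity. The
names here (Cluster, reverse_cmap, group_clusters) are hypothetical and are
not cairo or Pango API. The reverse cmap is assumed to be a plain mapping from
glyph id to a single character. A cluster needs ActualText when the reverse
lookup of its glyphs does not reproduce its text, and neighbouring clusters
that do not need it are merged into one run:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    glyphs: List[int]  # glyph ids, in visual order
    text: str          # the logical text this cluster stands for


def needs_actual_text(cluster: Cluster, reverse_cmap: Dict[int, str]) -> bool:
    """True when reverse cmap lookup does not give back the cluster text."""
    mapped = []
    for glyph in cluster.glyphs:
        char = reverse_cmap.get(glyph)
        if char is None:
            # Glyph has no character of its own (e.g. a ligature glyph).
            return True
        mapped.append(char)
    return "".join(mapped) != cluster.text


def group_clusters(
    clusters: List[Cluster], reverse_cmap: Dict[int, str]
) -> List[Tuple[bool, List[Cluster]]]:
    """Return (uses_actual_text, clusters) runs.

    ActualText clusters always stand alone; consecutive plain clusters
    are merged into a single run.
    """
    groups: List[Tuple[bool, List[Cluster]]] = []
    for cluster in clusters:
        tagged = needs_actual_text(cluster, reverse_cmap)
        if not tagged and groups and not groups[-1][0]:
            groups[-1][1].append(cluster)
        else:
            groups.append((tagged, [cluster]))
    return groups
```

For "office" rendered as o, an ffi ligature, c, e, where the ligature glyph
has no reverse cmap entry, this gives three runs: a plain "o", an ActualText
"ffi", and a plain "ce".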
> > - Poppler/Evince just need to do the logical mapping between the glyph
> > boundaries of the grapheme and the ActualText characters provided. That
> > is, to break the width into the number of characters, etc. Some glib
> > Unicode calls can help with which characters are cursor positions and
> > which are not. Or rather, pango calls.
> >
> > I just wonder if the cairo API needs to know about right-to-left
> > glyphstrings. Is there anything that can be encoded in the PDF?
>
> Yes, there is a ReversedChars annotation that indicates that the
> characters within the enclosed operator are in reverse of logical order.
> The combination of that plus a ToUnicode map would probably allow
> doing most Arabic without the use of ActualText. (Note the restriction
> of ReversedChars to single words without embedded spaces, so you still
> have to break up text to the word level.)
Ok, sounds promising. Thanks for the info.
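
A rough sketch of the word-level splitting this implies, again in Python.
It assumes ReversedChars is written as a marked-content tag (BMC ... EMC)
around a show-string operator, and it uses literal strings where a real
backend would emit font-specific character codes. The function names are
hypothetical:

```python
import re
from typing import List


def pdf_literal(text: str) -> str:
    """Quote text as a PDF literal string, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def reversed_chars_ops(visual_text: str) -> List[str]:
    """Split a right-to-left run, given in visual order, into operators.

    Each space-free word gets its own ReversedChars section; runs of
    spaces are shown outside any section, since a section may not
    contain embedded spaces.
    """
    ops: List[str] = []
    for piece in re.findall(r" +|[^ ]+", visual_text):
        if piece.startswith(" "):
            ops.append(f"{pdf_literal(piece)} Tj")
        else:
            ops.append(f"/ReversedChars BMC {pdf_literal(piece)} Tj EMC")
    return ops
```

With the visual-order string "dlrow olleh" this yields one ReversedChars
section per word and a separate show operator for the space between them.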
> - Owen
--
behdad
http://behdad.org/
"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin, 1759