[cairo] Serious concerns about cairo

Mon Sep 25 14:38:52 PDT 2006

On 9/25/06, Bill Spitzak <spitzak at d2.com> wrote:
>
>
> Mike Emmel wrote:
>
> > Add drawing operations at least for ucs2 chars. No one doing advanced text
> > layout works with utf8 internally; you can't.
>
> I BEG TO DIFFER!
>
> Please don't use "wide characters" in the API. The sooner we stamp this
> abomination out, the better.
>
> It really is not that hard to use UTF-8 in buffers. Finding the next
> character means looking for the next byte whose top two bits are not 10.
> Certainly easier than finding the next word. And I really doubt your
> word processor does not have a find-the-next-word function.
>
> And think: do you *really* need the ability to jump to character N
> without first examining the N-1 characters before it? REALLY? I think
> you may be mistaken. Remember that you can store the offset IN BYTES
> from the start of any buffer in exactly the same amount of space as it
> takes to store the offset in "glyphs".
>
> Another thing is that you seem to be ignoring "surrogate pairs". These
> basically mean that UTF-16 is as hard as or worse than UTF-8 to handle,
> except the unusual cases are much rarer and thus the code will not
> get tested well.
>
> And don't forget "combining characters". This means that if your
> "advanced text layout" is thinking about glyphs at all, then you are
> screwing up.
>
> Please don't dismiss UTF-8. We will all be much better off the sooner
> everybody switches to it.
>
>
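
The byte-stepping argument in the quoted text can be sketched in a few
lines of Python. This is only an illustration: the function names are
made up here and are not part of cairo or Pango, and the code assumes
the buffer holds well-formed UTF-8.

```python
def next_char_start(buf, i):
    """Return the byte offset of the character following the one at offset i."""
    i += 1
    # Continuation bytes have the bit pattern 10xxxxxx; skip over them.
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return i


def char_starts(buf):
    """Return the byte offsets at which each encoded character begins."""
    starts = []
    i = 0
    while i < len(buf):
        starts.append(i)
        i = next_char_start(buf, i)
    return starts
```

The offsets returned are plain byte offsets, so storing one costs no
more than storing a character index would.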

Interesting. Who uses utf8 internally? Pango expands to glyphs.

Actually, one of my big beefs with most advanced layout engines is that
when I know I don't have any of the advanced problems they solve, they
generally don't offer a fast path. The nice thing about utf16 is that if
you don't have surrogates, it's just a fast ucs2 rendering case. It's more
a matter of having a system that can provide both very fast rendering for
simple ascii and 16-bit encodings and the complex paths for complex cases.
Right now you tend to be stuck with layout engines that assume the worst
case for all input, when a simple scan can determine which path to take.
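
A minimal Python sketch of such a scan over 16-bit code units might look
like the following. The names and the three path labels are hypothetical,
chosen only for illustration. Note that it checks for surrogates and
nothing else: text without surrogates can still contain combining
characters or scripts that need shaping.

```python
import struct


def utf16_units(text):
    """Helper for the example: encode a string and return its 16-bit code units."""
    raw = text.encode("utf-16-le")
    return struct.unpack("<%dH" % (len(raw) // 2), raw)


def classify_utf16(units):
    """Pick a rendering path label for a sequence of 16-bit code units."""
    ascii_only = True
    for u in units:
        if 0xD800 <= u <= 0xDFFF:
            # Surrogate code unit: the text is not plain ucs2.
            return "complex"
        if u >= 0x80:
            ascii_only = False
    return "ascii" if ascii_only else "ucs2"
```

The scan is a single linear pass and exits as soon as it sees the first
surrogate.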

With that said, I like your idea of utf8, since most of the time the
incoming text is in utf8 and I can set a complexity flag and route
through the complex engine if I hit a char that is too large for ucs2.
In fact, since I generally assume I can decode to a ucs2 buffer and that
it's large enough, I could even switch, as you say, to using utf8 for this
case, either by re-encoding the current buffer or by moving to a new one.
Also, of course, I have other info on the incoming data that would allow
me to pick the decoding route. You're probably right that utf8 is as easy
as anything else to deal with if the charset is complex, since the
mapping is screwed up anyway.
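
As a rough illustration of the complexity flag, the Python sketch below
decodes utf8 into 16-bit values and gives up at the first character that
does not fit. The function name and return shape are assumptions made for
this example, and the code assumes well-formed utf8 with no validation.

```python
def decode_utf8_to_ucs2(buf):
    """Decode UTF-8 bytes into 16-bit values.

    Returns (units, is_complex). Decoding stops with is_complex=True at
    the first code point above U+FFFF; units then holds what was decoded
    before that point.
    """
    units = []
    i = 0
    n = len(buf)
    while i < n:
        b = buf[i]
        if b < 0x80:
            cp, size = b, 1          # 1-byte sequence (ascii)
        elif b < 0xE0:
            cp, size = b & 0x1F, 2   # 2-byte sequence
        elif b < 0xF0:
            cp, size = b & 0x0F, 3   # 3-byte sequence
        else:
            # 4-byte sequence: code point above U+FFFF, too large for ucs2.
            return units, True
        for k in range(1, size):
            cp = (cp << 6) | (buf[i + k] & 0x3F)
        units.append(cp)
        i += size
    return units, False
```

A caller could use the flag to hand the original utf8 buffer to the
complex engine instead of the partially filled ucs2 buffer.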

I'll have to think about what you're saying about just leaving it in
utf8 in general. It's not a bad idea at all, and it seems sensible in the
complex case at a minimum. It's not hard to set up runs of text in either
ucs2 or utf8 intermixed with shaped text, and it's easy to set some
thresholds to determine whether you should shape all of it. Which of
course leads to what you're saying: you might just as well leave it in
utf8 and annotate it with the shape information. For the simple runs I do
the glyph lookup on the fly, and if it's utf8 I'm not wasting bytes on
ascii.
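
One way to picture the run idea is the Python sketch below, which leaves
the text in utf8 and records runs as byte offsets. The function name and
the "ascii"/"other" labels are invented for this example; a real engine
would attach actual shape information to the non-simple runs.

```python
def split_runs(buf):
    """Split UTF-8 bytes into (start, end, kind) runs by byte offset.

    kind is "ascii" for runs of single-byte characters and "other" for
    runs of multi-byte sequences. Every byte of a multi-byte sequence is
    >= 0x80, so run boundaries always fall on character boundaries.
    """
    runs = []
    start = 0
    n = len(buf)
    while start < n:
        is_ascii = buf[start] < 0x80
        end = start + 1
        while end < n and (buf[end] < 0x80) == is_ascii:
            end += 1
        runs.append((start, end, "ascii" if is_ascii else "other"))
        start = end
    return runs
```

The "ascii" runs could take the on-the-fly glyph lookup, while the
"other" runs would be candidates for the shaping path.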

Thanks, I'll certainly think about it.
What do you think about what I'm proposing?

Mike