[cairo] Unicode error causing Cairo to crash.

Bill Spitzak spitzak at d2.com
Wed Apr 27 13:19:28 PDT 2005

Carl Worth wrote:

> How does mishandling ill-formed utf-8 help anybody convert existing
> APIs or encourage any internationalization? The only thing I can see
> it doing is discouraging people from fixing errors.

I don't think you are predicting correctly how programmers will "fix"
these errors.

The very first moment a text string disappears or is truncated or an 
exception is thrown because the program sent a misencoded UTF-8 string, 
the programmer is going to quit using that UTF-8 interface and will 
never use it again.

If they are willing to do extra work to try to encourage the use of
UTF-8, they will have to transcode their UTF-8 to UTF-32 themselves,
using their own rules or any of the dozens of different libraries. The
code I posted is what I use to transcode all text for Cairo, and it
should be obvious from the comments and #if statements that I came up
with quite a few possible interpretations of ill-formed input, and I'm
sure other programmers could come up with dozens more. This means the
same string (including legal ones) will be drawn differently by
different programs, and it also means there could be mistakes such as
accepting overlong multibyte encodings of nul and slash.
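
To make the "possible interpretations" concrete, here is a minimal
sketch of one such rule in C. It is not the code referred to above. The
function names are hypothetical, and the choice to pass every byte of an
ill-formed sequence through as its own code point is just one assumption
among the many a programmer could make. Overlong forms are rejected, so
C0 80 and C0 AF never turn into nul or slash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from s[0..len), len > 0.
 * Returns the number of bytes consumed (always at least 1).
 * Assumed policy: a byte that does not begin a well-formed sequence
 * is passed through unchanged as a code point in the range 80..FF. */
static size_t lenient_decode_one(const unsigned char *s, size_t len,
                                 uint32_t *out)
{
    unsigned char b = s[0];
    size_t need, i;   /* need: continuation bytes expected */
    uint32_t cp, min; /* min: smallest value legal at this length */

    if (b < 0x80) { *out = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { need = 1; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07; min = 0x10000; }
    else { *out = b; return 1; } /* stray continuation or F8..FF */

    if (need >= len) { *out = b; return 1; } /* truncated sequence */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) { *out = b; return 1; }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        *out = b;
        return 1;
    }

    *out = cp;
    return need + 1;
}

/* Hypothetical entry point: transcode src[0..len) into dst, writing at
 * most cap code points. Returns the number of code points written. */
size_t lenient_utf8_to_utf32(const char *src, size_t len,
                             uint32_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0, n = 0;

    while (i < len && n < cap)
        i += lenient_decode_one(s + i, len - i, &dst[n++]);
    return n;
}
```

With this rule nothing disappears and nothing is truncated: well-formed
text decodes normally and every bad byte still produces one visible
code point. A different program could just as reasonably substitute
U+FFFD instead, which is exactly how the same string ends up drawn
differently.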

They may stop using UTF-8 altogether and continue with whatever encoding
they have, since it takes no more work to transcode that to UTF-32 or
legal UTF-8 than it takes to scan UTF-8 and replace errors.

Or, if they mostly work with ASCII, they may just strip the high bit to
get the text to print, since they don't want to waste time trying to fix
it correctly, or switch to an 8-bit char interface such as the one Xft
provides.

All these problems discourage internationalization. Thus my changes will 
encourage it.

There is nothing wrong with providing a "test this string to see if it 
is legal UTF-8" function. But the drawing library is not where to do it.
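
Such a test function could look roughly like the sketch below. The name
utf8_is_legal is hypothetical and is not part of Cairo or any other
library mentioned here. It checks the byte ranges for well-formed UTF-8,
so overlong forms, surrogates, and values above U+10FFFF are all
reported as illegal.

```c
#include <stddef.h>

/* Hypothetical validator: returns 1 if str[0..len) is well-formed
 * UTF-8, 0 otherwise. It only tests; it never modifies or draws. */
int utf8_is_legal(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0, k;

    while (i < len) {
        unsigned char b = s[i];
        size_t need;                        /* continuation bytes */
        unsigned char lo = 0x80, hi = 0xBF; /* range of 2nd byte */

        if (b < 0x80) { i++; continue; }

        if (b >= 0xC2 && b <= 0xDF) {
            need = 1;                  /* C0 and C1 would be overlong */
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2;
            if (b == 0xE0) lo = 0xA0;  /* excludes overlong forms */
            if (b == 0xED) hi = 0x9F;  /* excludes surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3;
            if (b == 0xF0) lo = 0x90;  /* excludes overlong forms */
            if (b == 0xF4) hi = 0x8F;  /* excludes > U+10FFFF */
        } else {
            return 0;                  /* stray continuation, C0, C1, F5..FF */
        }

        if (len - i <= need) return 0; /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi) return 0;
        for (k = 2; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;

        i += need + 1;
    }
    return 1;
}
```

A program that cares can call this before handing text to the drawing
library and decide for itself what to do with a failure; a program that
does not care simply never calls it.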
