[cairo] Surface error not set when using cairo_show_text() with invalid utf8
Bill Spitzak
spitzak at gmail.com
Tue Nov 2 14:03:11 PDT 2010
Maarten Bosmans wrote:
> 2010/11/2 Bill Spitzak <spitzak at gmail.com>:
>> PLEASE do not make UTF-8 errors stop any output!
>>
>> A lot of deluded systems engineers think doing this will "force people to
>> use Unicode correctly". But it does not, in fact it does the exact opposite!
>
> The fact that people, upon misusing cairo api by feeding it non-UTF-8
> encoded data, do not resolve the problem properly, but resort to the
> kind of ugly hacks you mention below, can hardly be blamed on the
> "deluded system engineers" that made the supporting libraries.
This is EXACTLY the deluded impression.
If the programmer is forced to write code, when a simple and obvious
change to the API would mean they could write NO code, then I think any
"hacks" in that code *are* the engineer's fault, because a correct API
would mean the hacks would not exist. Trying to pass blame for this is a
disease that is pretty bad in Linux and open source, and I would like to
see it stop!
> Why wouldn't one use any of the existing validation/conversion routines?
> http://library.gnome.org/devel/glib/2.26/glib-Unicode-Manipulation.html
Yes, a programmer can call this if they want to detect errors. That has
nothing to do with the problem. Maybe if we are really, really, really
lucky, the programmer might call a useful function that preserves UTF-8
(but there is no "convert this UTF-8 to the closest possible valid form"
call in that library so I am afraid it will not happen).
> Silently interpreting data that should be UTF-8 as some other encoding
> when errors are encountered does not sound like a good approach.
I am not interpreting it as another encoding. I am trying desperately to
prevent users from making hacks that do that. The suggestion I have is
to replace *bytes* with an alternative symbol. The remaining string will
continue to be interpreted as UTF-8. Most hacks done by users completely
change the encoding, and often they do so even if there are no errors!
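A minimal sketch of that byte replacement, roughly what a text routine could do internally before laying out glyphs. The names utf8_seq_len and utf8_sanitize are hypothetical, not cairo or glib functions, and the choice of U+FFFD as the "alternative symbol" is an assumption made for illustration. Every byte that is not part of a well-formed sequence is replaced on its own, and decoding resumes at the very next byte, so the rest of the string is still read as UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Length of the well-formed UTF-8 sequence starting at s (n bytes are
 * available), or 0 if the bytes at s do not start a well-formed sequence. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char c;

    if (n == 0)
        return 0;
    c = s[0];
    if (c < 0x80)
        return 1;
    if (c >= 0xC2 && c <= 0xDF)
        return (n >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (c >= 0xE0 && c <= 0xEF) {
        if (n < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (c == 0xE0 && s[1] < 0xA0)   /* overlong encoding */
            return 0;
        if (c == 0xED && s[1] > 0x9F)   /* UTF-16 surrogate range */
            return 0;
        return 3;
    }
    if (c >= 0xF0 && c <= 0xF4) {
        if (n < 4 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80)
            return 0;
        if (c == 0xF0 && s[1] < 0x90)   /* overlong encoding */
            return 0;
        if (c == 0xF4 && s[1] > 0x8F)   /* above U+10FFFF */
            return 0;
        return 4;
    }
    return 0;   /* stray continuation byte, or 0xC0, 0xC1, 0xF5..0xFF */
}

/* Copies in[0..in_len) to out as valid, NUL-terminated UTF-8. Each byte
 * that cannot be decoded becomes U+FFFD (EF BF BD); well-formed sequences
 * are copied unchanged. out must hold at least 3 * in_len + 1 bytes.
 * Returns the number of bytes that were replaced. */
size_t utf8_sanitize(const char *in, size_t in_len, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0, replaced = 0;

    while (i < in_len) {
        size_t len = utf8_seq_len(s + i, in_len - i);
        if (len > 0) {
            memcpy(out + o, s + i, len);
            o += len;
            i += len;
        } else {
            memcpy(out + o, "\xEF\xBF\xBD", 3);
            o += 3;
            i += 1;     /* skip only the offending byte */
            replaced++;
        }
    }
    out[o] = '\0';
    return replaced;
}
```

With the input "caf\xE9 \xC3\xA9" (a Latin-1 byte followed by a correctly encoded character), only the single 0xE9 byte is replaced; the valid two-byte sequence after it passes through untouched.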
> Better would be to provide some kind of conversion function that takes
> a collection of bytes and tries to interpret them as good as
> possible, always resulting in valid UTF-8.
I fail to see why forcing the programmer to allocate a buffer and do a
conversion before calling the print function, just to fix a case where
the print function throws an error rather than draw the obviously
desired result, is "better".
Let me make another suggestion: let's add a "set the cairo error if there
is an error in this UTF-8" function, and fix the drawing like I suggest.
Then the users who want the current behavior can do these two calls.
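The checking half of that pair could be quite small. The sketch below is an illustration only: utf8_first_invalid is a hypothetical name, not an existing cairo function, and how its result would be turned into a cairo status is left open. A strict caller would run it on the string first, treat a non-negative result as an error, and only then call the drawing function.

```c
#include <stddef.h>

/* Returns the byte offset of the first ill-formed sequence in s[0..len),
 * or -1 if the whole buffer is well-formed UTF-8. */
long utf8_first_invalid(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        size_t need, k;         /* continuation bytes expected */
        unsigned long cp, min;  /* decoded value, smallest legal value */

        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return (long)i;     /* stray continuation or invalid lead byte */
        }

        if (len - i <= need)
            return (long)i;     /* sequence truncated by end of buffer */

        for (k = 1; k <= need; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return (long)i; /* missing continuation byte */
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (long)i;

        i += need + 1;
    }
    return -1;
}
```

Callers who never ask for the check simply get the lenient drawing behavior, which is the point of splitting the two.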