[cairo] Surface error not set when using cairo_show_text() with invalid utf8

Bill Spitzak spitzak at gmail.com
Mon Nov 1 16:31:55 PDT 2010


PLEASE do not make UTF-8 errors stop any output!

A lot of deluded systems engineers think doing this will "force people 
to use Unicode correctly". But it does not; in fact it does the exact 
opposite!

When a programmer sees their output truncated because of a UTF-8 error, 
they will then find the fastest possible method to get ASCII text after 
that error to print correctly. They DO NOT CARE about the Unicode if 
they cannot see the important information after it and they will not 
devote even a millisecond of thought to it. Therefore the solutions are 
often seriously detrimental to Unicode. Solutions I have seen:

1. Mask every byte with 0x7f
2. Copy to another buffer but strip every byte with the high bit set.
3. Copy to another buffer and replace every byte with the high bit set 
with the hex version of the byte's value (this one at least is 
attempting to preserve the data).
4. Double UTF-8 encode the text (in effect making it ISO-8859-1)
5. If there is a wchar interface, don't use the official converter, but 
instead just alternate your bytes with null to "convert" it (in effect 
making it ISO-8859-1).
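
To make the damage concrete, here is a minimal sketch of workarounds 1, 2, 
4 and 5 applied to text that was valid UTF-8 all along. The function names 
are hypothetical, and workaround 5 assumes a 16-bit little-endian wchar; 
neither comes from any real code base.

```python
# Hypothetical helpers modelling workarounds 1, 2, 4 and 5 from the list above.

def mask_high_bit(data: bytes) -> str:
    """Workaround 1: mask every byte with 0x7f."""
    return bytes(b & 0x7F for b in data).decode("ascii")

def strip_high_bytes(data: bytes) -> str:
    """Workaround 2: drop every byte that has the high bit set."""
    return bytes(b for b in data if b < 0x80).decode("ascii")

def double_encode(data: bytes) -> bytes:
    """Workaround 4: read each byte as ISO-8859-1, then encode to UTF-8 again."""
    return data.decode("latin-1").encode("utf-8")

def alternate_with_nulls(data: bytes) -> str:
    """Workaround 5: follow each byte with a zero byte and call it UTF-16.

    Assumes a 16-bit little-endian wchar (an assumption for this sketch).
    """
    padded = bytes(x for b in data for x in (b, 0))
    return padded.decode("utf-16-le")

# Perfectly valid UTF-8 input: bytes 63 61 66 c3 a9 ("cafe" with e-acute).
valid = "caf\u00e9".encode("utf-8")

# ascii() keeps the output readable on any terminal encoding.
print(ascii(mask_high_bit(valid)))
print(ascii(strip_high_bytes(valid)))
print(ascii(double_encode(valid).decode("utf-8")))
print(ascii(alternate_with_nulls(valid)))
```

None of the four results equals the original string: the non-ASCII 
character is turned into ASCII junk, deleted, or split into two Latin-1 
characters, even though the input contained no error at all.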

Delusions that UTF-8 should cause errors are probably the biggest 
impediment to I18N. In many ways things are worse today than they were 
in 1990, as more software is becoming ASCII-only because of solutions 
such as those above.

For a concrete suggestion: if you see a UTF-8 error, substitute a single 
Unicode value such as U+FFFD for the *first* byte, and then continue 
decoding starting at the next byte. The only functions that should 
report that there were "errors" are functions explicitly named things 
like "areThereErrorsInThisUTF8()". If the converter is for drawing only 
(i.e. the output is not sent to another API), then converting the byte as 
ISO-8859-1 or Windows CP1252 is probably better, as the output will be 
readable if the text was accidentally in one of these encodings.
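
The suggestion above can be sketched roughly as follows. This is an 
illustration only, not cairo's code: decode_lenient, its cp1252_fallback 
flag and the helper names are all hypothetical, and each candidate 
sequence is validated by Python's own strict UTF-8 decoder.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def _sequence_length(lead: int) -> int:
    """Length implied by a UTF-8 lead byte, or 0 if it cannot start a sequence."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # stray continuation byte, or a lead byte that is never valid

def _substitute(byte: int, cp1252_fallback: bool) -> str:
    """Character to emit in place of one undecodable byte."""
    if not cp1252_fallback:
        return REPLACEMENT
    try:
        return bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's CP1252 codec leaves five bytes undefined; read those
        # as ISO-8859-1 instead.
        return chr(byte)

def decode_lenient(data: bytes, cp1252_fallback: bool = False) -> str:
    """Decode UTF-8 without ever failing (hypothetical helper).

    On an error, only the *first* byte is replaced and decoding resumes
    at the very next byte, so text after the error is never lost.
    """
    out = []
    i = 0
    while i < len(data):
        n = _sequence_length(data[i])
        chunk = data[i:i + n]
        if n and len(chunk) == n:
            try:
                # The strict decoder rejects bad continuation bytes,
                # overlong forms and surrogates.
                out.append(chunk.decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        out.append(_substitute(data[i], cp1252_fallback))
        i += 1
    return "".join(out)

# A lone ISO-8859-1 e-acute (0xe9) in otherwise ASCII text.
# ascii() keeps the output readable on any terminal encoding.
print(ascii(decode_lenient(b"caf\xe9 ok")))
print(ascii(decode_lenient(b"caf\xe9 ok", cp1252_fallback=True)))
```

In both calls the " ok" after the bad byte survives. With the fallback 
enabled, the stray byte is shown as the character it most likely was, 
which is the behaviour argued for in the drawing-only case.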

cu wrote:
> As the subject says,  looks like cairo_show_text() does not set the
> surface error (to be queried later by cairo_surface_status()) when
> provided invalid utf8 input (doesn't matter what it is - just has to be
> something that can't be properly decoded). Surface will not allow any
> more operations (i.e. any subsequent drawing is discarded) but at the
> same time status is still reported as "success". This is tested with
> cairo 1.9.8.
> 
> Ideally I would prefer that the surface continue working after this
> error (since the error is in external data and seems to be caught early
> by the utf8 validation function _cairo_utf8_to_ucs4). But at the very
> least, a surface that can no longer be drawn into should be properly
> marked as such.
> --
> cairo mailing list
> cairo at cairographics.org
> http://lists.cairographics.org/mailman/listinfo/cairo

