[cairo] [PATCH] script: More compatible with C89 and VC++.

Bill Spitzak spitzak at gmail.com
Sat Jun 26 16:12:59 PDT 2010


On 06/26/2010 01:43 AM, Tor Lillqvist wrote:
>> Isn't there a way to force the "system code page" to be UTF-8?
>
> Unfortunately, no. That would be great, but I assume it would break a
> lot of existing code which has been able to assume that the system
> codepage is either a single-byte one like codepage 1252 (i.e.
> ISO-8859-1 basically), or a double-byte one like codepage 936. Such
> code would break if the system codepage would be a multi-byte one.

I thought it was like setlocale() and only affected the current program? 
Though apparently I can never overestimate Microsoft's stupidity...

> Microsoft's policy, as far as I can deduce, is that one should just
> use Unicode and forget code pages.

Exactly what I want to do! I think you are using the Microsoft meaning 
of "Unicode" which is really "UTF-16". I want UTF-8 as that is what 
everybody writes in files and sends over communication links.

> Now, how widely used is this basename() in fact in cairo then? I don't
> remember ever coming across it when building cairo on Windows. Is it
> used only by some helper program that is not necessarily needed when
> building cairo?

I'm thinking it is only used for the demos, perhaps the .png output, 
which is not an official part of Cairo. In this case I would just 
substitute a simple byte-based one that looks for / and \ and :.

>> Also quite a few cairo API's assume the text
>> is UTF-8 so it is silly to pretend that other encodings can exist.
>
> For text (not filenames), assuming UTF-8 is definitely the right thing, IMHO.

It has to be assumed for filenames as well. One of the things you do 
with filenames is display them to the user!

My recommendation is to ALWAYS assume UTF-8, and then if you see UTF-8 
encoding errors you can interpret the offending bytes as a legacy 
encoding. Because UTF-8's rather inefficient design makes only a tiny 
fraction of possible byte sequences valid, text in a legacy encoding is 
almost always recognizable. In my case I assume all errors are CP-1252 
(even on Linux), but I am pretty certain the same approach would work 
for the Japanese 2-byte encodings.
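
The "assume UTF-8, fall back on errors" idea can be sketched as a
one-character decoder. This is an illustration under stated
assumptions, not code from cairo or from any existing library: the
function name decode_utf8_or_cp1252 and its signature are made up, and
the five byte values that CP-1252 leaves undefined are passed through
unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* CP-1252 meanings of bytes 0x80..0x9F. The five undefined bytes
 * (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to themselves. */
static const unsigned short cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* Hypothetical helper: decode one character starting at s[0].
 * Stores the code point in *out and returns the number of bytes
 * consumed (always at least 1). A well-formed UTF-8 sequence is
 * decoded as UTF-8; anything else consumes one byte as CP-1252. */
size_t
decode_utf8_or_cp1252 (const unsigned char *s, size_t len,
                       unsigned long *out)
{
    unsigned char c;
    size_t need = 0, i;
    unsigned long cp = 0, min = 0;

    assert (s != NULL && out != NULL && len > 0);

    c = s[0];
    if (c < 0x80) { *out = c; return 1; }          /* plain ASCII */

    /* Lead byte: number of continuation bytes and smallest code
     * point allowed for that length (rejects overlong forms). */
    if (c >= 0xC2 && c <= 0xDF)      { need = 1; cp = c & 0x1F; min = 0x80; }
    else if (c >= 0xE0 && c <= 0xEF) { need = 2; cp = c & 0x0F; min = 0x800; }
    else if (c >= 0xF0 && c <= 0xF4) { need = 3; cp = c & 0x07; min = 0x10000; }

    if (need > 0 && need < len) {
        for (i = 1; i <= need; i++) {
            if ((s[i] & 0xC0) != 0x80)
                break;                             /* not 10xxxxxx */
            cp = (cp << 6) | (unsigned long) (s[i] & 0x3F);
        }
        if (i > need && cp >= min && cp <= 0x10FFFF &&
            !(cp >= 0xD800 && cp <= 0xDFFF)) {
            *out = cp;
            return need + 1;
        }
    }

    /* UTF-8 error: treat this single byte as CP-1252. */
    *out = (c < 0xA0) ? cp1252_high[c - 0x80] : c;
    return 1;
}
```

With this sketch the bytes C3 A9 decode to U+00E9 as UTF-8, while a
lone E9 followed by an ASCII letter also yields U+00E9 through the
CP-1252 fallback, so both spellings of the same filename display the
same way.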

> But in the cairo test code case, basename() will be called on
> filenames originating from the command line, no?

I think reading filenames from another text file may not be uncommon!

> I think, for the case of cairo performance test programs, we should
> just explicitly verify that the command line arguments are ASCII
> only...

No, what you should do is NOTHING: just take the damn bytes and hand 
them to the system. For some reason UTF-8 turns otherwise intelligent 
programmers into some kind of idiot savants who will do INCREDIBLE 
amounts of bogus work because they somehow feel that an encoding error 
will cause a crash. In reality an encoding error is about as bad as a 
misspelled word and should be treated as such. It would be INSANE to 
put a spelling corrector into every string function, but the 
equivalent insanity is always being suggested for UTF-8.

