[cairo] [PATCH] script: More compatible with C89 and VC++.
Bill Spitzak
spitzak at gmail.com
Sat Jun 26 16:12:59 PDT 2010
On 06/26/2010 01:43 AM, Tor Lillqvist wrote:
>> Isn't there a way to force the "system code page" to be UTF-8?
>
> Unfortunately, no. That would be great, but I assume it would break a
> lot of existing code which has been able to assume that the system
> codepage is either a single-byte one like codepage 1252 (i.e.
> ISO-8859-1 basically), or a double-byte one like codepage 936. Such
> code would break if the system codepage would be a multi-byte one.
I thought it was like setlocale() and only affected the current program?
Though apparently I can never overestimate Microsoft's stupidity...
> Microsoft's policy, as far as I can deduce, is that one should just
> use Unicode and forget code pages.
Exactly what I want to do! I think you are using the Microsoft meaning
of "Unicode" which is really "UTF-16". I want UTF-8 as that is what
everybody writes in files and sends over communication links.
> Now, how widely used is this basename() in fact in cairo then? I don't
> remember ever coming across it when building cairo on Windows. Is it
> used only by some helper program that is not necessarily needed when
> building cairo?
I'm thinking it is only used for the demos, perhaps the .png output,
which is not an official part of Cairo. In this case I would just
substitute a simple byte-based one that looks for / and \ and :.
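
A minimal sketch of such a byte-based replacement, written in C89. The name byte_basename is made up for illustration and is not taken from the cairo sources. It treats the path as plain bytes and returns whatever follows the last '/', '\' or ':'. In UTF-8 every byte of a multi-byte sequence is 0x80 or above, so none of them can be mistaken for one of these separators.

```c
#include <stddef.h>

/* Hypothetical helper, not cairo's code: return a pointer to the part of
 * path that follows the last '/', '\\' or ':' byte.  The result points
 * into path itself; nothing is copied or modified. */
static const char *
byte_basename (const char *path)
{
    const char *base = path;
    const char *p;

    for (p = path; *p != '\0'; p++) {
	/* Remember the position just past the most recent separator. */
	if (*p == '/' || *p == '\\' || *p == ':')
	    base = p + 1;
    }

    return base;
}
```

Unlike POSIX basename(), this sketch returns an empty string for a path that ends in a separator, which should be acceptable for naming demo output files.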
>> Also quite a few cairo API's assume the text
>> is UTF-8 so it is silly to pretend that other encodings can exist.
>
> For text (not filenames), assuming UTF-8 is definitely the right thing, IMHO.
It has to be assumed for filenames as well. One of the things you do
with filenames is display them to the user!
My recommendation is to ALWAYS assume UTF-8, and then if you see UTF-8
encoding errors you can then interpret it as a legacy encoding. Due to
UTF-8's rather inefficient design only a tiny fraction of possible byte
sequences are valid, meaning that text in legacy encodings is almost always
recognizable. In my case I assume all errors are CP-1252 (even on
Linux), but I am pretty certain this would work for the Japanese 2-byte
encodings.
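
A rough sketch of that approach in C89, under my own assumptions: the function name decode_utf8_or_cp1252 is hypothetical, and the five byte values that CP-1252 leaves unassigned are mapped to the C1 control code with the same value. It decodes one code point at a time. A well-formed UTF-8 sequence (no overlong forms, no surrogates, nothing above U+10FFFF) is taken as UTF-8. Anything else consumes a single byte and is interpreted as CP-1252.

```c
#include <stddef.h>

/* CP-1252 code points for bytes 0x80..0x9F.  The unassigned bytes (0x81,
 * 0x8D, 0x8F, 0x90, 0x9D) map to the C1 control with the same value; that
 * choice is an assumption of this sketch. */
static const unsigned short cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* Hypothetical helper: decode one code point from s[0..len-1], len >= 1.
 * Stores the code point in *cp and returns the number of bytes consumed. */
static size_t
decode_utf8_or_cp1252 (const unsigned char *s, size_t len, unsigned long *cp)
{
    unsigned char b = s[0];
    size_t need = 0;		/* continuation bytes expected */
    size_t i;
    unsigned long c = 0;	/* code point being assembled */
    unsigned long min = 0;	/* smallest value allowed for this length */

    if (b < 0x80) {		/* ASCII is the same in both encodings */
	*cp = b;
	return 1;
    }

    /* Lead bytes 0xC0, 0xC1 and 0xF5..0xFF never occur in valid UTF-8. */
    if (b >= 0xC2 && b <= 0xDF) {
	need = 1; c = b & 0x1F; min = 0x80;
    } else if (b >= 0xE0 && b <= 0xEF) {
	need = 2; c = b & 0x0F; min = 0x800;
    } else if (b >= 0xF0 && b <= 0xF4) {
	need = 3; c = b & 0x07; min = 0x10000;
    }

    if (need != 0 && need < len) {
	for (i = 1; i <= need; i++) {
	    if ((s[i] & 0xC0) != 0x80)
		break;		/* not a continuation byte */
	    c = (c << 6) | (s[i] & 0x3F);
	}
	/* Accept only complete, shortest-form, non-surrogate sequences. */
	if (i > need && c >= min && c <= 0x10FFFF &&
	    !(c >= 0xD800 && c <= 0xDFFF)) {
	    *cp = c;
	    return need + 1;
	}
    }

    /* Encoding error: interpret this one byte as CP-1252. */
    *cp = (b < 0xA0) ? cp1252_high[b - 0x80] : b;
    return 1;
}
```

A caller would loop over the buffer, advancing by the returned count each time. Valid UTF-8 text passes through unchanged, and a stray CP-1252 byte such as 0xE9 still comes out as U+00E9.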
> But in the cairo test code case, basename() will be called on
> filenames originating from the command line, no?
I think reading filenames from another text file may not be uncommon!
> I think, for the case of cairo performance test programs, we should
> just explicitly verify that the command line arguments are ASCII
> only...
No, what you should do is NOTHING: just take the damn bytes and hand
them to the system. For some reason UTF-8 turns otherwise intelligent
programmers into some kind of idiot savants who then do
INCREDIBLE amounts of bogus work because they somehow feel that an
encoding error will cause a crash. In reality an encoding error is about
as bad as a misspelled word and should be treated as such. It would be
INSANE to put a spelling corrector into every string function, but the
equivalent insanity is always being suggested for UTF-8.