[cairo] [PATCH] script: More compatible with C89 and VC++.
Bill Spitzak
spitzak at gmail.com
Fri Jun 25 14:45:48 PDT 2010
Isn't there a way to force the "system code page" to be UTF-8? That
would avoid all these problems. It seems silly that, when the file
system itself stores UTF-16, we should have to deal with legacy byte
encodings if we don't want to. Also, quite a few cairo APIs assume the
text is UTF-8, so it is silly to pretend that other encodings can
exist.
My main problem with conversion is that you do not expect string
manipulation to throw errors. If there is a UTF-8 error in the string,
I expect basename to return a string containing the same error; I can
still concatenate a slash and a directory, and I will not learn about
the error until I actually try to open the file, at which point the
result should be no different from any attempt to use a character that
the file system does not allow in a filename.
Tor Lillqvist wrote:
> We are talking about code that is to be run on Windows only, and that
> gets plain char strings representing existing file names as input,
> aren't we? What makes you think these code pages wouldn't be in use
> any longer? Every C program on Windows that calls functions like
> open(), stat(), fopen() uses the same old so called "ANSI" code pages
> for the file names passed to such functions. There is no UTF-8
> involved at the C library level if that is what you are thinking of.
UTF-8 is involved if the system encoding is UTF-8.
>> I would do the one-character-at-a-time api instead of allocating a buffer.
>> This will allow you to skip errors in the encoding,
>
> But where would such encoding errors come from?
From the user typing garbage in, from programming errors that produced
invalid UTF-8, or from copying a filename verbatim from a Unix system,
where invalid UTF-8 is allowed. Also, Windows allows invalid UTF-16
(such as unpaired surrogates); that can be mechanically translated to
UTF-8, but some strict UTF-8 decoders treat the result as an error.
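As a concrete illustration of that last case (a sketch, not anything
from cairo): an unpaired UTF-16 high surrogate such as 0xD800, when
mechanically converted, yields the bytes ED A0 80, and RFC 3629
requires strict UTF-8 decoders to reject surrogate code points. The
hypothetical helper below detects exactly that byte pattern:

```c
#include <stdbool.h>

/* Hypothetical check for one class of invalid UTF-8: a three-byte
 * sequence whose lead byte is 0xED and whose second byte is >= 0xA0
 * encodes a UTF-16 surrogate code point (U+D800..U+DFFF).  RFC 3629
 * forbids these in well-formed UTF-8, so strict decoders reject them
 * even though they round-trip fine through lenient converters. */
static bool
is_surrogate_sequence (const unsigned char *p)
{
    return p[0] == 0xED
        && p[1] >= 0xA0 && p[1] <= 0xBF
        && p[2] >= 0x80 && p[2] <= 0xBF;
}
```

A one-character-at-a-time conversion API, as suggested earlier in the
thread, could skip such sequences (or substitute U+FFFD) instead of
failing the whole string.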