[cairo] [PATCH 3/3] [test] Use UTF-8 in test files
Bill Spitzak
spitzak at gmail.com
Wed Mar 11 11:28:36 PDT 2015
On 03/10/2015 08:02 PM, Lawrence D'Oliveiro wrote:
> On Tue, 10 Mar 2015 13:03:08 -0700, Bill Spitzak wrote:
>
>> The main culprit are idiots who think you have to "translate to
>> Unicode" immediately on input. That is a byte stream and should
>> remain a byte stream.
>
> What about line terminators? Many of these text-manipulation tools
> treat their input as divided up into lines. Does that not go against
> the concept of a “byte stream”?
>
> Especially when you get into the specifics of what constitutes a line
> terminator...
Programs should only use ASCII characters as line terminators. Using
"NEL" and the other Unicode line terminators will make your code
incompatible with many other pieces of software. Several systems (YAML
and JSON are good examples) have reverted attempts to read non-ASCII line
terminators because it broke many other pieces of software, all of which
had a perfectly clear understanding of the encoding being used.
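To illustrate the kind of mismatch I mean, here is a small sketch in Python 3 (my choice of language for the example, and the sample data is made up). Splitting the raw bytes only breaks on ASCII line endings, while decoding first and splitting the text also breaks on NEL and U+2028:

```python
# Made-up sample: one ASCII newline, plus NEL (U+0085) and
# LINE SEPARATOR (U+2028), all encoded as UTF-8.
data = "a\nb\u0085c\u2028d".encode("utf-8")

# Byte stream view: bytes.splitlines() only treats \n, \r and \r\n
# as line terminators, so the non-ASCII ones stay inside the line.
byte_lines = data.splitlines()

# "Translate to Unicode first" view: str.splitlines() also splits on
# NEL, U+2028, U+2029 and a few control characters.
text_lines = data.decode("utf-8").splitlines()

print(len(byte_lines), len(text_lines))
```

Two tools looking at the same file would disagree about how many lines it has, which is exactly the incompatibility.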
The characters can be found by matching patterns of bytes if you really
want to find them. Other than NEL they cannot occur in non-Unicode
encodings, so matching their UTF-8 byte sequences would work. Matching a
one-byte NEL (0x85) is strongly discouraged, as that byte is a printing
character in CP-1252, which is often confused with ISO-8859-1.
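A minimal sketch of that byte-pattern matching, again in Python 3. The function name and the sample strings are hypothetical, and the pattern covers only NEL, U+2028 and U+2029 in their UTF-8 forms:

```python
import re

# UTF-8 byte sequences of the non-ASCII line terminators:
#   U+0085 NEXT LINE           -> C2 85
#   U+2028 LINE SEPARATOR      -> E2 80 A8
#   U+2029 PARAGRAPH SEPARATOR -> E2 80 A9
UNICODE_EOL = re.compile(b"\xc2\x85|\xe2\x80[\xa8\xa9]")

def find_unicode_eols(data: bytes):
    """Return (offset, matched bytes) for each UTF-8 encoded terminator.

    Hypothetical helper for illustration; the input stays a byte
    stream and is never decoded.
    """
    return [(m.start(), m.group()) for m in UNICODE_EOL.finditer(data)]

sample = "one\u2028two\u0085three".encode("utf-8")
hits = find_unicode_eols(sample)

# A lone 0x85 byte is deliberately not matched: in CP-1252 it is the
# printing character "..." (horizontal ellipsis), not a line break.
cp1252_text = "wait\u2026".encode("cp1252")
no_hits = find_unicode_eols(cp1252_text)
```

Because UTF-8 lead and continuation bytes never overlap, these sequences cannot turn up as part of some other character in valid UTF-8.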