[cairo] [PATCH 3/3] [test] Use UTF-8 in test files
spitzak at gmail.com
Tue Mar 10 13:03:08 PDT 2015
On 03/10/2015 12:02 PM, Andrea Canciani wrote:
> To be fair, 'sed' only defaults to UTF-8 if the environment does not
> explicitly define the encoding.
Defaulting to UTF-8 is a good idea.
My complaint is that UTF-8 encoding should not cause any byte stream to
fail. All it should do is alter some rules of pattern matching (in
regexps it may change what '.' matches). A script that does nothing with
"characters" but, for instance, replaces one block of bytes with another
(s/foo/bar/g) should produce identical output byte streams no matter
what the encoding is set to, and regardless of whether the byte streams
"foo" and "bar" contain valid UTF-8.
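To make that concrete, here is a rough sketch in Python (not sed, and not taken from any existing tool) of a substitution that never decodes its input. The helper name replace_bytes and the sample data are made up for illustration. It uses a plain bytes replace rather than a regexp, which is enough to show the point: bytes that are not valid UTF-8 pass through untouched.

```python
def replace_bytes(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` with `new` without decoding anything.

    Hypothetical helper for illustration: the input is treated purely as a
    byte stream, so the locale or encoding setting cannot affect the result.
    """
    return data.replace(old, new)


# Sample input mixing valid UTF-8 (C3 A9), a lone Latin-1 byte (E9),
# and a stray continuation byte (80). None of this is valid as a whole.
sample = b"foo \xc3\xa9 foo \xe9 \x80 foo\n"
result = replace_bytes(sample, b"foo", b"bar")

# The pattern and replacement themselves may also be invalid UTF-8.
patched = replace_bytes(b"a\xffb", b"\xff", b"\xfe")
```

Only the three "foo" runs change; every other byte, valid UTF-8 or not, comes out exactly as it went in.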
The current way a lot of tools are being written is a disaster, hurting
I18N by making it impossible to mix encodings and thus transition from
legacy ones to modern ones, and breaking lots of long-standing Unix
The main culprits are idiots who think you have to "translate to Unicode"
immediately on input. That is a byte stream and should remain a byte
stream. "translate to Unicode" is a job of DISPLAY, not interpretation
or manipulation. And even the display should not barf on bad UTF-8, just
draw some error blocks for the bad bytes.
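As a sketch of what I mean by leaving the conversion to the display step, assuming Python and its built-in "replace" error handler for decoding: the stored data stays as bytes, and only the string handed to the screen gets an error marker (U+FFFD) for each bad byte. The function name for_display is hypothetical.

```python
def for_display(data: bytes) -> str:
    """Decode bytes for showing to a user, never raising on bad UTF-8.

    Hypothetical display-only helper: invalid bytes become U+FFFD
    (the replacement character), which a UI can draw as an error block.
    The original bytes are not modified and remain the source of truth.
    """
    return data.decode("utf-8", errors="replace")


raw = b"ok \xe9 \xc3\xa9"   # a lone Latin-1 byte, then a valid UTF-8 e-acute
shown = for_display(raw)    # text for the screen only; keep working with `raw`
```

Anything that edits or searches the data keeps working on the bytes; the decoded string is thrown away after it is drawn.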
It's also annoying that the correct way to write these tools would be
vastly simpler and faster, too.