[PATCH] D37331: [ELF] Prevent crash with binary inputs with non-ascii file names
Adrian McCarthy via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 31 13:20:49 PDT 2017
amccarth added a comment.
In https://reviews.llvm.org/D37331#858023, @ruiu wrote:
> > Find a Unicode character whose encoding contains a byte > 128
>
> It's basically any non-ASCII character. But is it portable? For example, if the Windows CRT converts a command-line argument into UTF-16, this test will fail because the number of underscores will differ.
It seems unlikely the CRT is going to convert UTF-8 to UTF-16. More likely, depending on how lit issues the command, it'll interpret the UTF-8 bytes as though they're in the user's code page. For the U.S., this will likely be Windows-1252. The British pound sign in UTF-8 is 0xC2 0xA3. If you interpret those bytes as Windows-1252, you'll see `Â£`, which I guess lld will convert to two underscores. On a non-Windows system, it'll still be two non-alphanumeric bytes, so I think the test should be fine.
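A rough Python sketch of the byte-level reasoning above. The `sanitize` helper is hypothetical: it only stands in for the guessed behaviour of replacing each non-alphanumeric byte with an underscore, and is not taken from lld's source.

```python
# UTF-8 encoding of the pound sign: two bytes, both >= 0x80.
pound_utf8 = "£".encode("utf-8")
assert pound_utf8 == b"\xc2\xa3"

# The same two bytes misread as Windows-1252 show up as two characters.
misread = pound_utf8.decode("cp1252")
assert misread == "Â£"


def sanitize(name: bytes) -> str:
    """Hypothetical stand-in (an assumption, not lld's code):
    keep ASCII letters and digits, replace every other byte with '_'."""
    return "".join(
        chr(b) if b < 128 and chr(b).isalnum() else "_"
        for b in name
    )


# Both bytes are non-alphanumeric, so each becomes one underscore.
underscores = sanitize(pound_utf8)
print(underscores)
```

Under that assumption the result is two underscores regardless of which single-byte code page is used to display the bytes, since the count depends only on the number of bytes.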
https://reviews.llvm.org/D37331