[PATCH] D37331: [ELF] Prevent crash with binary inputs with non-ascii file names

Adrian McCarthy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 31 13:20:49 PDT 2017


amccarth added a comment.

In https://reviews.llvm.org/D37331#858023, @ruiu wrote:

> > Find a Unicode character whose encoding contains a byte >= 128
>
> It's basically any non-ASCII character. But is it portable? I mean, for example, if the Windows CRT converts a command-line argument into UTF-16 encoding, this test will fail due to the difference in the number of underscores.


It seems unlikely the CRT is going to convert UTF-8 to UTF-16.  More likely, depending on how lit issues the command, is that it'll interpret the UTF-8 bytes as though they're in the user's code page.  For the U.S., this will likely be Windows-1252.  The British pound sign in UTF-8 is 0xC2 0xA3.  If you interpret those bytes as Windows-1252, you'll see `Â£`, which I guess lld will convert to two underscores.  On a non-Windows system, it'll still be two non-alphanumeric bytes, so I think the test should be fine.
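
To make the byte counting concrete, here is a rough Python sketch of the reasoning above.  The `sanitize` helper is hypothetical: it only assumes that every byte that is not an ASCII letter or digit becomes an underscore, and it is not lld's actual code.  The file name `a£.o` is likewise made up for illustration.

```python
import string

# UTF-8 encoding of U+00A3 (pound sign): two bytes, both >= 0x80.
utf8_bytes = "\u00a3".encode("utf-8")

# The same two bytes misread as Windows-1252 become two characters,
# U+00C2 followed by U+00A3.
misread = utf8_bytes.decode("cp1252")

# ASCII letters and digits, as byte values.
ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def sanitize(raw: bytes) -> str:
    """Hypothetical stand-in for the name mangling discussed here:
    keep ASCII alphanumeric bytes, replace every other byte with '_'."""
    return "".join(chr(b) if b in ALNUM else "_" for b in raw)


# Hypothetical file name "a£.o", passed through as raw UTF-8 bytes.
name_bytes = "a\u00a3.o".encode("utf-8")
mangled = sanitize(name_bytes)
```

Under that assumption, the two UTF-8 bytes yield two underscores whether they are treated as raw bytes or as two Windows-1252 characters, which is why the underscore count should match across platforms.  A conversion to UTF-16 would instead produce a single code unit for the pound sign, which is the mismatch @ruiu was worried about.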


https://reviews.llvm.org/D37331




