[PATCH] D45550: Use GetArgumentVector to retrieve the utf-8 encoded arguments on all platforms

Rui Ueyama via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 12 11:22:33 PDT 2018


ruiu added a comment.

Stella,

Thank you for the detailed explanation! That's very helpful. I agree that we shouldn't guess the character encoding of command line arguments. Instead, we should use Windows APIs to get them in the correct encoding and then convert them to UTF-8. That definitely looks like the right direction.

That being said, I believe that the failing lld/test/ELF/format-binary-non-ascii.s test should just be disabled on Windows. The test itself is written in UTF-8, and if your code page is, say, CP943, the test file could be converted multiple times as follows:

  When Python 3 passes the command line arguments to an external command, I believe it converts them to the current code page, which is CP943.
  The command gets the arguments using GetCommandLineW and converts them from CP943 to UTF-8.

The round trip doesn't guarantee that the pound sign (£; U+00A3) survives as-is. It might be converted to the Fullwidth Pound Sign (￡; U+FFE1). We might be able to handle this correctly in all code pages, but I don't think it's worth the effort. The easiest way to avoid the mess is to just not run that particular test on Windows.
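
To make the pound sign concern concrete, here is a rough Python sketch. It is only an illustration under assumptions, not what Python or lld actually does on Windows: Python ships no "cp943" codec, so "cp932" (Microsoft's Japanese code page) stands in for it, and the byte pair 0x81 0x92 is assumed to be what a pound sign becomes once it has been squeezed into such a code page. The same two bytes decode to different code points depending on which mapping table is used.

```python
import unicodedata

# Assumption: the byte pair a Japanese multi-byte code page uses for the
# pound sign. JIS X 0208 (Shift_JIS form) assigns 0x81 0x92 to it.
POUND_BYTES = b"\x81\x92"


def describe(ch: str) -> str:
    """Return a code point in 'U+XXXX NAME' form."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Decoded with the plain JIS mapping, the bytes come back as U+00A3.
as_jis = POUND_BYTES.decode("shift_jis")

# Decoded with Microsoft's CP932 mapping (stand-in for CP943 here),
# the very same bytes come back as the fullwidth form, U+FFE1.
as_ms = POUND_BYTES.decode("cp932")

print(describe(as_jis))
print(describe(as_ms))

# The UTF-8 bytes the tool would finally see differ as well, so a test
# that compares against a UTF-8 encoded U+00A3 would no longer match.
print(as_jis.encode("utf-8"), as_ms.encode("utf-8"))
```

If the conversion chain ends with the second mapping, the tool receives a different UTF-8 sequence than the one written in the test file, which is the kind of mismatch described above.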


Repository:
  rLLD LLVM Linker

https://reviews.llvm.org/D45550




