[llvm] r329468 - Windows needs the current codepage instead of utf8 sometimes
Aaron Smith via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 10 20:03:33 PDT 2018
Updating llvm-mc, ld-lld, and llvm-readobj to use GetArgumentVector looks like it would fix the failing test in lld -- format-binary-non-ascii.s. I'm not sure if there are other tools that need to change for other tests. Will have look into it more.
Aaron
On 4/9/18, 10:36 PM, "Friedman, Eli" <efriedma at codeaurora.org> wrote:
We have an API that C++ code can to use to correctly retrieve the
Unicode command-line to a process: sys::Process::GetArgumentVector. (On
Unix systems, it just uses argv, but on Windows, it uses GetCommandLineW
and converts the result to UTF-8.) It looks like llvm-mc in particular
doesn't use that API, but it should.
If llvm-mc is fixed to use GetArgumentVector, and lit is invoking
llvm-mc with a Unicode command-line, everything should just work without
the current codepage ever becoming involved.
-Eli
On 4/6/2018 6:12 PM, Aaron Smith wrote:
> Hi Eli,
>
> Yes that is unfortunate. The testing infrastructure is very delicate when it comes to encoding and python versions. See https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Freviews.llvm.org%2FD43165&data=02%7C01%7Caaron.smith%40microsoft.com%7Cd61cee3574bd45641eff08d59e61e5ac%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636589065666902270&sdata=MKe%2BSp6B0O9TqJcobk2DX2zwRkcR1dvgJDwkr9riBxI%3D&reserved=0 for more discussion. If you have other ideas, please let us know.
>
> Aaron
>
> On 4/6/18, 6:08 PM, "Friedman, Eli" <efriedma at codeaurora.org> wrote:
>
> On 4/6/2018 5:32 PM, Aaron Smith via llvm-commits wrote:
> > Author: asmith
> > Date: Fri Apr 6 17:32:59 2018
> > New Revision: 329468
> >
> > - // Just use the caller's original path.
> > - return UTF8ToUTF16(Path8Str, Path16);
> > + // Path8Str now contains the full path or the original path
> > + // If the conversion from UTF8 to UTF16 fails because of ERROR_NO_UNICODE_TRANSLATION,
> > + // we also try using the current code page before giving up
> > + auto ec = UTF8ToUTF16(Path8Str, Path16);
> > + if (ec == mapWindowsError(ERROR_NO_UNICODE_TRANSLATION)) {
> > + ec = CurCPToUTF16(Path8Str, Path16);
> > + }
> > + return ec;
>
> This change is really dubious: we should not be charset sniffing in a
> compiler. This will cause bugs where code appears to work sometimes,
> but randomly breaks when we guess the encoding incorrectly.
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
>
>
>
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
More information about the llvm-commits
mailing list