[llvm] r329468 - Windows needs the current codepage instead of utf8 sometimes

Aaron Smith via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 10 20:03:33 PDT 2018


Updating llvm-mc, ld-lld, and llvm-readobj to use GetArgumentVector looks like it would fix the failing test in lld -- format-binary-non-ascii.s.  I'm not sure if there are other tools that need to change for other tests. Will have look into it more. 

Aaron

On 4/9/18, 10:36 PM, "Friedman, Eli" <efriedma at codeaurora.org> wrote:

    We have an API that C++ code can to use to correctly retrieve the 
    Unicode command-line to a process: sys::Process::GetArgumentVector.  (On 
    Unix systems, it just uses argv, but on Windows, it uses GetCommandLineW 
    and converts the result to UTF-8.)  It looks like llvm-mc in particular 
    doesn't use that API, but it should.
    
    If llvm-mc is fixed to use GetArgumentVector, and lit is invoking 
    llvm-mc with a Unicode command-line, everything should just work without 
    the current codepage ever becoming involved.
    
    -Eli
    
    On 4/6/2018 6:12 PM, Aaron Smith wrote:
    > Hi Eli,
    >
    > Yes that is unfortunate. The testing infrastructure is very delicate when it comes to encoding and python versions. See https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Freviews.llvm.org%2FD43165&data=02%7C01%7Caaron.smith%40microsoft.com%7Cd61cee3574bd45641eff08d59e61e5ac%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636589065666902270&sdata=MKe%2BSp6B0O9TqJcobk2DX2zwRkcR1dvgJDwkr9riBxI%3D&reserved=0 for more discussion. If you have other ideas, please let us know.
    >
    > Aaron
    >
    > On 4/6/18, 6:08 PM, "Friedman, Eli" <efriedma at codeaurora.org> wrote:
    >
    >      On 4/6/2018 5:32 PM, Aaron Smith via llvm-commits wrote:
    >      > Author: asmith
    >      > Date: Fri Apr  6 17:32:59 2018
    >      > New Revision: 329468
    >      >
    >      > -  // Just use the caller's original path.
    >      > -  return UTF8ToUTF16(Path8Str, Path16);
    >      > +  // Path8Str now contains the full path or the original path
    >      > +  // If the conversion from UTF8 to UTF16 fails because of ERROR_NO_UNICODE_TRANSLATION,
    >      > +  // we also try using the current code page before giving up
    >      > +  auto ec = UTF8ToUTF16(Path8Str, Path16);
    >      > +  if (ec == mapWindowsError(ERROR_NO_UNICODE_TRANSLATION)) {
    >      > +    ec = CurCPToUTF16(Path8Str, Path16);
    >      > +  }
    >      > +  return ec;
    >      
    >      This change is really dubious: we should not be charset sniffing in a
    >      compiler.  This will cause bugs where code appears to work sometimes,
    >      but randomly breaks when we guess the encoding incorrectly.
    >      
    >      -Eli
    >      
    >      --
    >      Employee of Qualcomm Innovation Center, Inc.
    >      Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
    >      
    >      
    >
    
    -- 
    Employee of Qualcomm Innovation Center, Inc.
    Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
    
    



More information about the llvm-commits mailing list