[PATCH] Windows: Add support for unicode command lines

Alp Toker alp at nuanti.com
Fri Oct 4 10:52:29 PDT 2013

Hi David,

The switch to wide functions makes sense. Maybe we can make the changes
a little lighter..

+  SmallVector<wchar_t, 128> ProgramUtf16;
+  if (error_code ec = windows::UTF8ToUTF16(Program, ProgramUtf16)) {
+    SetLastError(ec.value());
+    MakeErrMsg(ErrMsg,
+               std::string("Unable to convert application name to
+    return false;
+  }
+  SmallVector<wchar_t, 128> CommandUtf16;
+  if (error_code ec = windows::UTF8ToUTF16(command.get(), CommandUtf16)) {
+    SetLastError(ec.value());
+    MakeErrMsg(ErrMsg,
+               std::string("Unable to convert command-line to UTF-16"));
+    return false;
+  }

There are quite a few of these.

LLVM doesn't generally report UTF8ToUTF16() failure beyond propagating
the error. Is conversion failure likely and user-actionable enough to
warrant individually tailored messages? Maybe a general "UTF-16
conversion failure" message lifted into an inline function would suffice?

Also, UTF8ToUTF16() seems to already exit with last error set. Did you
find a need to explicitly SetLastError() after the call?

So, if each of these conversions can be reduced to one or two LoC I
think it'll make your patch more readable, and also make it more likely
for other code in LLVM to follow your example and switch to the wide API.

Maybe something like this, and then you could chain the conversions as

SmallVector<wchar_t, 128> ProgramUtf16, CommandUtf16;
if (!ConvertToUTF16(Program, ProgramUtf16, ErrMsg) ||
    !ConvertToUTF16(Command, CommandUtf16, ErrMsg))
  return false;


On 04/10/2013 13:14, David Majnemer wrote:
> Hi aaron.ballman, Bigcheese, rnk, ruiu,
> The MSVCRT deliberately sends main() code-page specific characters.
> This isn't too useful to LLVM as we end up converting the arguments to
> UTF-16 and subsequently attempt to use the result as, for example, a
> file name.  Instead, we need to have the ability to access the Unicode
> command line and transform it to UTF-8.
> This has the distinct advantage over using the MSVC-specific wmain()
> function as our entry point because:
>  - It doesn't work on cygwin.
>  - It only work on MinGW with caveats and only then on certain versions.
>  - We get to keep our entry point as main(). :)
> N.B.  This patch includes fixes to other parts of lib/Support/Windows
> s.t. we would be able to take advantage of getting the Unicode paths.
> E.G.  clang spawning clang -cc1 would want to give it Unicode arguments.
> http://llvm-reviews.chandlerc.com/D1834
> Files:
>   autoconf/configure.ac
>   cmake/config-ix.cmake
>   cmake/modules/LLVM-Config.cmake
>   cmake/modules/TableGen.cmake
>   configure
>   include/llvm/Config/config.h.cmake
>   include/llvm/Config/config.h.in
>   include/llvm/Support/Process.h
>   lib/Support/Unix/Process.inc
>   lib/Support/Windows/DynamicLibrary.inc
>   lib/Support/Windows/Path.inc
>   lib/Support/Windows/Process.inc
>   lib/Support/Windows/Program.inc
>   lib/Support/Windows/Signals.inc
>   lib/Support/Windows/Windows.h
>   projects/sample/autoconf/configure.ac
>   projects/sample/configure
>   utils/FileCheck/CMakeLists.txt
>   utils/FileUpdate/CMakeLists.txt
>   utils/not/CMakeLists.txt
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

the browser experts

More information about the llvm-commits mailing list