[PATCH] D157986: [Cmake] Make sure MSVC knows LLVM source files are UTF-8 encoded

Tom Honermann via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 6 12:09:37 PDT 2023


tahonermann added a reviewer: tahonermann.
tahonermann added a comment.

I looked at a bunch of the test failures. Most appear to fail because the test could not match non-ASCII characters such as line-drawing characters. There seem to be mismatched encoding expectations: non-ASCII characters are sometimes expected to match '?' and sometimes expected to match an escaped representation. For example, the output for test "./ClangdTests.exe/26/38" contains the following. Here `\xE2\x86\x92` is the UTF-8 encoding of "→" (U+2192 RIGHTWARDS ARROW), and "?" is presumably a substituted replacement character.

  → ret_type (aka can_ret_type)
  ...
  -? ret_type (aka can_ret_type)
  +\xE2\x86\x92 ret_type (aka can_ret_type)
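
To illustrate how the same character can end up in both forms, here is a minimal Python sketch. It assumes the lossy side goes through a legacy code page that has no mapping for U+2192; Windows-1252 (`cp1252`) is used purely as an example, since the actual code page involved in the failing tests is not known. The helper names are hypothetical.

```python
# Sketch only: shows two ways U+2192 can be rendered when encodings disagree.
ARROW = "\u2192"  # RIGHTWARDS ARROW


def utf8_escaped(text: str) -> str:
    """Render every UTF-8 byte of `text` as a \\xNN escape."""
    return "".join(f"\\x{b:02X}" for b in text.encode("utf-8"))


def lossy_in_code_page(text: str, code_page: str = "cp1252") -> str:
    """Round-trip `text` through a legacy code page.

    Characters the code page cannot represent are replaced with '?'.
    The default code page is an assumption for illustration.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    print(utf8_escaped(ARROW))
    print(lossy_in_code_page(ARROW + " ret_type (aka can_ret_type)"))
```

If one side of a comparison went through a conversion like `lossy_in_code_page` and the other side printed raw UTF-8 bytes as escapes, the two strings could not match, which is consistent with the diff above.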

Since Clang only supports UTF-8 as the execution encoding, perhaps we should do likewise for MSVC and build everything with `/utf-8`. It looks like you tried that already and the result wasn't good? It might be worth trying again, but with additional options to embed a manifest that sets the active code page to UTF-8 (https://devblogs.microsoft.com/oldnewthing/20220531-00/?p=106697). That requires at least Windows 10 Version 1903, though. Do we document a minimum Windows version requirement for building and running LLVM/Clang?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157986/new/

https://reviews.llvm.org/D157986


