[llvm] Fix #126355: Force "/utf-8" for msvc compilers (PR #126357)

Quan Zhuo via llvm-commits llvm-commits at lists.llvm.org
Sat Feb 8 00:10:04 PST 2025


https://github.com/quanzhuo created https://github.com/llvm/llvm-project/pull/126357

When MSVC reads a source file, it determines the file's encoding from its byte-order mark (BOM). If there is no BOM, it falls back to the system's local code page, which on Simplified Chinese Windows is code page 936 (GB2312). UTF-8 source files without a BOM are then decoded incorrectly, which can cause errors such as "newline in constant" in string literals.
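
For illustration only, the approach described above can be sketched as the CMake fragment below. It is a variant of the patch, not the code in this PR: it assumes CMake 3.15 or newer for the `COMPILE_LANG_AND_ID` generator expression, and it scopes the flag to C and C++ translation units so that ASM sources in the same project would not receive it.

```cmake
# Sketch under stated assumptions; not the exact change proposed in this PR.
# COMPILE_LANG_AND_ID (CMake 3.15+) evaluates to 1 only while a source file of
# the named language is being compiled with the named compiler.
# /utf-8 tells MSVC to treat both the source and execution character sets as
# UTF-8, so files without a BOM are no longer read in the local code page.
add_compile_options(
  "$<$<COMPILE_LANG_AND_ID:C,MSVC>:/utf-8>"
  "$<$<COMPILE_LANG_AND_ID:CXX,MSVC>:/utf-8>")
```

Because the expressions match only the `MSVC` compiler ID, other compilers should see no extra flag.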

From 49b18270e736397162c6a29f277455941a78116e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=85=A8=E5=8D=93?= <quanzhuo at kylinos.cn>
Date: Sat, 8 Feb 2025 16:08:35 +0800
Subject: [PATCH] Fix #126355: Force "/utf-8" for msvc compilers

When MSVC reads a source file, it determines the file's encoding from
its byte-order mark (BOM). If there is no BOM, it falls back to the
system's local code page, which on Simplified Chinese Windows is code
page 936 (GB2312). UTF-8 source files without a BOM are then decoded
incorrectly, which can cause errors such as "newline in constant" in
string literals.
---
 llvm/CMakeLists.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index f5293e8663243bc..c3ace0ef21ce705 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -48,6 +48,11 @@ project(LLVM
   VERSION ${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}
   LANGUAGES C CXX ASM)
 
+# Force UTF-8 encoding for MSVC compiler. This ensures that the MSVC compilers
+# interpret source files as UTF-8 encoded.
+add_compile_options("$<$<C_COMPILER_ID:MSVC>:/utf-8>")
+add_compile_options("$<$<CXX_COMPILER_ID:MSVC>:/utf-8>")
+
 if (NOT DEFINED CMAKE_INSTALL_LIBDIR AND DEFINED LLVM_LIBDIR_SUFFIX)
   # Must go before `include(GNUInstallDirs)`.
   set(CMAKE_INSTALL_LIBDIR "lib${LLVM_LIBDIR_SUFFIX}")


