[cfe-dev] Making Clang support UTF-16 input?

Hans Wennborg via cfe-dev cfe-dev at lists.llvm.org
Thu Aug 16 07:17:18 PDT 2018


When trying out the new LLVM Visual Studio integration extension [1]
that now supports VS 2017, we learned that at some point, VS changed
the editor to encode new project files as utf-16 by default [2].

Since Clang doesn't support utf-16 input, this creates a bad
experience for users trying out Clang on a new Visual Studio project.

Should we make Clang support utf-16 input?

Nico pointed out there was a patch [3] by Scott Conger a long time ago
to support -finput-charset=.

That seems a bit more ambitious than what's necessary here. What I was
thinking was something like, if -fallow-utf16 is passed (maybe a
clang-cl default), instead of erroring out on a byte-order mark, Clang
would try to convert to utf-8.
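
To make the detection step concrete, here is a rough standalone sketch
of classifying the leading bytes of a buffer. BomKind and detectBom are
made-up names for illustration, not anything in Clang, and this ignores
the fact that a UTF-32 LE BOM (FF FE 00 00) also starts with FF FE.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical names for illustration only; not Clang's API.
enum class BomKind { None, Utf8, Utf16LE, Utf16BE };

// Classify the byte-order mark (if any) at the start of a source buffer.
// Caveat: a UTF-32 LE BOM (FF FE 00 00) also begins with FF FE, and this
// sketch makes no attempt to tell the two apart.
inline BomKind detectBom(std::string_view buf) {
  auto byteAt = [&](std::size_t i) {
    return static_cast<unsigned char>(buf[i]);
  };
  if (buf.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB &&
      byteAt(2) == 0xBF)
    return BomKind::Utf8;
  if (buf.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
    return BomKind::Utf16LE;
  if (buf.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF)
    return BomKind::Utf16BE;
  return BomKind::None;
}
```

With the flag set, only the two Utf16 results would take the new
conversion path; everything else would behave as it does today.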

Scott's patch hooked into FileManager, but that's also used for PCH
and other non-source files, which makes me a little nervous. Maybe
SourceManager would be a better place. One idea would be to do this in
SourceManager::ContentCache::getBuffer, where byte-order marks are
currently diagnosed: instead of emitting an error, if the flag is set
we'd convert to utf-8 and swap out the buffer.
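
As a rough idea of what the "convert and swap" step amounts to, here is
a self-contained sketch of the conversion itself. The function name
utf16WithBomToUtf8 is hypothetical, and it works on plain strings
rather than Clang's buffer types; in-tree one would presumably reach
for the helpers in llvm/Support/ConvertUTF.h instead of hand-rolling
this. It assumes the input starts with a UTF-16 BOM and rejects
anything malformed rather than guessing.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Clang. Converts a buffer that begins
// with a UTF-16 BOM to UTF-8 (BOM dropped). Returns std::nullopt if there
// is no UTF-16 BOM, the length is odd, or a surrogate is unpaired.
inline std::optional<std::string> utf16WithBomToUtf8(std::string_view bytes) {
  if (bytes.size() < 2 || bytes.size() % 2 != 0)
    return std::nullopt;
  auto byteAt = [&](std::size_t i) {
    return static_cast<unsigned char>(bytes[i]);
  };

  bool littleEndian;
  if (byteAt(0) == 0xFF && byteAt(1) == 0xFE)
    littleEndian = true;
  else if (byteAt(0) == 0xFE && byteAt(1) == 0xFF)
    littleEndian = false;
  else
    return std::nullopt;

  // Read one 16-bit code unit starting at byte offset i.
  auto unitAt = [&](std::size_t i) -> std::uint32_t {
    return littleEndian ? (byteAt(i) | (byteAt(i + 1) << 8))
                        : ((byteAt(i) << 8) | byteAt(i + 1));
  };

  std::string out;
  out.reserve(bytes.size());
  for (std::size_t i = 2; i < bytes.size(); i += 2) {
    std::uint32_t cp = unitAt(i);
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      if (i + 3 >= bytes.size())
        return std::nullopt;
      std::uint32_t low = unitAt(i + 2);
      if (low < 0xDC00 || low > 0xDFFF)
        return std::nullopt;
      cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
      i += 2;
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      return std::nullopt; // Unpaired low surrogate.
    }

    // Encode the code point as 1 to 4 UTF-8 bytes.
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

The converted string would then back the buffer that the lexer sees.
One thing a sketch like this glosses over is that byte offsets in the
converted buffer no longer match offsets in the file on disk.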

The complexity around Clang's virtual filesystem, remapped files and
stuff makes me a little nervous though. Are there more gotchas, or
does this sound like a good way to do it?

Thanks,
Hans


 1. https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.llvm-toolchain
 2. https://developercommunity.visualstudio.com/content/problem/169566/visual-studio-2017-creates-utf-16-source-code-file.html
 3. http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20110711/044059.html


