[cfe-dev] Making Clang support UTF-16 input?
Hans Wennborg via cfe-dev
cfe-dev at lists.llvm.org
Thu Aug 16 07:17:18 PDT 2018
When trying out the new LLVM Visual Studio integration extension [1]
that now supports VS 2017, we learned that at some point, VS changed
the editor to encode new project files as utf-16 by default [2].
Since Clang doesn't support utf-16 input, this creates a bad
experience for users trying out Clang on a new Visual Studio project.
Should we make Clang support utf-16 input?
Nico pointed out there was a patch [3] by Scott Conger a long time ago
to support -finput-charset=.
That seems a bit more ambitious than what's necessary here. What I was
thinking was something like, if -fallow-utf16t is passed (maybe a
clang-cl default), instead of erroring out on a byte-order mark, Clang
would try to convert to utf-8.
Scott's patch hooked into FileManager, but that's also used for PCH
and other non-source files, which makes me a little nervous. Maybe
SourceManager would be a better place. One idea would be to do this in
SourceManager::ContentCache::getBuffer, where byte-order marks are
currently diagnosed: instead of emitting an error, if the flag is set
we'd convert to utf-8 and swap out the buffer.
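To make the idea concrete, here is a rough, standalone sketch of the "detect a utf-16 byte-order mark, then convert to utf-8" step. It is not Clang code: the names detectUtf16Bom, appendUtf8 and convertUtf16ToUtf8 are hypothetical, it uses only the C++17 standard library, and a real patch would presumably reuse LLVM's existing Unicode conversion helpers rather than hand-roll this. It also assumes that malformed input (odd length, unpaired surrogates) should be rejected rather than repaired.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: which utf-16 byte-order mark, if any, starts the buffer.
enum class Utf16Bom { None, LittleEndian, BigEndian };

Utf16Bom detectUtf16Bom(std::string_view Buf) {
  if (Buf.size() < 2)
    return Utf16Bom::None;
  auto B0 = static_cast<unsigned char>(Buf[0]);
  auto B1 = static_cast<unsigned char>(Buf[1]);
  if (B0 == 0xFE && B1 == 0xFF)
    return Utf16Bom::BigEndian;
  if (B0 == 0xFF && B1 == 0xFE) {
    // FF FE 00 00 is the utf-32 little-endian mark; don't treat it as utf-16.
    if (Buf.size() >= 4 && Buf[2] == '\0' && Buf[3] == '\0')
      return Utf16Bom::None;
    return Utf16Bom::LittleEndian;
  }
  return Utf16Bom::None;
}

// Append one Unicode code point to Out, encoded as utf-8.
void appendUtf8(std::string &Out, std::uint32_t CP) {
  if (CP < 0x80) {
    Out.push_back(static_cast<char>(CP));
  } else if (CP < 0x800) {
    Out.push_back(static_cast<char>(0xC0 | (CP >> 6)));
    Out.push_back(static_cast<char>(0x80 | (CP & 0x3F)));
  } else if (CP < 0x10000) {
    Out.push_back(static_cast<char>(0xE0 | (CP >> 12)));
    Out.push_back(static_cast<char>(0x80 | ((CP >> 6) & 0x3F)));
    Out.push_back(static_cast<char>(0x80 | (CP & 0x3F)));
  } else {
    Out.push_back(static_cast<char>(0xF0 | (CP >> 18)));
    Out.push_back(static_cast<char>(0x80 | ((CP >> 12) & 0x3F)));
    Out.push_back(static_cast<char>(0x80 | ((CP >> 6) & 0x3F)));
    Out.push_back(static_cast<char>(0x80 | (CP & 0x3F)));
  }
}

// Returns the utf-8 text (without a byte-order mark) if Buf starts with a
// utf-16 byte-order mark and is well-formed utf-16; std::nullopt otherwise.
std::optional<std::string> convertUtf16ToUtf8(std::string_view Buf) {
  Utf16Bom Bom = detectUtf16Bom(Buf);
  if (Bom == Utf16Bom::None || Buf.size() % 2 != 0)
    return std::nullopt;

  // Read the 16-bit code unit starting at byte offset I.
  auto Unit = [&](std::size_t I) -> std::uint32_t {
    std::uint32_t B0 = static_cast<unsigned char>(Buf[I]);
    std::uint32_t B1 = static_cast<unsigned char>(Buf[I + 1]);
    return Bom == Utf16Bom::LittleEndian ? (B0 | (B1 << 8)) : ((B0 << 8) | B1);
  };

  std::string Out;
  for (std::size_t I = 2; I < Buf.size(); I += 2) { // start past the mark
    std::uint32_t U = Unit(I);
    if (U >= 0xD800 && U <= 0xDBFF) {
      // High surrogate: it must be followed by a low surrogate.
      if (I + 4 > Buf.size())
        return std::nullopt;
      std::uint32_t Lo = Unit(I + 2);
      if (Lo < 0xDC00 || Lo > 0xDFFF)
        return std::nullopt;
      U = 0x10000 + ((U - 0xD800) << 10) + (Lo - 0xDC00);
      I += 2;
    } else if (U >= 0xDC00 && U <= 0xDFFF) {
      return std::nullopt; // stray low surrogate
    }
    appendUtf8(Out, U);
  }
  return Out;
}
```

In the scheme described above, the caller would only do this when the flag is set, and on success would replace the file's buffer with the converted text; when the function returns std::nullopt, the existing byte-order-mark diagnostic would still apply.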
The complexity around Clang's virtual filesystem, remapped files, and
so on makes me a little nervous, though. Are there more gotchas, or
does this sound like a good way to do it?
Thanks,
Hans
1. https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.llvm-toolchain
2. https://developercommunity.visualstudio.com/content/problem/169566/visual-studio-2017-creates-utf-16-source-code-file.html
3. http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20110711/044059.html