[cfe-dev] Encoding files in utf8 on the fly?

Loïc Joly via cfe-dev cfe-dev at lists.llvm.org
Tue Sep 20 10:09:20 PDT 2016


As far as I understand, Clang assumes that the source code it reads is 
encoded in UTF-8. Please, let me know if I'm wrong.

I'd like to use Clang to analyse a code base whose files may be in 
almost any encoding. I was thinking about two options to do this:

- Convert all files to UTF-8 before analysis. This might prove 
difficult, because I can't know beforehand which files Clang will 
open. Moreover, I'd prefer not to modify the input source code if possible.

- Convert the files on the fly while Clang loads them. This looks 
cleaner to me, even if it might cost more in performance. From a 
quick look at the code, a good place to do this seems to be 
VirtualFileSystem.cpp, in the classes File and RealFile.
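A minimal sketch of the conversion step such a wrapper might perform, assuming the source charset of each file is already known and that POSIX iconv is available. The helper name toUtf8 is hypothetical, and this does not show Clang's actual File/RealFile interface. It shows only the bytes-to-UTF-8 step a wrapper would run before the buffer reaches the lexer.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input`, encoded in `fromCharset`, to UTF-8
// using POSIX iconv. A wrapper around the file abstraction could call this
// right after reading a file's bytes from disk.
std::string toUtf8(const std::string &input, const char *fromCharset) {
  iconv_t cd = iconv_open("UTF-8", fromCharset);
  if (cd == reinterpret_cast<iconv_t>(-1))
    throw std::runtime_error("unsupported source charset");

  std::string output;
  // iconv takes a non-const char** for its input; it does not write through it.
  char *in = const_cast<char *>(input.data());
  std::size_t inLeft = input.size();
  char buf[4096];

  while (inLeft > 0) {
    char *out = buf;
    std::size_t outLeft = sizeof buf;
    std::size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
    output.append(buf, static_cast<std::size_t>(out - buf));
    // E2BIG only means the output chunk is full: keep looping.
    if (rc == static_cast<std::size_t>(-1) && errno != E2BIG) {
      iconv_close(cd);
      throw std::runtime_error("invalid or incomplete input sequence");
    }
  }

  // Flush any pending shift state (matters for stateful encodings).
  char *out = buf;
  std::size_t outLeft = sizeof buf;
  iconv(cd, nullptr, nullptr, &out, &outLeft);
  output.append(buf, static_cast<std::size_t>(out - buf));

  iconv_close(cd);
  return output;
}
```

For example, toUtf8("caf\xE9", "ISO-8859-1") yields the UTF-8 bytes "caf\xC3\xA9". One caveat of converting on the fly is that byte offsets in the converted buffer no longer match the file on disk, which matters for anything that maps locations back to the original file.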

Can you tell me if I'm on the right track?

Can you also tell me why Clang doesn't support -finput-charset? Is this 
just a question of performance, or is there another issue I'm missing?

Thank you for your help,


Loïc Joly

