[cfe-dev] Encoding files in utf8 on the fly?

Tom Honermann via cfe-dev cfe-dev at lists.llvm.org
Wed Sep 21 08:29:18 PDT 2016


On 9/20/2016 1:10 PM, Loïc Joly via cfe-dev wrote:
> - Convert the files on the fly while Clang loads them. This looks
> cleaner to me, even if it might be more performance intensive. From a
> quick look at the code, it looks like a good place to do this would be
> in VirtualFileSystem.cpp, with the classes File and RealFile.

Doing so can change the behavior of the compiled program if the source
code uses characters outside the basic source character set.  Consider:

$ cat t.c
#include <string.h>
const char *s = "À"; // where À is ISO-8859-1 0xC0.
int main() {
   return strlen(s);
}

$ clang t.c -o t; ./t; echo $?
t.c:2:18: warning: illegal character encoding in string literal [-Winvalid-source-encoding]
const char *s = "<C0>";
                  ^~~~
1 warning generated.
1

$ iconv -f iso8859-1 -t utf-8 t.c > t2.c

$ clang t2.c -o t2; ./t2; echo $?
2

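The exit status changes from 1 to 2 because À is the single byte 0xC0 in
ISO-8859-1 but the two-byte sequence 0xC3 0x80 in UTF-8, so strlen(s)
sees a different string.  Whether that is a problem or not depends on the
source code.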
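
As a rough illustration of the conversion step itself, here is a minimal
sketch in C of an ISO-8859-1 to UTF-8 transcoder.  The function name
latin1_to_utf8 and its signature are hypothetical; this is not Clang
code, only an assumption about what an on-the-fly conversion would have
to do to each byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Clang): transcode src_len bytes of
 * ISO-8859-1 into UTF-8, writing a NUL-terminated result into dst.
 * Returns the number of bytes written, not counting the NUL, or
 * (size_t)-1 if dst_cap is too small to hold the result.
 */
size_t latin1_to_utf8(const unsigned char *src, size_t src_len,
                      char *dst, size_t dst_cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < src_len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* ASCII range: one byte in, the same byte out. */
            if (out + 1 >= dst_cap)
                return (size_t)-1;
            dst[out++] = (char)c;
        } else {
            /* 0x80..0xFF map to U+0080..U+00FF: two UTF-8 bytes. */
            if (out + 2 >= dst_cap)
                return (size_t)-1;
            dst[out++] = (char)(0xC0 | (c >> 6));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        }
    }

    /* Only reachable without room for the NUL when dst_cap is 0. */
    if (out >= dst_cap)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

Feeding it the single byte 0xC0 produces the two bytes 0xC3 0x80, which
is the same length change that strlen(s) observes in t2.c.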

> Can you tell me if I'm on the right track?
>
> Can you also tell me why doesn't Clang support -finput-charset? Is this
> just a question of performances, or is there another issue I'm missing?

My guess is that no one has yet been motivated enough to do the work.

Tom.



