[cfe-dev] Encoding files in utf8 on the fly?

Hubert Tong via cfe-dev cfe-dev at lists.llvm.org
Wed Sep 21 11:39:36 PDT 2016


On Wed, Sep 21, 2016 at 11:29 AM, Tom Honermann via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On 9/20/2016 1:10 PM, Loïc Joly via cfe-dev wrote:
> > - Convert the files on the fly while Clang loads them. This looks
> > cleaner to me, even if it might be more performance intensive. From a
> > quick look at the code, it looks like a good place to do this would be
> > in VirtualFileSystem.cpp, with the classes File and RealFile.
>
> Doing so can affect behavior if the source code uses characters outside
> the basic source character set.  Consider:
>
> $ cat t.c
> #include <string.h>
> const char *s = "À"; // where À is ISO-8859-1 0xC0.
> int main() {
>    return strlen(s);
> }
>
> $ clang t.c -o t; ./t; echo $?
> t.c:2:18: warning: illegal character encoding in string literal [-Winvalid-source-encoding]
> const char *s = "<C0>";
>                   ^~~~
> 1 warning generated.
> 1
>
> $ iconv -f iso8859-1 -t utf-8 t.c > t2.c
>
> $ clang t2.c -o t2; ./t2; echo $?
> 2
>
> Whether that is a problem or not depends on the source code.
>
> > Can you tell me if I'm on the right track?
> >
> > Can you also tell me why Clang doesn't support -finput-charset? Is this
> > just a question of performance, or is there another issue I'm missing?
>
> My guess is that no one has yet been motivated enough to do the work.
>
There is interest in pursuing this (and some ideas have been discussed);
however, as your code points out, support for -finput-charset and
-fexec-charset is not a trivial endeavour.

-- HT


> Tom.

