[cfe-dev] CLang and UTF BOM characters

Ruben Van Boxem vanboxem.ruben at gmail.com
Sun Oct 17 05:14:42 PDT 2010


2010/10/16 Ruben Van Boxem <vanboxem.ruben at gmail.com>

> Hi,
>
> The Unicode standard permits a UTF-8 byte order mark (BOM, the bytes
> EF BB BF) at the beginning of a file. Clang doesn't seem to support this and
> produces an error, reporting the BOM bytes as unknown tokens.
>
> This should be fixed IMHO. The way I handle it (if input is through
> std::ifstream):
>
> inline void processBOM( std::ifstream &stream )
> {
>     const unsigned char BOM[] = { 0xef, 0xbb, 0xbf };
>     char first3chars[3];
>     if( !stream.read( first3chars, 3 ) )
>         throw std::runtime_error( "Unexpected end of file" );
>
>     if( strcmp(reinterpret_cast<const char*>(BOM), first3chars) )
>         stream.seekg( 0, std::ios::beg ); // reset to beginning of file
> }
>
> This essentially skips the BOM if present. But the solution is of course up
> to you and Clang's design in this aspect.
>
> Ruben
>


By the way, I actually meant strncmp(..., ..., 3) rather than strcmp(...) in
the comparison above, since neither buffer is null-terminated :)

Ruben