[cfe-dev] CLang and UTF BOM characters
Ruben Van Boxem
vanboxem.ruben at gmail.com
Sun Oct 17 05:14:42 PDT 2010
2010/10/16 Ruben Van Boxem <vanboxem.ruben at gmail.com>
> Hi,
>
> According to the UTF-8 standard, the BOM character sequence may be present
> at the beginning of a file. Clang doesn't seem to support this, and produces
> an error, specifying the characters as unknown tokens.
>
> This should be fixed IMHO. The way I handle it (if input is through
> std::ifstream):
>
> inline void processBOM( std::ifstream &stream )
> {
>     const unsigned char BOM[] = { 0xef, 0xbb, 0xbf };
>     char first3chars[3];
>     if( !stream.read( first3chars, 3 ) )
>         throw std::runtime_error( "Unexpected end of file" );
>
>     if( strcmp(reinterpret_cast<const char*>(BOM), first3chars) )
>         stream.seekg( 0, std::ios::beg ); // reset to beginning of file
> }
>
> This essentially skips the BOM if present. But the solution is of course up
> to you and Clang's design in this aspect.
>
> Ruben
>
I actually meant strncmp(...,...,3) by the way :) Neither buffer is
null-terminated, so plain strcmp could read past the end of both arrays.
Ruben