[cfe-dev] CLang and UTF BOM characters
Ruben Van Boxem
vanboxem.ruben at gmail.com
Sat Oct 16 08:03:30 PDT 2010
Hi,
According to the Unicode standard, a UTF-8 file may begin with a byte order
mark (BOM), the byte sequence EF BB BF. Clang doesn't seem to support this:
it produces an error and reports the BOM bytes as unknown tokens.
This should be fixed IMHO. The way I handle it (if input is through
std::ifstream):
    #include <cstring>    // std::memcmp
    #include <fstream>
    #include <stdexcept>

    inline void processBOM( std::ifstream &stream )
    {
        const unsigned char BOM[] = { 0xef, 0xbb, 0xbf };
        char first3chars[3];
        if( !stream.read( first3chars, 3 ) )
            throw std::runtime_error( "Unexpected end of file" );

        // Neither buffer is null-terminated, so compare exactly 3 bytes.
        if( std::memcmp( BOM, first3chars, 3 ) != 0 )
            stream.seekg( 0, std::ios::beg ); // no BOM: reset to beginning of file
    }

This essentially skips the BOM if present. But the solution is of course up
to you and Clang's design in this respect.
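
If the source text is already held in memory rather than read through
std::ifstream, the same idea might look roughly like the sketch below. The
name skipUtf8Bom is hypothetical and not part of Clang; the sketch assumes
C++17 for std::string_view and only illustrates the check, it is not a
proposal for how Clang should implement it.

```cpp
#include <string_view>

// Hypothetical helper (not a Clang API): returns the input with a leading
// UTF-8 BOM (EF BB BF) removed, or the input unchanged if there is none.
inline std::string_view skipUtf8Bom( std::string_view text )
{
    constexpr std::string_view BOM = "\xEF\xBB\xBF";

    // substr clamps the length, so inputs shorter than 3 bytes are safe.
    if( text.substr( 0, BOM.size() ) == BOM )
        text.remove_prefix( BOM.size() );

    return text;
}
```

Unlike the stream version, this never throws on files shorter than three
bytes: they simply cannot match the BOM and are returned untouched.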
Ruben