[cfe-dev] CLang and UTF BOM characters

Ruben Van Boxem vanboxem.ruben at gmail.com
Sat Oct 16 08:03:30 PDT 2010


Hi,

According to the Unicode Standard, a UTF-8 file may begin with a byte order
mark (BOM), the byte sequence EF BB BF. Clang doesn't seem to support this:
it produces an error that reports the BOM bytes as unknown tokens.

This should be fixed IMHO. The way I handle it (if input is through
std::ifstream):

#include <cstring>    // std::memcmp
#include <fstream>
#include <stdexcept>  // std::runtime_error

inline void processBOM( std::ifstream &stream )
{
    const unsigned char BOM[] = { 0xef, 0xbb, 0xbf };
    char first3chars[3];

    if( !stream.read( first3chars, 3 ) )
        throw std::runtime_error( "Unexpected end of file" );

    // memcmp rather than strcmp: neither array is null-terminated
    if( std::memcmp( BOM, first3chars, 3 ) != 0 )
        stream.seekg( 0, std::ios::beg ); // no BOM: reset to beginning of file
}

This skips the BOM if one is present and otherwise rewinds to the start of
the file. The actual fix is of course up to you and to how Clang's design
handles input.
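
For illustration only, here is a sketch of the same idea written against
std::istream, so it can be tried on an in-memory stream as well as a file.
The name skipUtf8Bom and the bool return value are my own assumptions, not
anything that exists in Clang. Unlike the function above, it treats input
shorter than three bytes as "no BOM" instead of throwing, which may or may
not be what you want.

```cpp
#include <cstring>   // std::memcmp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: consumes a leading UTF-8 BOM (EF BB BF) if present.
// Returns true if a BOM was skipped; otherwise rewinds to the first byte.
inline bool skipUtf8Bom( std::istream &stream )
{
    static const unsigned char BOM[3] = { 0xef, 0xbb, 0xbf };
    char head[3];

    stream.read( head, 3 );
    if( stream.gcount() == 3 && std::memcmp( head, BOM, 3 ) == 0 )
        return true;                     // now positioned just past the BOM

    stream.clear();                      // a short read sets eofbit/failbit
    stream.seekg( 0, std::ios::beg );    // no BOM: rewind to the first byte
    return false;
}
```

With a std::istringstream holding the three BOM bytes followed by source
text, the call returns true and the next read starts at the source text;
without a BOM it returns false and the stream is back at offset 0. This
relies on the stream being seekable, which holds for files and string
streams but not for something like a pipe.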

Ruben