[PATCH] D37491: [Preamble] Fixed preamble breaking with BOM presence (and particularly, fluctuating BOM presence)

Cameron via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 7 06:30:22 PDT 2017


cameron314 added a comment.

> Maybe there's a third option option to remove the BOM from the buffer before passing it to clang?
>  Could you elaborate on your use-case a little more? Is there no way to consistently always pass buffers either with or without BOM?
>  Out of two options you mention discarding preamble on BOM changes seems like an easy option that is both correct and won't make a difference in performance since BOM rarely changes.
>  Looking at your use-case, it sounds like you'll only have 1 extra reparse of preamble, which is probably fine.

In my particular use case, when the file is remapped by the IDE, there's never a BOM. But when it's not remapped, the real file may or may not have a BOM. Since the file goes back and forth between mapped and unmapped depending on whether it's saved, the BOM presence can change quite frequently, and we don't really have control over it (the BOM can change on disk too). This is a common use case for anyone integrating clang/libclang into an IDE; the rarity of UTF-8 BOMs on platforms other than Windows probably obscured this until now.

I think since we can handle the changing BOM presence in the preamble gracefully, we should. I'll draft a patch that does the slicing correctly so that the offsets are always valid.

> Sure, it's not something clang supports, it's an edge-case when clang receives "malformed" input. Does lexer only skip utf-8 BOM, but not other versions of BOM?
>  But you're right, it's highly unlikely anything will break in that case.

Ah, I see. Yes, the lexer only skips a UTF-8 BOM, but I seem to recall seeing some code that detects BOMs in other encodings and emits an error (in the driver, possibly?).



================
Comment at: unittests/Frontend/PchPreambleTest.cpp:190
+
+  ASSERT_TRUE(ReparseAST(AST));
+  ASSERT_FALSE(AST->getDiagnostics().hasErrorOccurred());
----------------
ilya-biryukov wrote:
> cameron314 wrote:
> > ilya-biryukov wrote:
> > > We're not really testing that preamble was reused.
> > > Maybe return a flag from `ASTUnit::Reparse` to indicate if preamble was reused and check it here?
> > We are; if it wasn't reused, the header would have been opened again and the last assert on `GetFileReadCount` below would fail.
> Missed that, thanks. Looks good.
> Maybe add a comment explicitly noting that?
Sure, will do.


https://reviews.llvm.org/D37491





More information about the cfe-commits mailing list