r195877 - strip UTF-8 BOM in -frewrite-includes (PR#15664)

Alp Toker alp at nuanti.com
Wed Nov 27 16:29:29 PST 2013

On 27/11/2013 23:58, Lubos Lunak wrote:
> On Wednesday 27 of November 2013, Alp Toker wrote:
>> On 27/11/2013 21:14, Lubos Lunak wrote:
>>> Author: llunak
>>> Date: Wed Nov 27 15:14:43 2013
>>> New Revision: 195877
> ...
>>> ===== --- cfe/trunk/test/Frontend/Inputs/rewrite-includes-bom.h (added)
>>> +++ cfe/trunk/test/Frontend/Inputs/rewrite-includes-bom.h Wed Nov 27
>>> 15:14:43 2013 @@ -0,0 +1 @@
>>> +// This file starts with UTF-8 BOM marker.
>>> Added: cfe/trunk/test/Frontend/rewrite-includes-bom.c
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Frontend/rewrite-inclu
>>> des-bom.c?rev=195877&view=auto
>>> =========================================================================
>>> ===== --- cfe/trunk/test/Frontend/rewrite-includes-bom.c (added)
>>> +++ cfe/trunk/test/Frontend/rewrite-includes-bom.c Wed Nov 27 15:14:43
>>> 2013 @@ -0,0 +1,4 @@
>>> +// RUN: %clang -E -frewrite-includes -I %S/Inputs %s -o - | %clang
>>> -fsyntax-only -Xclang -verify -x c - +// expected-no-diagnostics
>>> +
>>> +#include "rewrite-includes-bom.h"
>> Hi Lubos,
>> I don't think it's reasonable to check in this test in its current
>> state, given that it passes with or without the BOM. If the invisible
>> BOM happens to get removed by automatic encoding transformations in VCS
>> or accidental code cleanup, it would give a false sense that the feature
>> is working when it could in fact have regressed.
>   Actually, most tests do no longer work if they have accidentally their input
> modified to no longer include whatever it is that should find a regression,
> and it's a question how likely it is to happen for any of them.
>> Leaving the feature untested until you get round to writing a better
>> test, perhaps checking output for presence of BOM with something like
>> grep, is preferable for that reason.
>   Now only if that were so easy:
>> @@ -1,4 +1,5 @@
>> -// RUN: %clang -E -frewrite-includes -I %S/Inputs %s -o - | %clang
>> -fsyntax-only -Xclang -verify -x c -
>> +// RUN: grep '^\xEF\xBB\xBF'  %S/Inputs/rewrite-includes-bom.h
>> +// RUN: %clang_cc1 -E -frewrite-includes -I %S/Inputs %s -o - | %clang_cc1
>> -fsyntax-only -verify -x c - | not grep '\xEF\xBB\xBF'
>   This test doesn't fail with the BOM removed either. With GNU grep 4.12 that I
> have here, the first grep never matches regardless of whether the BOM is
> there or not (which additionally means that even if the grep worked fine, the
> test still wouldn't fail for a regression).

This was working on my OS X grep (2.5.1-FreeBSD) and bash but I've 
committed a more portable syntax which should hopefully fix other platforms.

In the last weeks I fixed a number of tests in LLVM and clang that 
clearly used to work at some point, but have become complete no-ops over 
time due to "cleanups" like reformatting made with the best intentions 
that changed significant whitespace or removed hidden characters.

Just because it's hard doesn't mean we shouldn't aim for robustness, 
which we've achieved now for this test (after a couple of false starts!)

>   Also, the removal of the BOM is already handled by the -fsyntax-only step
> (which fails if the BOM is in the middle of the input), so the last grep
> always checks something where it's never present.

Right. One of the two latter checks can be removed now. Personally I 
wouldn't depend on clang diagnosing on BOM as you were -- this is an 
unrelated feature that may change, say if the diagnostic were to be 
moved into the set that's ignored by default etc. The test would 
silently become a no-op.

By comparison, the grep is absolutely clear in its intent. It'll still 
be valid and self-documenting 15 years down the line when neither of us 
are around on the project to remember what this was trying to test :-)


>> Also, your commit email address is unreachable. Please update this so we
>> can contact you for post-commit review:
>   I already had asked for that before the commit. It's just been a long time
> since I last used Subversion.

the browser experts

More information about the cfe-commits mailing list