[cfe-dev] Token lookahead without the preprocessor

Richard Smith richard at metafoo.co.uk
Tue Jun 26 00:26:21 PDT 2012


On Mon, Jun 25, 2012 at 8:51 PM, Jordan Rose <jordan_rose at apple.com> wrote:

> Hi, all. I've been trying to come up with a useful recovery for this case
> (<rdar://problem/11602405> for Apple folks):
>
> void foo();
> {
>        // note the spurious semicolon above
> }
>
> The trouble is, having a semicolon there is a perfectly good way to end a
> declaration. It's clear that if there's a brace on the next line, it was
> actually supposed to be a definition (because C/C++ don't have top-level
> braces). But we get in trouble in this case (from
> test/CodeGen/pragma-weak.c):
>
> void __both2(void);
> void both2(void) __attribute((alias("__both2"))); // first, wins
> #pragma weak both2 = __both2
> void __both2(void) {}
> // CHECK: @both2 = alias void ()* @__both2
> // CHECK: define void @__both2()
>
> The lookahead after the semicolon has to go all the way to the next 'void'
> to get another token, and meanwhile the Lexer and Preprocessor have seen
> and recorded the #pragma weak.
>
> There are similar problems in
> test/SemaCXX/warn-thread-safety-analysis.cpp, though I haven't specifically
> tracked them down.
>
> Any ideas on what's the right thing to do here? I'd be fine with "there's
> a preprocessing directive in the way; don't bother" or "the next token is
> 'void' but you're gonna have to re-Lex from where you are" but I don't
> think we have a good way to do either one. (Raw mode /almost/ works except
> I'm not sure of the right way to go into raw mode from Parser.)


Hi Jordy,

This is PR10101, and was fixed in r145372, but the fix was backed out due
to the #pragma weak (and, at the time, #pragma visibility) issue. The
problem is that the implementation of this pragma is incorrect, since it
takes effect when the pragma is lexed, rather than when it is parsed, and
the point at which the pragma occurs has a semantic impact. We shouldn't be
hacking around that in the parser by avoiding lookahead; the right fix is
for the lexer to produce an annotation token when it encounters such a
pragma, as it does for #pragma visibility, #pragma pack, and #pragma unused.
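
A toy sketch of the annotation-token approach follows. It assumes tokens are plain strings and that the string "#pragma" stands for a whole #pragma weak line. The names are hypothetical, not Clang's API. The lexer only turns the pragma into a token, annot_pragma_weak, so peeking past the ';' has no side effect. The pragma takes effect when the parser consumes that token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model, not Clang's API. The lexer has no side effects:
// it only replaces "#pragma" with an annotation token.
class AnnotatingLexer {
 public:
  explicit AnnotatingLexer(std::vector<std::string> toks)
      : toks_(std::move(toks)) {}

  std::string lex() {
    if (pos_ >= toks_.size()) return "<eof>";
    const std::string &tok = toks_[pos_++];
    return tok == "#pragma" ? std::string("annot_pragma_weak") : tok;
  }

 private:
  std::vector<std::string> toks_;
  std::size_t pos_ = 0;
};

// One-token lookahead after ';', as in the recovery heuristic. Peeking is
// harmless here; the pragma takes effect only when the parser consumes the
// annotation token, after the preceding declaration is finished.
std::vector<std::string> parseAfterSemiAnnotated(std::vector<std::string> toks) {
  std::vector<std::string> trace;
  AnnotatingLexer lexer(std::move(toks));
  std::string next = lexer.lex();  // lookahead, no side effects
  if (next == "{")
    trace.push_back("recover: treat as definition");
  trace.push_back("declaration finished");
  if (next == "annot_pragma_weak")  // parser consumes the buffered token
    trace.push_back("pragma takes effect");
  return trace;
}
```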

This issue also makes our parsing of "#pragma weak" accept code which GCC
rejects (though I'm hesitant to call it an accepts-invalid since I can't
find a precise spec for this pragma): GCC (as far as I can determine) only
accepts the pragma in places where it would parse a declaration.

Incidentally, I wonder whether it'd make sense to provide a more general
framework for such cases, rather than adding ad-hoc pragma annotations.
Perhaps, for all pragmas which can only appear in specific places in the
grammar, we could lex them as a tok::annot_pragma followed by the tokens in
the pragma and a tok::eod, and perform the pragma parsing in the parser.
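
Here is a rough sketch of what such a framework might look like. It assumes whitespace-separated tokens, a simplification, and uses made-up names: lexLine, parsePragma and PragmaInfo are hypothetical and not part of Clang. The lexer wraps each pragma line as annot_pragma, then the pragma's own tokens, then eod. The parser interprets the pragma only where the grammar allows one.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch, not Clang's API. A "#pragma" line becomes:
// annot_pragma, the pragma's own tokens, eod. Other lines pass through.
std::vector<std::string> lexLine(const std::string &line) {
  std::istringstream in(line);
  std::vector<std::string> words;
  for (std::string w; in >> w;) words.push_back(w);  // whitespace split only
  if (words.empty() || words[0] != "#pragma") return words;
  std::vector<std::string> out{"annot_pragma"};
  out.insert(out.end(), words.begin() + 1, words.end());
  out.push_back("eod");
  return out;
}

struct PragmaInfo {
  std::string name;               // e.g. "weak"
  std::vector<std::string> args;  // remaining tokens before eod
};

// Called by the parser only where a pragma may appear (e.g. where a
// declaration could start). Consumes tokens from i through eod.
PragmaInfo parsePragma(const std::vector<std::string> &toks, std::size_t &i) {
  assert(toks[i] == "annot_pragma");
  PragmaInfo info;
  ++i;
  if (toks[i] != "eod") info.name = toks[i++];
  while (toks[i] != "eod") info.args.push_back(toks[i++]);
  ++i;  // consume eod
  return info;
}
```

For example, lexLine("#pragma weak both2 = __both2") yields annot_pragma weak both2 = __both2 eod. parsePragma then returns the name "weak" with the three remaining tokens as arguments.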

Richard

