[cfe-commits] [PATCH] GCC Preprocessor Assertions

Sat Dec 3 14:33:20 PST 2011

Hi Doug,

Thanks for the detailed feedback on the code style and control flow; all 
your comments seem very reasonable and I will go and rework the patch. I 
will see if I can add another patch to do the 
serialization/de-serialization.

Andrew

On 04-Dec-11 06:32, Douglas Gregor wrote:
> On Dec 1, 2011, at 11:46 PM, Andrew Craik wrote:
>
>> Hello,
>>
>> At the LLVM Developers' Meeting a couple of weeks ago I mentioned that I had a patch to add support to the pre-processor to handle GCC-style pre-processor assertions.
>> Please find attached two patches:
>> pp_assert_impl.patch - this adds handling of assertions to the pre-processor
>> pp_assert_default.patch - this adds default cpu and system assertions to Targets.cpp
>>
>> I look forward to hearing from the community and hopefully getting these patches integrated.
>
> Thanks. A bunch of comments below, but generally things are look very good!
>
> Starting with pp_assert_impl.patch:
>
> Index: clang/include/clang/Lex/Preprocessor.h
> ===================================================================
> --- clang.orig/include/clang/Lex/Preprocessor.h	2011-11-30 11:39:19.000000000 +1000
> +++ clang/include/clang/Lex/Preprocessor.h	2011-12-02 12:23:53.246940900 +1000
> @@ -15,6 +15,7 @@
>   #define LLVM_CLANG_LEX_PREPROCESSOR_H
>
>   #include "clang/Lex/MacroInfo.h"
> +#include "clang/Lex/AssertionInfo.h"
>   #include "clang/Lex/Lexer.h"
>   #include "clang/Lex/PTHLexer.h"
>   #include "clang/Lex/PPCallbacks.h"
> @@ -240,6 +241,10 @@
>     /// to the actual definition of the macro.
>     llvm::DenseMap<IdentifierInfo*, MacroInfo*>  Macros;
>
> +  /// Assertions - For each IdentifierInfo that has appeared as an asserted
> +  /// predicate, we store the values for which it was asserted
> +  llvm::DenseMap<IdentifierInfo*, std::vector<AssertionInfo*>  >  AssertedValues;
>
> I'm guessing that the common case is that there is only one asserted value for each assertion, so I suggest replacing the std::vector with a UsuallyTinyPtrVector (which will need to be boosted from AST to Basic).
>
> @@ -440,6 +459,30 @@
>     macro_iterator macro_begin(bool IncludeExternalMacros = true) const;
>     macro_iterator macro_end(bool IncludeExternalMacros = true) const;
>
> +  /// AddAssertionValue - assert the specified identifier to the
> +  /// set of asserted values for the given assertion key
> +  void AddAssertionValue(IdentifierInfo *II, AssertionInfo *AI);
> +
> +  /// IsAsserted - test if the specified identifier has been asserted as a
> +  /// value of the given assertion key returns true iff it has
> +  bool IsAsserted(IdentifierInfo *II, AssertionInfo *AI);
> +
> +  /// RemoveAssertionValue - remove the specified identifier from the set of
> +  /// asserted values for the given assertion key
> +  void RemoveAssertionValue(IdentifierInfo *II, AssertionInfo *AI);
> +
> +  /// RemoveAssertion - remove the specified assertion and all of its asserted
> +  /// values
> +  void RemoveAssertion(IdentifierInfo *II);
>
> A quick request here… the LLVM coding standards clarified the naming of functions a few months ago (http://llvm.org/docs/CodingStandards.html#ll_naming), so please name these addAssertionValue, isAsserted, and so on.
> The same comment applies throughout the patch, because we want new code to follow this convention and existing code to eventually migrate over.
>
>
> +    void Destory() {
> +      ValueTokens.clear();
> +      this->~AssertionInfo();
> +    }
>
> Typo "Destory", propagated throughout.
>
> +/// ReadAssertionPredicate - Lex and validate an assertion predicate, which
> +/// occurs after an #assert or #unassert. This sets the token kind to eod and
> +/// discards the rest of the line if the assert is invalid.
> +/// isAssertUnassert is 1 if this is due to an #assert, 2 if #unassert
> +/// directive, 0 if it is something else
> +void Preprocessor::ReadAssertionPredicate(Token&AssertionPredTok,
> +    char isAssertUnassert) {
>
> Please use an enumeration type rather than 0/1/2 here.
>
> Also, why not indicate failure by returning 'true' and success by returning 'false', as we do elsewhere in Clang?
>
> +  IdentifierInfo *II = AssertionPredTok.getIdentifierInfo();
> +  if (II == 0) {
> +    bool Invalid = false;
> +    std::string Spelling = getSpelling(AssertionPredTok,&Invalid);
> +    if (Invalid) {
> +      return;
> +    }
> +
> +    const IdentifierInfo&Info = Identifiers.get(Spelling);
> +    if (Info.isCPlusPlusOperatorKeyword()) {
> +      Diag(AssertionPredTok, diag::err_pp_operator_used_as_assert_predicate)
> +<<  Spelling;
> +    } else {
> +      Diag(AssertionPredTok, diag::err_pp_predicate_not_identifier);
> +    }
> +    // Fall through on error.
> +  } else if (isAssertUnassert&&  II->getPPKeywordID() == tok::pp_assert) {
> +    // Error if using assert as a predicate since using it would require
> +    // #assert
> +    Diag(AssertionPredTok, diag::err_assert_predicate_name);
> +  } else {
> +    // Okay, we got a good identifier node. Return it.
> +    return;
> +  }
>
> If you did switch the return value to 'bool', this code could be de-nested and clarified a bit:
>
> if (IdentifierInfo *II = AssertionPredTok.getIdentifierInfo()) {
>    if (isAssertUnassert&&  II->getPPKeywordID() == tok::pp_assert)
>      return Diag(AssertionPredTok, diag::err_assert_predicate_name);
>
>    return false;
> }
>
> // diagnose non-identifiers.
>
> +//===----------------------------------------------------------------------===//
> +// Preprocessor Assertion Directive Handling.
> +//===----------------------------------------------------------------------===//
> +void Preprocessor::HandleAssertDirective(Token&DTok) {
> +  Token AssertionPredicateTok;
> +  ReadAssertionPredicate(AssertionPredicateTok, 1);
> +
> +  if (AssertionPredicateTok.is(tok::eod)) {
> +    return;
> +  }
> +
> +  AssertionInfo *AI =
> +    AllocateAssertionInfo(AssertionPredicateTok.getLocation());
> +  Token Tok;
> +  LexUnexpandedToken(Tok);
> +  ReadAssertionValue(Tok, AI);
> +
> +  // add to AI
> +  if (AI != 0) {
>
> I find this interface to ReadAssertionValue to be a bit odd. How about having ReadAssertionValue lex all of the tokens it needs to, then return an allocated AssertionInfo* on success or null to indicate failure? It would simplify control flow for its callers.
>
> +void Preprocessor::HandleUnassertDirective(Token&DTok) {
> +  Token AssertionPredicateTok;
> +  ReadAssertionPredicate(AssertionPredicateTok, 2);
> +
> +  // check to see if we read eod which indicates an error has occurred and
> +  // an appropriate diagnostic has already been generated
> +  if (AssertionPredicateTok.is(tok::eod)) {
> +    return;
> +  }
> +
> +  Token Tok;
> +  LexUnexpandedToken(Tok);
> +
> +  IdentifierInfo *II = AssertionPredicateTok.getIdentifierInfo();
> +
> +  // we have a valid identifier so we need to create an AssertionInfo to hold
> +  // the value to associate with the predicate
> +  assertion_iterator iter = AssertedValues.find(II);
> +
> +  // we don't have an entry in the assertion table for the predicate, this is a
> +  // noop so escape early
> +  if (iter == assertion_end()) {
> +    return;
> +  }
>
> Shouldn't we still parse the rest of the #unassert to make sure it's well-formed? Or, at the very least, skip to the end of the directive?
>
> +void Preprocessor::ReadAssertionValue(Token&Tok, AssertionInfo *AI) {
> +  if (Tok.isNot(tok::l_paren)) {
> +    // emit diag
> +    ReleaseAssertionInfo(AI);
> +    AI = 0;
> +    return DiscardUntilEndOfDirective();
> +  }
>
> You're missing the diagnostic here.
>
> +  while (true) {
> +    LexUnexpandedToken(Tok);
> +    if (Tok.is(tok::r_paren) || Tok.is(tok::eod)) {
> +      break;
> +    }
> +    AI->AddToken(Tok);
> +  }
>
> Could you add a comment saying that it's intentional that parentheses don't nest?It surprised me, but I see that's the documented behavior, and I don't want someone coming through later and "fixing" it.
>
> +  // check for the closing ), if we don't have it relase the AssetionInfo and
> +  // discard the rest of the directive
> +  if (!Tok.is(tok::r_paren)) {
> +    ReleaseAssertionInfo(AI);
> +    AI = 0;
> +    return DiscardUntilEndOfDirective();
> +  }
>
> Comment typo "relase". Also, shouldn't we emit a diagnostic here?
>
> +  // check there is nothing after the closing ), if we have anything release
> +  // the Assertioninfo and idscard the rest of the directive
> +  LexUnexpandedToken(Tok);
> +  if (!Tok.is(tok::eod)) {
> +    ReleaseAssertionInfo(AI);
> +    AI = 0;
> +    return DiscardUntilEndOfDirective();
> +  }
>
> Diagnostic here?
>
> +    while (true) {
> +      PP.LexUnexpandedToken(PeekTok);
> +      if (PeekTok.is(tok::r_paren) || PeekTok.is(tok::eod)) {
> +        break;
> +      }
> +      AI->AddToken(PeekTok);
> +    }
>
> Same request for a comment about non-nesting parens.
>
> +    // check for the closing ), if we don't have it relase the AssertionInfo
> +    // and discard the rest of the directive
> +    if (!PeekTok.is(tok::r_paren)) {
> +      PP.Diag(PredicateTok, diag::err_pp_expr_bad_token_start_expr);
> +      PP.ReleaseAssertionInfo(AI);
> +      AI = 0;
> +      return true;
> +    }
>
> "relase" typo in comment.
>
> This patch is looking really good. One non-trivial piece that's still missing is that the AssertedValues map will need to be serialized to the AST file and (lazily) deserialized, so that this feature works with precompiled headers.
>
> pp_assert_default.patch looks good to me.
>
> Thanks for working on this!
>
> 	- Doug