[cfe-dev] [PATCH] GCC Preprocessor Assertions revision 2

Thu Mar 8 10:46:25 PST 2012

On Tue, Mar 6, 2012 at 11:31 AM, Richard Smith <richard at metafoo.co.uk> wrote:
> >From a cursory investigation of this feature, it looks like it should be
> straightforward to write a tiny awk script to convert any users of it, which
> don't use #unassert, to macros:
>
> #assert a(b) -> #define a 1   #define a_b 1
> #a (inside pp condition) -> defined(a)
> #a(b) (inside pp condition) -> defined(a_b)
>
> Given that, and your analysis of the costs, I think we would need quite a
> compelling reason to accept this extension. Unless someone can present a
> solid argument in favor of this extension, it does not seem valuable to me.

+1

 - Daniel

>
> On Tue, Mar 6, 2012 at 9:21 AM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
> wrote:
>>
>> Hi all,
>>
>> I'd like to get feedback and consensus on whether we should add "GCC
>> Preprocessor Assertions" to clang. This gcc extension has been deprecated
>> since gcc-3, here's a quote from gcc 3.1 docs, (bold added by me):
>>
>> From http://gcc.gnu.org/onlinedocs/gcc-3.1/cpp/Assertions.html
>>
>> Assertions are a deprecated alternative to macros in writing conditionals
>> to test what sort of computer or system the compiled program will run on.
>> Assertions are usually predefined, but you can define them with
>> preprocessing directives or command-line options.
>>
>> Assertions were intended to provide a more systematic way to describe the
>> compiler's target system. However, in practice they are just as
>> unpredictable as the system-specific predefined macros. In addition, they
>> are not part of any standard, and only a few compilers support them.
>> Therefore, the use of assertions is less portable than the use of
>> system-specific predefined macros. We recommend you do not use them at all.
>>
>>
>> Do note that, after checking on http://ideone.com, gcc still supports them
>> on gcc-4.3.4.
>>
>> Andrew did great work on implementing the feature; apart from the
>> preprocessor changes, he added support for them in the preprocessing record
>> and serialization/deserialization support in the ASTReader/ASTWriter.
>> This is a significant amount of code that we have to maintain and make
>> sure it works well with other parts of the codebase, and I'm worried about
>> the maintenance burden that this deprecated feature will impose, e.g.
>>
>> -it reserves 1 bit out of the IdentifierInfo object. There's only 1 bit
>> left currently so afterwards there will be non left to easily use for other
>> purposes, without doubling the IdentifierInfo object's size.
>>
>> -it increases the size of the IdentifierInfo data that are stored in the
>> serialized AST.
>>
>> Although the changes are extensive, there may be more that are needed:
>>
>> -gcc docs say that you can specify assertions by command-line options; I
>> did not see this implemented in Andrew's patches
>>
>> -the feature was not integrated into our PCH validation, to be more
>> specific, validating and giving errors when assertions from PCH clash with
>> predefined assertions or assertions from the command-line
>>
>> -It is not clear to me how this feature should interact with modules.
>>
>>
>> So, to recap, the question is this, should we implement and maintain a gcc
>> extension that was deprecated since gcc 3 ?
>>
>> -Argyrios
>>
>>
>> On Dec 14, 2011, at 9:26 PM, Andrew Craik wrote:
>>
>> Hi Doug,
>>
>> Please find attached a series of 4 patches which have incorporated your
>> feedback on my previous attempt to implement support for GCC-style
>> pre-processor assertions in Clang; the feature is now working with PCH and
>> PCH chaining. The patches are as follows from first to last:
>> 1) boost_usuallytinyptrvector_to_basic.patch - moves the UsuallyPtrVector
>> type to Basic, per your previous, and adds an erase method to it
>> 2) pp_assert_impl.patch - reworked version of the original patch to
>> implement handling of asserts in the pre-processor
>> 3) pp_assert_defaults.patch - same as previous post adding the default
>> asserts for systems and cpus
>> 4) pp_assert_serialization.patch - adds serialization support for
>> assertions and all the required callbacks etc
>>
>> For the serialization I had to add a bit to the IdentifierInfo to track if
>> the identifier had asserted values to facilitate separate deserialization of
>> asserts and macros. This resulted in the size of the emitted IdentifierInfo
>> exceeding the 2 bytes that were allocated for it so I have increased the
>> size as you can see in the patch.
>>
>> I look forward to hearing from you.
>>
>> Andrew
>>
>> On 04-Dec-11 08:33, Andrew Craik wrote:
>>
>> Hi Doug,
>>
>>
>> Thanks for the detailed feedback on the code style and control flow; all
>>
>> your comments seem very reasonable and I will go and rework the patch. I
>>
>> will see if I can add another patch to do the
>>
>> serialization/de-serialization.
>>
>>
>> Andrew
>>
>>
>> On 04-Dec-11 06:32, Douglas Gregor wrote:
>>
>> On Dec 1, 2011, at 11:46 PM, Andrew Craik wrote:
>>
>>
>> Hello,
>>
>>
>> At the LLVM Developers' Meeting a couple of weeks ago I mentioned that I
>> had a patch to add support to the pre-processor to handle GCC-style
>> pre-processor assertions.
>>
>> Please find attached two patches:
>>
>> pp_assert_impl.patch - this adds handling of assertions to the
>> pre-processor
>>
>> pp_assert_default.patch - this adds default cpu and system assertions to
>> Targets.cpp
>>
>>
>> I look forward to hearing from the community and hopefully getting these
>> patches integrated.
>>
>> Thanks. A bunch of comments below, but generally things are look very
>> good!
>>
>>
>> Starting with pp_assert_impl.patch:
>>
>>
>> Index: clang/include/clang/Lex/Preprocessor.h
>>
>> ===================================================================
>>
>> --- clang.orig/include/clang/Lex/Preprocessor.h 2011-11-30
>> 11:39:19.000000000 +1000
>>
>> +++ clang/include/clang/Lex/Preprocessor.h 2011-12-02 12:23:53.246940900
>> +1000
>>
>> @@ -15,6 +15,7 @@
>>
>>   #define LLVM_CLANG_LEX_PREPROCESSOR_H
>>
>>
>>   #include "clang/Lex/MacroInfo.h"
>>
>> +#include "clang/Lex/AssertionInfo.h"
>>
>>   #include "clang/Lex/Lexer.h"
>>
>>   #include "clang/Lex/PTHLexer.h"
>>
>>   #include "clang/Lex/PPCallbacks.h"
>>
>> @@ -240,6 +241,10 @@
>>
>>     /// to the actual definition of the macro.
>>
>>     llvm::DenseMap<IdentifierInfo*, MacroInfo*>   Macros;
>>
>>
>> +  /// Assertions - For each IdentifierInfo that has appeared as an
>> asserted
>>
>> +  /// predicate, we store the values for which it was asserted
>>
>> +  llvm::DenseMap<IdentifierInfo*, std::vector<AssertionInfo*>   >
>>   AssertedValues;
>>
>>
>> I'm guessing that the common case is that there is only one asserted value
>> for each assertion, so I suggest replacing the std::vector with a
>> UsuallyTinyPtrVector (which will need to be boosted from AST to Basic).
>>
>>
>> @@ -440,6 +459,30 @@
>>
>>     macro_iterator macro_begin(bool IncludeExternalMacros = true) const;
>>
>>     macro_iterator macro_end(bool IncludeExternalMacros = true) const;
>>
>>
>> +  /// AddAssertionValue - assert the specified identifier to the
>>
>> +  /// set of asserted values for the given assertion key
>>
>> +  void AddAssertionValue(IdentifierInfo *II, AssertionInfo *AI);
>>
>> +
>>
>> +  /// IsAsserted - test if the specified identifier has been asserted as
>> a
>>
>> +  /// value of the given assertion key returns true iff it has
>>
>> +  bool IsAsserted(IdentifierInfo *II, AssertionInfo *AI);
>>
>> +
>>
>> +  /// RemoveAssertionValue - remove the specified identifier from the set
>> of
>>
>> +  /// asserted values for the given assertion key
>>
>> +  void RemoveAssertionValue(IdentifierInfo *II, AssertionInfo *AI);
>>
>> +
>>
>> +  /// RemoveAssertion - remove the specified assertion and all of its
>> asserted
>>
>> +  /// values
>>
>> +  void RemoveAssertion(IdentifierInfo *II);
>>
>>
>> A quick request here… the LLVM coding standards clarified the naming of
>> functions a few months ago
>> (http://llvm.org/docs/CodingStandards.html#ll_naming), so please name these
>> addAssertionValue, isAsserted, and so on.
>>
>> The same comment applies throughout the patch, because we want new code to
>> follow this convention and existing code to eventually migrate over.
>>
>>
>>
>> +    void Destory() {
>>
>> +      ValueTokens.clear();
>>
>> +      this->~AssertionInfo();
>>
>> +    }
>>
>>
>> Typo "Destory", propagated throughout.
>>
>>
>> +/// ReadAssertionPredicate - Lex and validate an assertion predicate,
>> which
>>
>> +/// occurs after an #assert or #unassert. This sets the token kind to eod
>> and
>>
>> +/// discards the rest of the line if the assert is invalid.
>>
>> +/// isAssertUnassert is 1 if this is due to an #assert, 2 if #unassert
>>
>> +/// directive, 0 if it is something else
>>
>> +void Preprocessor::ReadAssertionPredicate(Token&AssertionPredTok,
>>
>> +    char isAssertUnassert) {
>>
>>
>> Please use an enumeration type rather than 0/1/2 here.
>>
>>
>> Also, why not indicate failure by returning 'true' and success by
>> returning 'false', as we do elsewhere in Clang?
>>
>>
>> +  IdentifierInfo *II = AssertionPredTok.getIdentifierInfo();
>>
>> +  if (II == 0) {
>>
>> +    bool Invalid = false;
>>
>> +    std::string Spelling = getSpelling(AssertionPredTok,&Invalid);
>>
>> +    if (Invalid) {
>>
>> +      return;
>>
>> +    }
>>
>> +
>>
>> +    const IdentifierInfo&Info = Identifiers.get(Spelling);
>>
>> +    if (Info.isCPlusPlusOperatorKeyword()) {
>>
>> +      Diag(AssertionPredTok,
>> diag::err_pp_operator_used_as_assert_predicate)
>>
>> +<<   Spelling;
>>
>> +    } else {
>>
>> +      Diag(AssertionPredTok, diag::err_pp_predicate_not_identifier);
>>
>> +    }
>>
>> +    // Fall through on error.
>>
>> +  } else if (isAssertUnassert&&   II->getPPKeywordID() == tok::pp_assert)
>> {
>>
>> +    // Error if using assert as a predicate since using it would require
>>
>> +    // #assert
>>
>> +    Diag(AssertionPredTok, diag::err_assert_predicate_name);
>>
>> +  } else {
>>
>> +    // Okay, we got a good identifier node. Return it.
>>
>> +    return;
>>
>> +  }
>>
>>
>> If you did switch the return value to 'bool', this code could be de-nested
>> and clarified a bit:
>>
>>
>> if (IdentifierInfo *II = AssertionPredTok.getIdentifierInfo()) {
>>
>>    if (isAssertUnassert&&   II->getPPKeywordID() == tok::pp_assert)
>>
>>      return Diag(AssertionPredTok, diag::err_assert_predicate_name);
>>
>>
>>    return false;
>>
>> }
>>
>>
>> // diagnose non-identifiers.
>>
>>
>>
>> +//===----------------------------------------------------------------------===//
>>
>> +// Preprocessor Assertion Directive Handling.
>>
>>
>> +//===----------------------------------------------------------------------===//
>>
>> +void Preprocessor::HandleAssertDirective(Token&DTok) {
>>
>> +  Token AssertionPredicateTok;
>>
>> +  ReadAssertionPredicate(AssertionPredicateTok, 1);
>>
>> +
>>
>> +  if (AssertionPredicateTok.is(tok::eod)) {
>>
>> +    return;
>>
>> +  }
>>
>> +
>>
>> +  AssertionInfo *AI =
>>
>> +    AllocateAssertionInfo(AssertionPredicateTok.getLocation());
>>
>> +  Token Tok;
>>
>> +  LexUnexpandedToken(Tok);
>>
>> +  ReadAssertionValue(Tok, AI);
>>
>> +
>>
>> +  // add to AI
>>
>> +  if (AI != 0) {
>>
>>
>> I find this interface to ReadAssertionValue to be a bit odd. How about
>> having ReadAssertionValue lex all of the tokens it needs to, then return an
>> allocated AssertionInfo* on success or null to indicate failure? It would
>> simplify control flow for its callers.
>>
>>
>> +void Preprocessor::HandleUnassertDirective(Token&DTok) {
>>
>> +  Token AssertionPredicateTok;
>>
>> +  ReadAssertionPredicate(AssertionPredicateTok, 2);
>>
>> +
>>
>> +  // check to see if we read eod which indicates an error has occurred
>> and
>>
>> +  // an appropriate diagnostic has already been generated
>>
>> +  if (AssertionPredicateTok.is(tok::eod)) {
>>
>> +    return;
>>
>> +  }
>>
>> +
>>
>> +  Token Tok;
>>
>> +  LexUnexpandedToken(Tok);
>>
>> +
>>
>> +  IdentifierInfo *II = AssertionPredicateTok.getIdentifierInfo();
>>
>> +
>>
>> +  // we have a valid identifier so we need to create an AssertionInfo to
>> hold
>>
>> +  // the value to associate with the predicate
>>
>> +  assertion_iterator iter = AssertedValues.find(II);
>>
>> +
>>
>> +  // we don't have an entry in the assertion table for the predicate,
>> this is a
>>
>> +  // noop so escape early
>>
>> +  if (iter == assertion_end()) {
>>
>> +    return;
>>
>> +  }
>>
>>
>> Shouldn't we still parse the rest of the #unassert to make sure it's
>> well-formed? Or, at the very least, skip to the end of the directive?
>>
>>
>> +void Preprocessor::ReadAssertionValue(Token&Tok, AssertionInfo *AI) {
>>
>> +  if (Tok.isNot(tok::l_paren)) {
>>
>> +    // emit diag
>>
>> +    ReleaseAssertionInfo(AI);
>>
>> +    AI = 0;
>>
>> +    return DiscardUntilEndOfDirective();
>>
>> +  }
>>
>>
>> You're missing the diagnostic here.
>>
>>
>> +  while (true) {
>>
>> +    LexUnexpandedToken(Tok);
>>
>> +    if (Tok.is(tok::r_paren) || Tok.is(tok::eod)) {
>>
>> +      break;
>>
>> +    }
>>
>> +    AI->AddToken(Tok);
>>
>> +  }
>>
>>
>> Could you add a comment saying that it's intentional that parentheses
>> don't nest?It surprised me, but I see that's the documented behavior, and I
>> don't want someone coming through later and "fixing" it.
>>
>>
>> +  // check for the closing ), if we don't have it relase the AssetionInfo
>> and
>>
>> +  // discard the rest of the directive
>>
>> +  if (!Tok.is(tok::r_paren)) {
>>
>> +    ReleaseAssertionInfo(AI);
>>
>> +    AI = 0;
>>
>> +    return DiscardUntilEndOfDirective();
>>
>> +  }
>>
>>
>> Comment typo "relase". Also, shouldn't we emit a diagnostic here?
>>
>>
>> +  // check there is nothing after the closing ), if we have anything
>> release
>>
>> +  // the Assertioninfo and idscard the rest of the directive
>>
>> +  LexUnexpandedToken(Tok);
>>
>> +  if (!Tok.is(tok::eod)) {
>>
>> +    ReleaseAssertionInfo(AI);
>>
>> +    AI = 0;
>>
>> +    return DiscardUntilEndOfDirective();
>>
>> +  }
>>
>>
>> Diagnostic here?
>>
>>
>> +    while (true) {
>>
>> +      PP.LexUnexpandedToken(PeekTok);
>>
>> +      if (PeekTok.is(tok::r_paren) || PeekTok.is(tok::eod)) {
>>
>> +        break;
>>
>> +      }
>>
>> +      AI->AddToken(PeekTok);
>>
>> +    }
>>
>>
>> Same request for a comment about non-nesting parens.
>>
>>
>> +    // check for the closing ), if we don't have it relase the
>> AssertionInfo
>>
>> +    // and discard the rest of the directive
>>
>> +    if (!PeekTok.is(tok::r_paren)) {
>>
>> +      PP.Diag(PredicateTok, diag::err_pp_expr_bad_token_start_expr);
>>
>> +      PP.ReleaseAssertionInfo(AI);
>>
>> +      AI = 0;
>>
>> +      return true;
>>
>> +    }
>>
>>
>> "relase" typo in comment.
>>
>>
>> This patch is looking really good. One non-trivial piece that's still
>> missing is that the AssertedValues map will need to be serialized to the AST
>> file and (lazily) deserialized, so that this feature works with precompiled
>> headers.
>>
>>
>> pp_assert_default.patch looks good to me.
>>
>>
>> Thanks for working on this!
>>
>>
>> - Doug
>>
>> _______________________________________________
>>
>> cfe-commits mailing list
>>
>> cfe-commits at cs.uiuc.edu
>>
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>>
>> <pp_assert_defaults.patch><pp_assert_impl.patch><pp_assert_serialization.patch><boost_usuallytinyptrvector_to_basic.patch>_______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>