[cfe-dev] [PATCH] GCC Preprocessor Assertions revision 2

Argyrios Kyrtzidis kyrtzidis at apple.com
Tue Mar 6 09:21:33 PST 2012


Hi all,

I'd like to get feedback and consensus on whether we should add "GCC Preprocessor Assertions" to clang. This gcc extension has been deprecated since gcc-3, here's a quote from gcc 3.1 docs, (bold added by me):

From http://gcc.gnu.org/onlinedocs/gcc-3.1/cpp/Assertions.html

> Assertions are a deprecated alternative to macros in writing conditionals to test what sort of computer or system the compiled program will run on. Assertions are usually predefined, but you can define them with preprocessing directives or command-line options.
> 
> Assertions were intended to provide a more systematic way to describe the compiler's target system. However, in practice they are just as unpredictable as the system-specific predefined macros. In addition, they are not part of any standard, and only a few compilers support them. Therefore, the use of assertions is less portable than the use of system-specific predefined macros. We recommend you do not use them at all.


Do note that, after checking on http://ideone.com, gcc still supports them on gcc-4.3.4.

Andrew did great work on implementing the feature; apart from the preprocessor changes, he added support for them in the preprocessing record and serialization/deserialization support in the ASTReader/ASTWriter.
This is a significant amount of code that we have to maintain and make sure it works well with other parts of the codebase, and I'm worried about the maintenance burden that this deprecated feature will impose, e.g.

-it reserves 1 bit out of the IdentifierInfo object. There's only 1 bit left currently so afterwards there will be non left to easily use for other purposes, without doubling the IdentifierInfo object's size.

-it increases the size of the IdentifierInfo data that are stored in the serialized AST.

Although the changes are extensive, there may be more that are needed:

-gcc docs say that you can specify assertions by command-line options; I did not see this implemented in Andrew's patches

-the feature was not integrated into our PCH validation, to be more specific, validating and giving errors when assertions from PCH clash with predefined assertions or assertions from the command-line

-It is not clear to me how this feature should interact with modules.


So, to recap, the question is this, should we implement and maintain a gcc extension that was deprecated since gcc 3 ?

-Argyrios


On Dec 14, 2011, at 9:26 PM, Andrew Craik wrote:

> Hi Doug,
> 
> Please find attached a series of 4 patches which have incorporated your feedback on my previous attempt to implement support for GCC-style pre-processor assertions in Clang; the feature is now working with PCH and PCH chaining. The patches are as follows from first to last:
> 1) boost_usuallytinyptrvector_to_basic.patch - moves the UsuallyPtrVector type to Basic, per your previous, and adds an erase method to it
> 2) pp_assert_impl.patch - reworked version of the original patch to implement handling of asserts in the pre-processor
> 3) pp_assert_defaults.patch - same as previous post adding the default asserts for systems and cpus
> 4) pp_assert_serialization.patch - adds serialization support for assertions and all the required callbacks etc
> 
> For the serialization I had to add a bit to the IdentifierInfo to track if the identifier had asserted values to facilitate separate deserialization of asserts and macros. This resulted in the size of the emitted IdentifierInfo exceeding the 2 bytes that were allocated for it so I have increased the size as you can see in the patch.
> 
> I look forward to hearing from you.
> 
> Andrew
> 
> On 04-Dec-11 08:33, Andrew Craik wrote:
>> Hi Doug,
>> 
>> Thanks for the detailed feedback on the code style and control flow; all
>> your comments seem very reasonable and I will go and rework the patch. I
>> will see if I can add another patch to do the
>> serialization/de-serialization.
>> 
>> Andrew
>> 
>> On 04-Dec-11 06:32, Douglas Gregor wrote:
>>> On Dec 1, 2011, at 11:46 PM, Andrew Craik wrote:
>>> 
>>>> Hello,
>>>> 
>>>> At the LLVM Developers' Meeting a couple of weeks ago I mentioned that I had a patch to add support to the pre-processor to handle GCC-style pre-processor assertions.
>>>> Please find attached two patches:
>>>> pp_assert_impl.patch - this adds handling of assertions to the pre-processor
>>>> pp_assert_default.patch - this adds default cpu and system assertions to Targets.cpp
>>>> 
>>>> I look forward to hearing from the community and hopefully getting these patches integrated.
>>> Thanks. A bunch of comments below, but generally things are look very good!
>>> 
>>> Starting with pp_assert_impl.patch:
>>> 
>>> Index: clang/include/clang/Lex/Preprocessor.h
>>> ===================================================================
>>> --- clang.orig/include/clang/Lex/Preprocessor.h	2011-11-30 11:39:19.000000000 +1000
>>> +++ clang/include/clang/Lex/Preprocessor.h	2011-12-02 12:23:53.246940900 +1000
>>> @@ -15,6 +15,7 @@
>>>   #define LLVM_CLANG_LEX_PREPROCESSOR_H
>>> 
>>>   #include "clang/Lex/MacroInfo.h"
>>> +#include "clang/Lex/AssertionInfo.h"
>>>   #include "clang/Lex/Lexer.h"
>>>   #include "clang/Lex/PTHLexer.h"
>>>   #include "clang/Lex/PPCallbacks.h"
>>> @@ -240,6 +241,10 @@
>>>     /// to the actual definition of the macro.
>>>     llvm::DenseMap<IdentifierInfo*, MacroInfo*>   Macros;
>>> 
>>> +  /// Assertions - For each IdentifierInfo that has appeared as an asserted
>>> +  /// predicate, we store the values for which it was asserted
>>> +  llvm::DenseMap<IdentifierInfo*, std::vector<AssertionInfo*>   >   AssertedValues;
>>> 
>>> I'm guessing that the common case is that there is only one asserted value for each assertion, so I suggest replacing the std::vector with a UsuallyTinyPtrVector (which will need to be boosted from AST to Basic).
>>> 
>>> @@ -440,6 +459,30 @@
>>>     macro_iterator macro_begin(bool IncludeExternalMacros = true) const;
>>>     macro_iterator macro_end(bool IncludeExternalMacros = true) const;
>>> 
>>> +  /// AddAssertionValue - assert the specified identifier to the
>>> +  /// set of asserted values for the given assertion key
>>> +  void AddAssertionValue(IdentifierInfo *II, AssertionInfo *AI);
>>> +
>>> +  /// IsAsserted - test if the specified identifier has been asserted as a
>>> +  /// value of the given assertion key returns true iff it has
>>> +  bool IsAsserted(IdentifierInfo *II, AssertionInfo *AI);
>>> +
>>> +  /// RemoveAssertionValue - remove the specified identifier from the set of
>>> +  /// asserted values for the given assertion key
>>> +  void RemoveAssertionValue(IdentifierInfo *II, AssertionInfo *AI);
>>> +
>>> +  /// RemoveAssertion - remove the specified assertion and all of its asserted
>>> +  /// values
>>> +  void RemoveAssertion(IdentifierInfo *II);
>>> 
>>> A quick request hereā€¦ the LLVM coding standards clarified the naming of functions a few months ago (http://llvm.org/docs/CodingStandards.html#ll_naming), so please name these addAssertionValue, isAsserted, and so on.
>>> The same comment applies throughout the patch, because we want new code to follow this convention and existing code to eventually migrate over.
>>> 
>>> 
>>> +    void Destory() {
>>> +      ValueTokens.clear();
>>> +      this->~AssertionInfo();
>>> +    }
>>> 
>>> Typo "Destory", propagated throughout.
>>> 
>>> +/// ReadAssertionPredicate - Lex and validate an assertion predicate, which
>>> +/// occurs after an #assert or #unassert. This sets the token kind to eod and
>>> +/// discards the rest of the line if the assert is invalid.
>>> +/// isAssertUnassert is 1 if this is due to an #assert, 2 if #unassert
>>> +/// directive, 0 if it is something else
>>> +void Preprocessor::ReadAssertionPredicate(Token&AssertionPredTok,
>>> +    char isAssertUnassert) {
>>> 
>>> Please use an enumeration type rather than 0/1/2 here.
>>> 
>>> Also, why not indicate failure by returning 'true' and success by returning 'false', as we do elsewhere in Clang?
>>> 
>>> +  IdentifierInfo *II = AssertionPredTok.getIdentifierInfo();
>>> +  if (II == 0) {
>>> +    bool Invalid = false;
>>> +    std::string Spelling = getSpelling(AssertionPredTok,&Invalid);
>>> +    if (Invalid) {
>>> +      return;
>>> +    }
>>> +
>>> +    const IdentifierInfo&Info = Identifiers.get(Spelling);
>>> +    if (Info.isCPlusPlusOperatorKeyword()) {
>>> +      Diag(AssertionPredTok, diag::err_pp_operator_used_as_assert_predicate)
>>> +<<   Spelling;
>>> +    } else {
>>> +      Diag(AssertionPredTok, diag::err_pp_predicate_not_identifier);
>>> +    }
>>> +    // Fall through on error.
>>> +  } else if (isAssertUnassert&&   II->getPPKeywordID() == tok::pp_assert) {
>>> +    // Error if using assert as a predicate since using it would require
>>> +    // #assert
>>> +    Diag(AssertionPredTok, diag::err_assert_predicate_name);
>>> +  } else {
>>> +    // Okay, we got a good identifier node. Return it.
>>> +    return;
>>> +  }
>>> 
>>> If you did switch the return value to 'bool', this code could be de-nested and clarified a bit:
>>> 
>>> if (IdentifierInfo *II = AssertionPredTok.getIdentifierInfo()) {
>>>    if (isAssertUnassert&&   II->getPPKeywordID() == tok::pp_assert)
>>>      return Diag(AssertionPredTok, diag::err_assert_predicate_name);
>>> 
>>>    return false;
>>> }
>>> 
>>> // diagnose non-identifiers.
>>> 
>>> +//===----------------------------------------------------------------------===//
>>> +// Preprocessor Assertion Directive Handling.
>>> +//===----------------------------------------------------------------------===//
>>> +void Preprocessor::HandleAssertDirective(Token&DTok) {
>>> +  Token AssertionPredicateTok;
>>> +  ReadAssertionPredicate(AssertionPredicateTok, 1);
>>> +
>>> +  if (AssertionPredicateTok.is(tok::eod)) {
>>> +    return;
>>> +  }
>>> +
>>> +  AssertionInfo *AI =
>>> +    AllocateAssertionInfo(AssertionPredicateTok.getLocation());
>>> +  Token Tok;
>>> +  LexUnexpandedToken(Tok);
>>> +  ReadAssertionValue(Tok, AI);
>>> +
>>> +  // add to AI
>>> +  if (AI != 0) {
>>> 
>>> I find this interface to ReadAssertionValue to be a bit odd. How about having ReadAssertionValue lex all of the tokens it needs to, then return an allocated AssertionInfo* on success or null to indicate failure? It would simplify control flow for its callers.
>>> 
>>> +void Preprocessor::HandleUnassertDirective(Token&DTok) {
>>> +  Token AssertionPredicateTok;
>>> +  ReadAssertionPredicate(AssertionPredicateTok, 2);
>>> +
>>> +  // check to see if we read eod which indicates an error has occurred and
>>> +  // an appropriate diagnostic has already been generated
>>> +  if (AssertionPredicateTok.is(tok::eod)) {
>>> +    return;
>>> +  }
>>> +
>>> +  Token Tok;
>>> +  LexUnexpandedToken(Tok);
>>> +
>>> +  IdentifierInfo *II = AssertionPredicateTok.getIdentifierInfo();
>>> +
>>> +  // we have a valid identifier so we need to create an AssertionInfo to hold
>>> +  // the value to associate with the predicate
>>> +  assertion_iterator iter = AssertedValues.find(II);
>>> +
>>> +  // we don't have an entry in the assertion table for the predicate, this is a
>>> +  // noop so escape early
>>> +  if (iter == assertion_end()) {
>>> +    return;
>>> +  }
>>> 
>>> Shouldn't we still parse the rest of the #unassert to make sure it's well-formed? Or, at the very least, skip to the end of the directive?
>>> 
>>> +void Preprocessor::ReadAssertionValue(Token&Tok, AssertionInfo *AI) {
>>> +  if (Tok.isNot(tok::l_paren)) {
>>> +    // emit diag
>>> +    ReleaseAssertionInfo(AI);
>>> +    AI = 0;
>>> +    return DiscardUntilEndOfDirective();
>>> +  }
>>> 
>>> You're missing the diagnostic here.
>>> 
>>> +  while (true) {
>>> +    LexUnexpandedToken(Tok);
>>> +    if (Tok.is(tok::r_paren) || Tok.is(tok::eod)) {
>>> +      break;
>>> +    }
>>> +    AI->AddToken(Tok);
>>> +  }
>>> 
>>> Could you add a comment saying that it's intentional that parentheses don't nest?It surprised me, but I see that's the documented behavior, and I don't want someone coming through later and "fixing" it.
>>> 
>>> +  // check for the closing ), if we don't have it relase the AssetionInfo and
>>> +  // discard the rest of the directive
>>> +  if (!Tok.is(tok::r_paren)) {
>>> +    ReleaseAssertionInfo(AI);
>>> +    AI = 0;
>>> +    return DiscardUntilEndOfDirective();
>>> +  }
>>> 
>>> Comment typo "relase". Also, shouldn't we emit a diagnostic here?
>>> 
>>> +  // check there is nothing after the closing ), if we have anything release
>>> +  // the Assertioninfo and idscard the rest of the directive
>>> +  LexUnexpandedToken(Tok);
>>> +  if (!Tok.is(tok::eod)) {
>>> +    ReleaseAssertionInfo(AI);
>>> +    AI = 0;
>>> +    return DiscardUntilEndOfDirective();
>>> +  }
>>> 
>>> Diagnostic here?
>>> 
>>> +    while (true) {
>>> +      PP.LexUnexpandedToken(PeekTok);
>>> +      if (PeekTok.is(tok::r_paren) || PeekTok.is(tok::eod)) {
>>> +        break;
>>> +      }
>>> +      AI->AddToken(PeekTok);
>>> +    }
>>> 
>>> Same request for a comment about non-nesting parens.
>>> 
>>> +    // check for the closing ), if we don't have it relase the AssertionInfo
>>> +    // and discard the rest of the directive
>>> +    if (!PeekTok.is(tok::r_paren)) {
>>> +      PP.Diag(PredicateTok, diag::err_pp_expr_bad_token_start_expr);
>>> +      PP.ReleaseAssertionInfo(AI);
>>> +      AI = 0;
>>> +      return true;
>>> +    }
>>> 
>>> "relase" typo in comment.
>>> 
>>> This patch is looking really good. One non-trivial piece that's still missing is that the AssertedValues map will need to be serialized to the AST file and (lazily) deserialized, so that this feature works with precompiled headers.
>>> 
>>> pp_assert_default.patch looks good to me.
>>> 
>>> Thanks for working on this!
>>> 
>>> 	- Doug
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
> <pp_assert_defaults.patch><pp_assert_impl.patch><pp_assert_serialization.patch><boost_usuallytinyptrvector_to_basic.patch>_______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120306/761cb145/attachment.html>


More information about the cfe-dev mailing list