[cfe-dev] [PATCH] GCC Preprocessor Assertions revision 2

Wed Mar 7 12:18:43 PST 2012

Hi Andrew, I'm going to comment a bit out-of-order.

>> -the feature was not integrated into our PCH validation, to be more specific, validating and giving errors when assertions from PCH clash with predefined assertions or assertions from the command-line
> I do not believe there is anything that needs to be validated in terms of PCH vs command-line vs source preprocessor assertions. You can assert multiple values for a given key quite correctly (unlike macros) and assertions are designed to be added and removed using all of these features and it is not wrong to do so. Further, asserting the same key-value pair twice is not an error and so I don't think there needs to be validation of clashes since, by design, these clashes are permitted.

I had in mind checking whether there are differences in the assertions in command-line and predefined ones, (missing assertions, newly introduced), causing differences in PCH inclusion,  like we do with macros, but I noticed that gcc does not care for assertion differences in command-line (when including the PCH) and even if the predefined ones are different, e.g including a PCH with

#assert cpu(i386)

when you are on x86_64, the PCH will probably be rejected by PCH validation anyway, so we probably don't need to enhance validation to include assertions.

>> So, to recap, the question is this, should we implement and maintain a gcc extension that was deprecated since gcc 3 ?
> I think clang should implement and maintain this feature because
> 1) It's widely supported by existing compilers including GCC 4.6.X (http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Preprocessor-Options.html#Preprocessor-Options), Intel C/C++ (http://denali.princeton.edu/intel_cc_docs/c_ug/index.htm), and ARM's toolchain (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491c/CHDFDIDE.html)
> 2) Preprocessor assertions are an optional extension defined in System V Release 4 (SVR4) see the System V Interface manual Vol 3 page 270 (http://www.sco.com/developers/devspecs/vol3_ps.ps) and so should be implemented to support SVR4 systems which have chosen to implement this feature.
> 3) It was clearly identified as a missing feature by FIXMEs in Chris' original commit of PPDirectives.cpp and currently the compiler produces only cryptic errors when this feature is encountered.
> 4) It is not possible to work around this feature without having to modify the source and doing so with an automated sed or awk script is risky and difficult to integrate into some build processes. I did consider trying to emulate assertions using macros, but I found this wasn't possible using only the preprocessor. Asserts and macros exist in different namespaces and so you can define and assert the same symbol. In many cases this is not a problem because all we want to do is test if a macro is defined, but if the value of the macro is important then a collision in the names would cause a problem. The assertion tests themselves would have to be found and rewritten using a sed or awk script which is a fragile hack rather than a proper fix and having to run this script before a compile would complicate the build process etc. This kind of workaround also assumes that the source can be modified prior to compilation.
> 5) Clang has publicly stated that it will aim for GCC compatibility where possible and these patches make compatibility with this feature possible.

"aim for GCC compatibility" does not mean "implement the exact behavior of gcc unconditionally".
There are gcc extensions that we don't have interest in implementing, like nested functions, and when the standard disagrees with gcc we side with the standard, see http://clang.llvm.org/compatibility.html.

It does not help the case for this feature that gcc itself is aggressively discouraging people from using it, e.g. this is from gcc 4.6.2:

$ cat t.c
#assert a(b)
#if #a(b)
#endif

$ gcc -fsyntax-only t.c
t.c:1:2: warning: #assert is a deprecated GCC extension [-Wdeprecated]
t.c:2:5: warning: assertions are a deprecated extension [-Wdeprecated]

I find the prospect that we would implement a new feature and then put it under a "deprecated" warning, at least unfortunate.

Since we in clang would discourage people from using the feature as well, we would implement it mainly to support legacy codebases or projects that, for some reason, still depend on a deprecated (and not widely used) gcc extension.
Under the this frame of discussion:

> I ran into preprocessor assertions when I was trying to build an older version of Grub which is what prompted my implementation.

Could you elaborate more, is this legacy codebase that nobody wants to touch or move to a newer version ?
-Is the feature widely used in your company's codebase or not ?
-Is it only a minority of projects that you care about that depend on it ?
-Could these just keep using gcc for building ?

-Argyrios

On Mar 6, 2012, at 9:27 PM, Andrew Craik wrote:

> Hi Argyrios,
> 
> Thank you for reviewing my patches.
> 
> I have replied to your comments inline below.
> 
> I ran into preprocessor assertions when I was trying to build an older version of Grub which is what prompted my implementation.
> 
> Kind regards,
> Andrew
> 
> On 07-Mar-12 03:21, Argyrios Kyrtzidis wrote:
>> 
>> Hi all,
>> 
>> I'd like to get feedback and consensus on whether we should add "GCC Preprocessor Assertions" to clang. This gcc extension has been deprecated since gcc-3, here's a quote from gcc 3.1 docs, (bold added by me):
>> 
>> From http://gcc.gnu.org/onlinedocs/gcc-3.1/cpp/Assertions.html
>> 
>>> Assertions are a deprecated alternative to macros in writing conditionals to test what sort of computer or system the compiled program will run on. Assertions are usually predefined, but you can define them with preprocessing directives or command-line options.
>>> 
>>> Assertions were intended to provide a more systematic way to describe the compiler's target system. However, in practice they are just as unpredictable as the system-specific predefined macros. In addition, they are not part of any standard, and only a few compilers support them. Therefore, the use of assertions is less portable than the use of system-specific predefined macros. We recommend you do not use them at all.
>> 
>> 
>> Do note that, after checking on http://ideone.com, gcc still supports them on gcc-4.3.4.
>> 
>> Andrew did great work on implementing the feature; apart from the preprocessor changes, he added support for them in the preprocessing record and serialization/deserialization support in the ASTReader/ASTWriter.
>> This is a significant amount of code that we have to maintain and make sure it works well with other parts of the codebase, and I'm worried about the maintenance burden that this deprecated feature will impose, e.g.
>> 
>> -it reserves 1 bit out of the IdentifierInfo object. There's only 1 bit left currently so afterwards there will be non left to easily use for other purposes, without doubling the IdentifierInfo object's size.
> If anyone has suggestions on how to avoid this, I am happy to consider reworking my patch to incorporate such an improvement.
>> 
>> 
>> -it increases the size of the IdentifierInfo data that are stored in the serialized AST.
> I think this is unavoidable if the feature is going to be implemented, but again suggestions on how to improve things are most welcome.
>> 
>> Although the changes are extensive, there may be more that are needed:
>> 
>> -gcc docs say that you can specify assertions by command-line options; I did not see this implemented in Andrew's patches
> I am aware the command-line option (-A) to assert a key-value pair is not implemented in the patches I have submitted. I didn't write this support yet because clang uses -A for other things and I wasn't sure if that needed to be changed or if we were going to use a different flag - the actual patch will be relatively small. If the feature is accepted and guidance is provided as to what flag we should use I am happy to write and contribute this patch
>> 
>> 
>> -the feature was not integrated into our PCH validation, to be more specific, validating and giving errors when assertions from PCH clash with predefined assertions or assertions from the command-line
> I do not believe there is anything that needs to be validated in terms of PCH vs command-line vs source preprocessor assertions. You can assert multiple values for a given key quite correctly (unlike macros) and assertions are designed to be added and removed using all of these features and it is not wrong to do so. Further, asserting the same key-value pair twice is not an error and so I don't think there needs to be validation of clashes since, by design, these clashes are permitted.
>> 
>> 
>> -It is not clear to me how this feature should interact with modules.
> I am not sure how these would interact since the reference implementation, GCC, does not support modules in the versions I have tested. Doug, do you have any thoughts on this since you have worked on modules? Since the behavior is undefined we can probably do whatever is easiest and makes the most sense.
>> 
>> 
>> 
>> So, to recap, the question is this, should we implement and maintain a gcc extension that was deprecated since gcc 3 ?
> I think clang should implement and maintain this feature because
> 1) It's widely supported by existing compilers including GCC 4.6.X (http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Preprocessor-Options.html#Preprocessor-Options), Intel C/C++ (http://denali.princeton.edu/intel_cc_docs/c_ug/index.htm), and ARM's toolchain (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491c/CHDFDIDE.html)
> 2) Preprocessor assertions are an optional extension defined in System V Release 4 (SVR4) see the System V Interface manual Vol 3 page 270 (http://www.sco.com/developers/devspecs/vol3_ps.ps) and so should be implemented to support SVR4 systems which have chosen to implement this feature.
> 3) It was clearly identified as a missing feature by FIXMEs in Chris' original commit of PPDirectives.cpp and currently the compiler produces only cryptic errors when this feature is encountered.
> 4) It is not possible to work around this feature without having to modify the source and doing so with an automated sed or awk script is risky and difficult to integrate into some build processes. I did consider trying to emulate assertions using macros, but I found this wasn't possible using only the preprocessor. Asserts and macros exist in different namespaces and so you can define and assert the same symbol. In many cases this is not a problem because all we want to do is test if a macro is defined, but if the value of the macro is important then a collision in the names would cause a problem. The assertion tests themselves would have to be found and rewritten using a sed or awk script which is a fragile hack rather than a proper fix and having to run this script before a compile would complicate the build process etc. This kind of workaround also assumes that the source can be modified prior to compilation.
> 5) Clang has publicly stated that it will aim for GCC compatibility where possible and these patches make compatibility with this feature possible.
>> 
>> 
>> -Argyrios
>> 
>> 
>> On Dec 14, 2011, at 9:26 PM, Andrew Craik wrote:
>> 
>>> Hi Doug,
>>> 
>>> Please find attached a series of 4 patches which have incorporated your feedback on my previous attempt to implement support for GCC-style pre-processor assertions in Clang; the feature is now working with PCH and PCH chaining. The patches are as follows from first to last:
>>> 1) boost_usuallytinyptrvector_to_basic.patch - moves the UsuallyPtrVector type to Basic, per your previous, and adds an erase method to it
>>> 2) pp_assert_impl.patch - reworked version of the original patch to implement handling of asserts in the pre-processor
>>> 3) pp_assert_defaults.patch - same as previous post adding the default asserts for systems and cpus
>>> 4) pp_assert_serialization.patch - adds serialization support for assertions and all the required callbacks etc
>>> 
>>> For the serialization I had to add a bit to the IdentifierInfo to track if the identifier had asserted values to facilitate separate deserialization of asserts and macros. This resulted in the size of the emitted IdentifierInfo exceeding the 2 bytes that were allocated for it so I have increased the size as you can see in the patch.
>>> 
>>> I look forward to hearing from you.
>>> 
>>> Andrew
>>> 
>>> On 04-Dec-11 08:33, Andrew Craik wrote:
>>>> Hi Doug,
>>>> 
>>>> Thanks for the detailed feedback on the code style and control flow; all
>>>> your comments seem very reasonable and I will go and rework the patch. I
>>>> will see if I can add another patch to do the
>>>> serialization/de-serialization.
>>>> 
>>>> Andrew
>>>> 
>>>> On 04-Dec-11 06:32, Douglas Gregor wrote:
>>>>> On Dec 1, 2011, at 11:46 PM, Andrew Craik wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> At the LLVM Developers' Meeting a couple of weeks ago I mentioned that I had a patch to add support to the pre-processor to handle GCC-style pre-processor assertions.
>>>>>> Please find attached two patches:
>>>>>> pp_assert_impl.patch - this adds handling of assertions to the pre-processor
>>>>>> pp_assert_default.patch - this adds default cpu and system assertions to Targets.cpp
>>>>>> 
>>>>>> I look forward to hearing from the community and hopefully getting these patches integrated.
>>>>> Thanks. A bunch of comments below, but generally things are look very good!
>>>>> 
>>>>> Starting with pp_assert_impl.patch:
>>>>> 
>>>>> Index: clang/include/clang/Lex/Preprocessor.h
>>>>> ===================================================================
>>>>> --- clang.orig/include/clang/Lex/Preprocessor.h 2011-11-30 11:39:19.000000000 +1000
>>>>> +++ clang/include/clang/Lex/Preprocessor.h 2011-12-02 12:23:53.246940900 +1000
>>>>> @@ -15,6 +15,7 @@
>>>>>   #define LLVM_CLANG_LEX_PREPROCESSOR_H
>>>>> 
>>>>>   #include "clang/Lex/MacroInfo.h"
>>>>> +#include "clang/Lex/AssertionInfo.h"
>>>>>   #include "clang/Lex/Lexer.h"
>>>>>   #include "clang/Lex/PTHLexer.h"
>>>>>   #include "clang/Lex/PPCallbacks.h"
>>>>> @@ -240,6 +241,10 @@
>>>>>     /// to the actual definition of the macro.
>>>>>     llvm::DenseMap<IdentifierInfo*, MacroInfo*>   Macros;
>>>>> 
>>>>> +  /// Assertions - For each IdentifierInfo that has appeared as an asserted
>>>>> +  /// predicate, we store the values for which it was asserted
>>>>> +  llvm::DenseMap<IdentifierInfo*, std::vector<AssertionInfo*>   >   AssertedValues;
>>>>> 
>>>>> I'm guessing that the common case is that there is only one asserted value for each assertion, so I suggest replacing the std::vector with a UsuallyTinyPtrVector (which will need to be boosted from AST to Basic).
>>>>> 
>>>>> @@ -440,6 +459,30 @@
>>>>>     macro_iterator macro_begin(bool IncludeExternalMacros = true) const;
>>>>>     macro_iterator macro_end(bool IncludeExternalMacros = true) const;
>>>>> 
>>>>> +  /// AddAssertionValue - assert the specified identifier to the
>>>>> +  /// set of asserted values for the given assertion key
>>>>> +  void AddAssertionValue(IdentifierInfo *II, AssertionInfo *AI);
>>>>> +
>>>>> +  /// IsAsserted - test if the specified identifier has been asserted as a
>>>>> +  /// value of the given assertion key returns true iff it has
>>>>> +  bool IsAsserted(IdentifierInfo *II, AssertionInfo *AI);
>>>>> +
>>>>> +  /// RemoveAssertionValue - remove the specified identifier from the set of
>>>>> +  /// asserted values for the given assertion key
>>>>> +  void RemoveAssertionValue(IdentifierInfo *II, AssertionInfo *AI);
>>>>> +
>>>>> +  /// RemoveAssertion - remove the specified assertion and all of its asserted
>>>>> +  /// values
>>>>> +  void RemoveAssertion(IdentifierInfo *II);
>>>>> 
>>>>> A quick request here… the LLVM coding standards clarified the naming of functions a few months ago (http://llvm.org/docs/CodingStandards.html#ll_naming), so please name these addAssertionValue, isAsserted, and so on.
>>>>> The same comment applies throughout the patch, because we want new code to follow this convention and existing code to eventually migrate over.
>>>>> 
>>>>> 
>>>>> +    void Destory() {
>>>>> +      ValueTokens.clear();
>>>>> +      this->~AssertionInfo();
>>>>> +    }
>>>>> 
>>>>> Typo "Destory", propagated throughout.
>>>>> 
>>>>> +/// ReadAssertionPredicate - Lex and validate an assertion predicate, which
>>>>> +/// occurs after an #assert or #unassert. This sets the token kind to eod and
>>>>> +/// discards the rest of the line if the assert is invalid.
>>>>> +/// isAssertUnassert is 1 if this is due to an #assert, 2 if #unassert
>>>>> +/// directive, 0 if it is something else
>>>>> +void Preprocessor::ReadAssertionPredicate(Token&AssertionPredTok,
>>>>> +    char isAssertUnassert) {
>>>>> 
>>>>> Please use an enumeration type rather than 0/1/2 here.
>>>>> 
>>>>> Also, why not indicate failure by returning 'true' and success by returning 'false', as we do elsewhere in Clang?
>>>>> 
>>>>> +  IdentifierInfo *II = AssertionPredTok.getIdentifierInfo();
>>>>> +  if (II == 0) {
>>>>> +    bool Invalid = false;
>>>>> +    std::string Spelling = getSpelling(AssertionPredTok,&Invalid);
>>>>> +    if (Invalid) {
>>>>> +      return;
>>>>> +    }
>>>>> +
>>>>> +    const IdentifierInfo&Info = Identifiers.get(Spelling);
>>>>> +    if (Info.isCPlusPlusOperatorKeyword()) {
>>>>> +      Diag(AssertionPredTok, diag::err_pp_operator_used_as_assert_predicate)
>>>>> +<<   Spelling;
>>>>> +    } else {
>>>>> +      Diag(AssertionPredTok, diag::err_pp_predicate_not_identifier);
>>>>> +    }
>>>>> +    // Fall through on error.
>>>>> +  } else if (isAssertUnassert&&   II->getPPKeywordID() == tok::pp_assert) {
>>>>> +    // Error if using assert as a predicate since using it would require
>>>>> +    // #assert
>>>>> +    Diag(AssertionPredTok, diag::err_assert_predicate_name);
>>>>> +  } else {
>>>>> +    // Okay, we got a good identifier node. Return it.
>>>>> +    return;
>>>>> +  }
>>>>> 
>>>>> If you did switch the return value to 'bool', this code could be de-nested and clarified a bit:
>>>>> 
>>>>> if (IdentifierInfo *II = AssertionPredTok.getIdentifierInfo()) {
>>>>>    if (isAssertUnassert&&   II->getPPKeywordID() == tok::pp_assert)
>>>>>      return Diag(AssertionPredTok, diag::err_assert_predicate_name);
>>>>> 
>>>>>    return false;
>>>>> }
>>>>> 
>>>>> // diagnose non-identifiers.
>>>>> 
>>>>> +//===----------------------------------------------------------------------===//
>>>>> +// Preprocessor Assertion Directive Handling.
>>>>> +//===----------------------------------------------------------------------===//
>>>>> +void Preprocessor::HandleAssertDirective(Token&DTok) {
>>>>> +  Token AssertionPredicateTok;
>>>>> +  ReadAssertionPredicate(AssertionPredicateTok, 1);
>>>>> +
>>>>> +  if (AssertionPredicateTok.is(tok::eod)) {
>>>>> +    return;
>>>>> +  }
>>>>> +
>>>>> +  AssertionInfo *AI =
>>>>> +    AllocateAssertionInfo(AssertionPredicateTok.getLocation());
>>>>> +  Token Tok;
>>>>> +  LexUnexpandedToken(Tok);
>>>>> +  ReadAssertionValue(Tok, AI);
>>>>> +
>>>>> +  // add to AI
>>>>> +  if (AI != 0) {
>>>>> 
>>>>> I find this interface to ReadAssertionValue to be a bit odd. How about having ReadAssertionValue lex all of the tokens it needs to, then return an allocated AssertionInfo* on success or null to indicate failure? It would simplify control flow for its callers.
>>>>> 
>>>>> +void Preprocessor::HandleUnassertDirective(Token&DTok) {
>>>>> +  Token AssertionPredicateTok;
>>>>> +  ReadAssertionPredicate(AssertionPredicateTok, 2);
>>>>> +
>>>>> +  // check to see if we read eod which indicates an error has occurred and
>>>>> +  // an appropriate diagnostic has already been generated
>>>>> +  if (AssertionPredicateTok.is(tok::eod)) {
>>>>> +    return;
>>>>> +  }
>>>>> +
>>>>> +  Token Tok;
>>>>> +  LexUnexpandedToken(Tok);
>>>>> +
>>>>> +  IdentifierInfo *II = AssertionPredicateTok.getIdentifierInfo();
>>>>> +
>>>>> +  // we have a valid identifier so we need to create an AssertionInfo to hold
>>>>> +  // the value to associate with the predicate
>>>>> +  assertion_iterator iter = AssertedValues.find(II);
>>>>> +
>>>>> +  // we don't have an entry in the assertion table for the predicate, this is a
>>>>> +  // noop so escape early
>>>>> +  if (iter == assertion_end()) {
>>>>> +    return;
>>>>> +  }
>>>>> 
>>>>> Shouldn't we still parse the rest of the #unassert to make sure it's well-formed? Or, at the very least, skip to the end of the directive?
>>>>> 
>>>>> +void Preprocessor::ReadAssertionValue(Token&Tok, AssertionInfo *AI) {
>>>>> +  if (Tok.isNot(tok::l_paren)) {
>>>>> +    // emit diag
>>>>> +    ReleaseAssertionInfo(AI);
>>>>> +    AI = 0;
>>>>> +    return DiscardUntilEndOfDirective();
>>>>> +  }
>>>>> 
>>>>> You're missing the diagnostic here.
>>>>> 
>>>>> +  while (true) {
>>>>> +    LexUnexpandedToken(Tok);
>>>>> +    if (Tok.is(tok::r_paren) || Tok.is(tok::eod)) {
>>>>> +      break;
>>>>> +    }
>>>>> +    AI->AddToken(Tok);
>>>>> +  }
>>>>> 
>>>>> Could you add a comment saying that it's intentional that parentheses don't nest?It surprised me, but I see that's the documented behavior, and I don't want someone coming through later and "fixing" it.
>>>>> 
>>>>> +  // check for the closing ), if we don't have it relase the AssetionInfo and
>>>>> +  // discard the rest of the directive
>>>>> +  if (!Tok.is(tok::r_paren)) {
>>>>> +    ReleaseAssertionInfo(AI);
>>>>> +    AI = 0;
>>>>> +    return DiscardUntilEndOfDirective();
>>>>> +  }
>>>>> 
>>>>> Comment typo "relase". Also, shouldn't we emit a diagnostic here?
>>>>> 
>>>>> +  // check there is nothing after the closing ), if we have anything release
>>>>> +  // the Assertioninfo and idscard the rest of the directive
>>>>> +  LexUnexpandedToken(Tok);
>>>>> +  if (!Tok.is(tok::eod)) {
>>>>> +    ReleaseAssertionInfo(AI);
>>>>> +    AI = 0;
>>>>> +    return DiscardUntilEndOfDirective();
>>>>> +  }
>>>>> 
>>>>> Diagnostic here?
>>>>> 
>>>>> +    while (true) {
>>>>> +      PP.LexUnexpandedToken(PeekTok);
>>>>> +      if (PeekTok.is(tok::r_paren) || PeekTok.is(tok::eod)) {
>>>>> +        break;
>>>>> +      }
>>>>> +      AI->AddToken(PeekTok);
>>>>> +    }
>>>>> 
>>>>> Same request for a comment about non-nesting parens.
>>>>> 
>>>>> +    // check for the closing ), if we don't have it relase the AssertionInfo
>>>>> +    // and discard the rest of the directive
>>>>> +    if (!PeekTok.is(tok::r_paren)) {
>>>>> +      PP.Diag(PredicateTok, diag::err_pp_expr_bad_token_start_expr);
>>>>> +      PP.ReleaseAssertionInfo(AI);
>>>>> +      AI = 0;
>>>>> +      return true;
>>>>> +    }
>>>>> 
>>>>> "relase" typo in comment.
>>>>> 
>>>>> This patch is looking really good. One non-trivial piece that's still missing is that the AssertedValues map will need to be serialized to the AST file and (lazily) deserialized, so that this feature works with precompiled headers.
>>>>> 
>>>>> pp_assert_default.patch looks good to me.
>>>>> 
>>>>> Thanks for working on this!
>>>>> 
>>>>>  - Doug
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> cfe-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>> <pp_assert_defaults.patch><pp_assert_impl.patch><pp_assert_serialization.patch><boost_usuallytinyptrvector_to_basic.patch>_______________________________________________
>>> cfe-commits mailing list
>>> cfe-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120307/857d6075/attachment.html>