[PATCH] D49511: [Sema/Attribute] Check for noderef attribute

Wed Oct 3 16:52:30 PDT 2018

rsmith added inline comments.

================
Comment at: include/clang/Basic/AttrDocs.td:3513
+  int x = 2;
+  int &y = x;  // warning: 'noderef' can only be used on an array or pointer type
+
----------------
Should `noderef` appear somewhere in this example? :)

================
Comment at: include/clang/Basic/DiagnosticSemaKinds.td:9452-9453
+  "dereferencing expression marked as 'noderef'">, InGroup<NoDeref>;
+def warn_noderef_on_non_pointer_or_array : Warning<
+  "'noderef' can only be used on an array or pointer type">, InGroup<NoDeref>;
 } // end of sema component.
----------------
Putting this under `-Wonderef` doesn't really seem appropriate to me -- you're using `-Wnoderef` above to mean "warn about cases where a `noderef` pointer is dereferenced", whereas this warning is about invalid use of an attribute. I think `-Wignored-attributes` (`InGroup<IgnoredAttributes>`) would be a better choice.

================
Comment at: lib/Parse/ParseExpr.cpp:1126
+
+    Actions.StartCheckingNoDeref();
+
----------------
leonardchan wrote:
> rsmith wrote:
> > leonardchan wrote:
> > > rsmith wrote:
> > > > This parser-driven start/stop mechanism will not work in C++ templates. Instead, can you remove the "start" part and check the noderef exprs as part of popping the ExprEvaluationContextRecord?
> > > I'm not sure if I should remove the the start and stop methods because for a regular C program, the Push/PopExprEvaluationContextRecord functions don't seem to be called, and even when they are called in C++, the initial record that exists on the stack isn't popped at all.
> > > 
> > > Since pending noderef expressions are still parsed and added to the last record during template instantiation, doing another check when popping covers all noderef exprs added during instantiation.
> > `PushExpressionEvaluationContext` / `PopExpressionEvaluationContext` are called regardless of which language we're parsing. If we're missing `ExpressionEvaluationContext` records around some expression parsing, we should fix that. We should never be creating expressions within the initial `ExpressionEvaluationContext` record (except perhaps during error recovery).
> > 
> > > Since pending noderef expressions are still parsed and added to the last record during template instantiation, doing another check when popping covers all noderef exprs added during instantiation.
> > 
> > That's not how template instantiation works in Clang. We don't re-parse, we perform a recursive tree transformation that does not involve the parser.
> So when should a new `ExpressionEvaluationContext` be pushed or popped? 
> 
> For the following code:
> 
> ```
> #define NODEREF __attribute__((noderef))
> 
> void func(){
>   int NODEREF *x;
>   *x;
> }
> 
> int main(){
>   func();
> }
> ```
> 
> A new context is pushed then popped in C++ but not for C. From what I can tell based off my observations and looking for where `Push/Pop` get called in code, new ones would get added when we enter a new GNU style statement expression, captured region after a pragma, or different error blocks.
Hmm, it looks like we don't push/pop expression evaluation context records for function definitions or variable initializers currently (instead we have a single global "potentially evaluated" context which wraps all such contexts and appears to never be popped).

The fact that we share a single record for all functions / variables is likely simply because it never previously mattered that we were merging notionally-distinct contexts in this way. I suspect actually fixing that will uncover more problems (most likely, pre-existing bugs, but barriers to your progress nonetheless).

================
Comment at: lib/Parse/ParseStmt.cpp:102-104
+  Actions.PushExpressionEvaluationContext(
+      Actions.ExprEvalContexts.back().Context);
   ParenBraceBracketBalancer BalancerRAIIObj(*this);
----------------
Hmm, we call `ParseStatementOrDeclaration` rather a lot. Can we pull this up into its ultimate callers instead:

  * `Parser::ParseFunctionDefinition`
  * `Parser::ParseLateTemplatedFuncDef`
  * `Parser::ParseLexedMethodDef`
  * `Parser::ParseLexedObjCMethodDefs`

? Actually, we can do better than that: those functions all call either `ActOnStartOfFunctionDef` or `ActOnStartOfObjCMethodDef` first, and `ActOnFinishFunctionBody` when they're done. We should put the `PushExpressionEvaluationContext` and `PopExpressionEvaluationContext` calls in those functions instead.

We're missing at least one other place: when parsing the initializer for a global variable, in C++ we call `Sema::ActOnCXXEnterDeclInitializer`, which pushes a potentially-evaluated context. But in C, the `InitializerScopeRAII` object (in `Parser::ParseDeclaratoinAfterDeclaratorAndAttributes`) doesn't call into `Sema` and we don't ever push a suitable expression evaluation context.

You can find if any other places are missing by removing `Sema`'s global `ExpressionEvaluationContext` and adding an assert that fires if we try to parse an expression with no `ExpressionEvaluationContext`, and then running the unit tests. (And we should check in that change to ensure this doesn't regress.)

================
Comment at: lib/Sema/SemaExpr.cpp:4289
+
+  if (TypeHasNoDeref(ResultTy)) {
+    LastRecord.PossibleDerefs.insert(E);
----------------
leonardchan wrote:
> rsmith wrote:
> > Do you ensure that the `noderef` attribute will be on the innermost level of the array type? I think you'll need to do so in order to warn on:
> > 
> > ```
> > typedef int A[32];
> > typedef A __attribute__((noderef)) *B;
> > int f(B b) { return (*B)[1]; }
> > ```
> > 
> > (Here, we have a pointer to a noderef annotated array of non-noderef-annotated int. So I think we will not emit a warning from the first dereference, because we have a pointer to an array, and we will not emit a warning from the second dereference in the array indexing, because the result type does not have the noderef attribute.)
> Hmmmm, so normally in order to check what's correct, I usually run these examples through `sparse` since that's the tool that actually checks `noderef` for gcc, but it seems that sparse instead diagnoses a warning on the array indexing instead and nothing for the first dereference.
> 
> This shouldn't be though since, as you pointed out, the array does not have `noderef` types. For a simple array,
> 
> ```
> int __attribute__((noderef)) x[10];
> x[0];
> ```
> 
> `sparse` diagnoses the appropriate warning for the array index. Personally though, I would chalk this up to an edge case that wasn't thought of before in sparse, since something like this isn't handled on their existing validation tests.
> 
> Currently, this diagnoses a warning on the first dereference, but I can also understand why it shouldn't warn because accessing `noderef` structs shouldn't warn if the member accessed is an array. The only case though this applies in sparse's tests are with structs and they don't provide a test for dereferencing a pointer to an array.
> 
> I think the reason for why the struct example still passes is that if the member is an array, none of the actual data in the struct is being accessed. Instead, the value returned is just a pointer to the start of the array within the struct and equivalent to just adding an offset to the struct pointer. I think that no warning should also be thrown for this example if it is guaranteed that array `A` can similarly be found from simple pointer arithmetic on pointer `B`. I don't think it can unless B were an array or struct also, but I could be wrong or overthinking this.
Arrays-of-arrays and structs-containing-arrays are the same in this regard: in both cases, accessing the first level (array element or struct member) gives you an array lvalue, and only accessing that second array will actually dereference memory. So I think the treatment of the two cases should be consistent. For member access, you defer the check if the element type is an array, and I think you should do the same thing here.

There are two consequences of this:

1) If the array element type is an array, you should bail out of here and not add `E` to `PossibleDerefs`.
2) If the array type (not the element type) has the `noderef` attribute, you should propagate it down to the array element type so that we can catch a later pointer dereference / array indexing operation.

For point 2, what this means in my example is that the expression `*B` should have type "array of 32 noderef ints" so that further noderef checks apply to it. The easiest way to accomplish this would be to check, when applying noderef to a type, whether the type is an array type, and if so also apply noderef to the array's element type.

================
Comment at: lib/Sema/SemaExpr.cpp:4164

+bool Sema::TypeHasNoDeref(QualType Ty) { return Ty->hasAttr(attr::NoDeref); }
+
----------------
If it makes sense for this to exist at all, it should be a member of `Type` instead.

================
Comment at: lib/Sema/SemaExpr.cpp:4257-4259
+  auto FoundExpr = LastRecord.PossibleDerefs.find(StrippedExpr);
+  if (FoundExpr != LastRecord.PossibleDerefs.end())
+    LastRecord.PossibleDerefs.erase(*FoundExpr);
----------------
Replace these three lines with just

```
LastRecord.PossibleDerefs.erase(StrippedExpr);
```

================
Comment at: test/Frontend/noderef_on_non_pointers.cpp:4
+/**
+ * Test 'noderef' attribute against other pointer-like types.
+ */
----------------
I'd like to see a test that we get a wraning for attempting to bind a reference to a dereferenced `noderef` pointer.

In C++, should it be valid to assign a `noderef` pointer to a dereferenceable pointer without a cast?

Repository:
  rC Clang

https://reviews.llvm.org/D49511