[PATCH] D86796: [Sema] Address-space sensitive index check for unbounded arrays

Wed Sep 9 10:23:48 PDT 2020

aaron.ballman added a comment.

In D86796#2263557 <https://reviews.llvm.org/D86796#2263557>, @chrish_ericsson_atx wrote:

> Ping?

Sorry for the delayed review, but this managed to fall off my radar. Thank you for the ping!

================
Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8841-8844
+def warn_array_index_exceeds_max_addressable_bounds : Warning<
+  "array index %0 refers past the last possible element for an array in %1-bit "
+  "address space containing %2-bit (%3-byte) elements (max possible %4 element%s5)">,
+  InGroup<ArrayBounds>;
----------------
I'd combine this with the above diagnostic given how similar they are:
```
def warn_exceeds_max_addressable_bounds : Warning<
  "%select{array index %1|the pointer incremented by %1}0 refers past the last possible element for an array in %2-bit "
  "address space containing %3-bit (%4-byte) elements (max possible %5 element%s6)">,
  InGroup<ArrayBounds>;
```

================
Comment at: clang/lib/Sema/SemaChecking.cpp:14043
+      ArrayTy == nullptr ? nullptr : ArrayTy->getElementType().getTypePtr();
+  bool isUnboundedArray = (BaseType == nullptr);
+  if (EffectiveType->isDependentType() ||
----------------
`IsUnboundedArray` per our usual coding style rules.

================
Comment at: clang/lib/Sema/SemaChecking.cpp:14063
+  if (isUnboundedArray) {
+    if (index.isUnsigned() || !index.isNegative()) {
+      const auto &ASTC = getASTContext();
----------------
ebevhan wrote:
> This could be early return to avoid the indentation.
+1 to using early return here. I might even get rid of `isUnboundedArray` and just use `!BaseType`

================
Comment at: clang/lib/Sema/SemaChecking.cpp:14071
+      CharUnits ElemBytes = ASTC.getTypeSizeInChars(EffectiveType);
+      llvm::APInt apElemBytes(index.getBitWidth(), ElemBytes.getQuantity());
+      // If index has more active bits than address space, we already know
----------------
`ElemBytes` per usual coding style rules.

================
Comment at: clang/lib/Sema/SemaChecking.cpp:14077
+      if (index.getActiveBits() <= AddrBits) {
+        bool overflow;
+        llvm::APInt product(index);
----------------
`Overflow`, `Product` (I'll stop commenting on them -- just take a quick pass to make sure the new variables you introduce meet style rules.)

================
Comment at: clang/lib/Sema/SemaChecking.cpp:14085
+
+      // Need to compute max possible elements in address space, since  that
+      // is included in diag message.
----------------
since  that -> since that

(removes a whitespace)

================
Comment at: clang/lib/Sema/SemaChecking.cpp:14094-14096
+      unsigned DiagID = diag::warn_ptr_arith_exceeds_max_addressable_bounds;
+      if (ASE)
+        DiagID = diag::warn_array_index_exceeds_max_addressable_bounds;
----------------
```
unsigned DiagID = ASE ?
  diag::warn_array_index_exceeds_max_addressable_bounds :
  diag::warn_ptr_arith_exceeds_max_addressable_bounds;
```

================
Comment at: clang/lib/Sema/SemaChecking.cpp:14100
+      // dependent CharUnits)
+      DiagRuntimeBehavior(BaseExpr->getBeginLoc(), BaseExpr,
+                          PDiag(DiagID)
----------------
It's not clear to me whether we need to pay the extra expense of doing reachability checks for the diagnostic -- do you have evidence that there would be a high false positive rate if we always emit the diagnostic?

================
Comment at: clang/lib/Sema/SemaChecking.cpp:14111
+        // Try harder to find a NamedDecl to point at in the note.
+        while (const ArraySubscriptExpr *ASE =
+                   dyn_cast<ArraySubscriptExpr>(BaseExpr))
----------------
You can use `const auto *` here since the type is spelled out in the initialization. Same below.

================
Comment at: clang/lib/Sema/SemaChecking.cpp:14121
+      if (ND)
+        DiagRuntimeBehavior(ND->getBeginLoc(), BaseExpr,
+                            PDiag(diag::note_array_declared_here) << ND);
----------------
Similar comment here about reachability.

================
Comment at: clang/test/Sema/const-eval.c:143
 // expressions, like we do for integers.
-void *PR28739a = (__int128)(unsigned long)-1 + &PR28739a;
-void *PR28739b = &PR28739b + (__int128)(unsigned long)-1;
-__int128 PR28739c = (&PR28739c + (__int128)(unsigned long)-1) - &PR28739c;
-void *PR28739d = &(&PR28739d)[(__int128)(unsigned long)-1];
+void *PR28739a = (__int128)(unsigned long)-1 + &PR28739a;                  // expected-warning {{refers past the last possible element}}
+void *PR28739b = &PR28739b + (__int128)(unsigned long)-1;                  // expected-warning {{refers past the last possible element}}
----------------
You should spell out the full form of the diagnostic the first time it appears in a test. You can use the shortened form for subsequent diagnostics.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86796/new/

https://reviews.llvm.org/D86796