[clang] [clang] add array out-of-bounds access constraints using llvm.assume (PR #159046)

Erich Keane via cfe-commits cfe-commits at lists.llvm.org
Mon Oct 6 06:13:32 PDT 2025


================
@@ -4559,6 +4559,134 @@ void CodeGenFunction::EmitCountedByBoundsChecking(
   }
 }
 
+/// Emit array bounds constraints using llvm.assume for optimization hints.
+///
+/// C Standard (ISO/IEC 9899:2011 - C11)
+/// Section J.2 (Undefined behavior): An array subscript is out of range, even
+/// if an object is apparently accessible with the given subscript (as in the
+/// lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
+///
+/// Section 6.5.6 (Additive operators): If both the pointer operand and the
+/// result point to elements of the same array object, or one past the last
+/// element of the array object, the evaluation shall not produce an overflow;
+/// otherwise, the behavior is undefined.
+///
+/// C++ Standard (ISO/IEC 14882 - 2017)
+/// Section 8.7 (Additive operators):
+/// 4 When an expression that has integral type is added to or subtracted from a
+///   pointer, the result has the type of the pointer operand. If the expression
+///   P points to element x[i] of an array object x with n elements,^86 the
+///   expressions P + J and J + P (where J has the value j) point to the
+///   (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the
+///   behavior is undefined. Likewise, the expression P - J points to the
+///   (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the
+///   behavior is undefined.
+/// ^86 A pointer past the last element of an array x of n elements is
+///     considered to be equivalent to a pointer to a hypothetical element x[n]
+///     for this purpose; see 6.9.2.
+///
+/// This function emits llvm.assume statements to inform the optimizer that
+/// array subscripts are within bounds, enabling better optimization without
+/// duplicating side effects from the subscript expression. The IndexVal
+/// parameter should be the already-emitted index value to avoid re-evaluation.
+///
+/// Code that intentionally accesses out-of-bounds (UB) may break with
+/// optimizations. Only applies to constant-size arrays (not pointers, VLAs, or
+/// flexible arrays.) Disabled when -fsanitize=array-bounds is active.
+///
+void CodeGenFunction::EmitArrayBoundsConstraints(const ArraySubscriptExpr *E,
+                                                 llvm::Value *IndexVal) {
+  // Disable with -fno-assume-array-bounds.
+  if (!CGM.getCodeGenOpts().AssumeArrayBounds)
+    return;
+
+  // Disable at -O0.
+  if (CGM.getCodeGenOpts().OptimizationLevel == 0)
+    return;
+
+  // Disable with array-bounds sanitizer.
+  if (SanOpts.has(SanitizerKind::ArrayBounds))
+    return;
+
+  const Expr *Base = E->getBase();
+  const Expr *Idx = E->getIdx();
+  QualType BaseType = Base->getType();
+
+  if (const auto *ICE = dyn_cast<ImplicitCastExpr>(Base)) {
+    if (ICE->getCastKind() == CK_ArrayToPointerDecay) {
+      BaseType = ICE->getSubExpr()->getType();
+    }
+  }
+
+  // For now: only handle constant array types.
+  const ConstantArrayType *CAT = getContext().getAsConstantArrayType(BaseType);
+  if (!CAT)
+    return;
+
+  llvm::APInt ArraySize = CAT->getSize();
+  if (ArraySize == 0)
+    return;
+
+  // Don't generate assumes for flexible array member pattern.
+  // Arrays of size 1 in structs are often used as placeholders for
+  // variable-length data (pre-C99 flexible array member idiom.)
+  if (ArraySize == 1) {
+    if (const auto *ME = dyn_cast<MemberExpr>(Base->IgnoreParenImpCasts())) {
+      if (const auto *FD = dyn_cast<FieldDecl>(ME->getMemberDecl())) {
+        const RecordDecl *RD = FD->getParent();
+        // Check if this field is the last field in the record.
+        // Only the last field can be a flexible array member.
+        const FieldDecl *LastField = nullptr;
+        for (const auto *Field : RD->fields())
+          LastField = Field;
+        if (LastField == FD)
+          // This is a size-1 array as the last field in a struct.
+          // Likely a flexible array member pattern - skip assumes.
+          return;
+      }
+    }
+  }
+
+  QualType IdxType = Idx->getType();
+  llvm::Type *IndexType = ConvertType(IdxType);
+  llvm::Value *Zero = llvm::ConstantInt::get(IndexType, 0);
+
+  uint64_t ArraySizeValue = ArraySize.getLimitedValue();
+  llvm::Value *ArraySizeVal = llvm::ConstantInt::get(IndexType, ArraySizeValue);
+
+  // Use the provided IndexVal to avoid duplicating side effects.
+  // The caller has already emitted the index expression once.
+  if (!IndexVal)
+    return;
+
+  // Ensure index value has the same type as our constants.
+  if (IndexVal->getType() != IndexType) {
+    bool IsSigned = IdxType->isSignedIntegerOrEnumerationType();
+    IndexVal = Builder.CreateIntCast(IndexVal, IndexType, IsSigned, "idx.cast");
+  }
+
+  // Create bounds constraint: 0 <= index && index < size.
+  // C arrays are 0-based, so valid indices are [0, size-1].
+  // This enforces the C18 standard requirement that array subscripts
+  // must be "greater than or equal to zero and less than the size of the
+  // array."
+  if (IdxType->isSignedIntegerOrEnumerationType()) {
----------------
erichkeane wrote:

So I see that this doesn't allow 1-past-the-end. I think that is problematic here. The pattern `&array[size]` is absolutely necessary for about 99% of C++ STL stuff to work, and the standards guarantee the ability to take this address.

So strictly speaking, the indexing itself can't be what is illegal; it has to be accessing the value.
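
To illustrate the concern, here is a minimal sketch of the one-past-the-end pattern. It is not taken from the PR; `kSize` and `sum_all` are hypothetical names used only for this example. The subscript `array[kSize]` is used solely to form an address that serves as the end of a range, and the element is never read. If an `llvm.assume` of `index < size` were emitted for that subscript, the optimizer would presumably be told this code is unreachable.

```cpp
#include <cstddef>
#include <numeric>

// Hypothetical example, not from the PR under review.
constexpr std::size_t kSize = 4;

int sum_all(const int (&array)[kSize]) {
  // Pointer to the first element.
  const int *first = &array[0];
  // One-past-the-end pointer: the subscript equals the array size, but only
  // the address is taken; the element itself is never accessed.
  const int *last = &array[kSize];
  // Standard half-open range [first, last), as the algorithms library expects.
  return std::accumulate(first, last, 0);
}
```

A constraint of `0 <= index && index <= size` on the subscript, with the stricter `index < size` applied only where the value is actually loaded or stored, would be one way to keep this pattern valid; that is a suggestion for discussion, not something the patch currently does.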

https://github.com/llvm/llvm-project/pull/159046