[clang] [clang] add array out-of-bounds access constraints using llvm.assume (PR #159046)
Nikita Popov via cfe-commits
cfe-commits at lists.llvm.org
Wed Oct 1 08:37:43 PDT 2025
================
@@ -4559,6 +4559,97 @@ void CodeGenFunction::EmitCountedByBoundsChecking(
}
}
+/// Emit array bounds constraints using llvm.assume for optimization hints.
+///
+/// C Standard (ISO/IEC 9899:2011 - C11)
+/// Section J.2 (Undefined behavior): An array subscript is out of range, even
+/// if an object is apparently accessible with the given subscript (as in the
+/// lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
+///
+/// Section 6.5.6 (Additive operators): If both the pointer operand and the
+/// result point to elements of the same array object, or one past the last
+/// element of the array object, the evaluation shall not produce an overflow;
+/// otherwise, the behavior is undefined.
+///
+/// C++ Standard (ISO/IEC 14882 - 2017)
+/// Section 8.7 (Additive operators):
+/// 4 When an expression that has integral type is added to or subtracted from a
+/// pointer, the result has the type of the pointer operand. If the expression
+/// P points to element x[i] of an array object x with n elements,^86 the
+/// expressions P + J and J + P (where J has the value j) point to the
+/// (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the
+/// behavior is undefined. Likewise, the expression P - J points to the
+/// (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the
+/// behavior is undefined.
+/// ^86 A pointer past the last element of an array x of n elements is
+/// considered to be equivalent to a pointer to a hypothetical element x[n]
+/// for this purpose; see 6.9.2.
+///
+/// This function emits llvm.assume statements to inform the optimizer that
+/// array subscripts are within bounds, enabling better optimization without
+/// duplicating side effects from the subscript expression. The IndexVal
+/// parameter should be the already-emitted index value to avoid re-evaluation.
+void CodeGenFunction::EmitArrayBoundsConstraints(const ArraySubscriptExpr *E,
+                                                 llvm::Value *IndexVal) {
+  const Expr *Base = E->getBase();
+  const Expr *Idx = E->getIdx();
+  QualType BaseType = Base->getType();
+
+  if (const auto *ICE = dyn_cast<ImplicitCastExpr>(Base)) {
+    if (ICE->getCastKind() == CK_ArrayToPointerDecay) {
+      BaseType = ICE->getSubExpr()->getType();
+    }
+  }
+
+  // For now: only handle constant array types.
+  const ConstantArrayType *CAT = getContext().getAsConstantArrayType(BaseType);
+  if (!CAT)
+    return;
+
+  llvm::APInt ArraySize = CAT->getSize();
+  if (ArraySize == 0)
+    return;
+
+  QualType IdxType = Idx->getType();
+  llvm::Type *IndexType = ConvertType(IdxType);
+  llvm::Value *Zero = llvm::ConstantInt::get(IndexType, 0);
+
+  uint64_t ArraySizeValue = ArraySize.getLimitedValue();
+  llvm::Value *ArraySizeVal = llvm::ConstantInt::get(IndexType, ArraySizeValue);
+
+  // Use the provided IndexVal to avoid duplicating side effects.
+  // The caller has already emitted the index expression once.
+  if (!IndexVal)
+    return;
+
+  // Ensure index value has the same type as our constants.
+  if (IndexVal->getType() != IndexType) {
+    bool IsSigned = IdxType->isSignedIntegerOrEnumerationType();
+    IndexVal = Builder.CreateIntCast(IndexVal, IndexType, IsSigned, "idx.cast");
+  }
+
+  // Create bounds constraint: 0 <= index && index < size.
+  // C arrays are 0-based, so valid indices are [0, size-1].
+  // This enforces the C18 standard requirement that array subscripts
+  // must be "greater than or equal to zero and less than the size of the
+  // array."
+  llvm::Value *LowerBound, *UpperBound;
+  if (IdxType->isSignedIntegerOrEnumerationType()) {
+    // For signed indices: index >= 0 && index < size.
+    LowerBound = Builder.CreateICmpSGE(IndexVal, Zero, "idx.ge.zero");
+    UpperBound = Builder.CreateICmpSLT(IndexVal, ArraySizeVal, "idx.lt.size");
+  } else {
+    // For unsigned indices: index < size (>= 0 is implicit).
+    LowerBound = Builder.getTrue();
+    UpperBound = Builder.CreateICmpULT(IndexVal, ArraySizeVal, "idx.lt.size");
+  }
+
+  llvm::Value *BoundsConstraint =
+      Builder.CreateAnd(LowerBound, UpperBound, "bounds.constraint");
+  llvm::Function *AssumeIntrinsic = CGM.getIntrinsic(llvm::Intrinsic::assume);
+  Builder.CreateCall(AssumeIntrinsic, BoundsConstraint);
----------------
nikic wrote:
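You can use Builder.CreateAssumption(BoundsConstraint) here. It emits the llvm.assume call directly, so the CGM.getIntrinsic lookup and the manual CreateCall are not needed.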
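
For readers following the hunk, here is a minimal standalone sketch of the condition the assume carries. It is plain C++ with no LLVM dependency, and every function name in it is hypothetical (none appear in the patch). It models the signed and unsigned predicates from the quoted code, plus the single unsigned compare that the signed pair is equivalent to when the size is non-negative. Whether the optimizer actually performs that fold on the emitted IR is an assumption here, not something this thread confirms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of the patch.

// Signed index: mirrors the "idx.ge.zero" and "idx.lt.size" compares
// joined by the "bounds.constraint" and.
bool signedInBounds(std::int64_t idx, std::int64_t size) {
  return idx >= 0 && idx < size;
}

// Unsigned index: one unsigned compare; the lower bound is implicit.
bool unsignedInBounds(std::uint64_t idx, std::uint64_t size) {
  return idx < size;
}

// Folded form of the signed check: reinterpreting the index as unsigned
// turns negative values into very large ones, so they fail the compare.
// Only valid when size is non-negative.
bool foldedInBounds(std::int64_t idx, std::int64_t size) {
  return static_cast<std::uint64_t>(idx) < static_cast<std::uint64_t>(size);
}

// Returns true when the folded form matches the two-compare form for
// every index in [-8, size + 8].
bool foldedFormAgrees(std::int64_t size) {
  for (std::int64_t idx = -8; idx <= size + 8; ++idx)
    if (signedInBounds(idx, size) != foldedInBounds(idx, size))
      return false;
  return true;
}

// Spot checks against a 4-element array, like `int a[4]`.
void checkExamples() {
  assert(signedInBounds(0, 4));
  assert(signedInBounds(3, 4));
  assert(!signedInBounds(4, 4));   // one past the end is not a valid subscript
  assert(!signedInBounds(-1, 4));  // negative index
  assert(foldedFormAgrees(4));
}
```

Calling checkExamples() exercises the boundary cases for a 4-element array; index 4 is rejected because only the pointer, not the subscripted element, may be one past the end.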
https://github.com/llvm/llvm-project/pull/159046