[llvm] [LAA] Be more careful when evaluating AddRecs at symbolic max BTC. (PR #128061)
Florian Hahn via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 20 12:29:13 PST 2025
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/128061
Evaluating AR at the symbolic max BTC may wrap and create an expression that is less than the start of the AddRec due to wrapping (for example consider MaxBTC = -2).
If that's the case, set ScEnd to -(EltSize + 1). ScEnd will get incremented by EltSize before returning, so this effectively sets ScEnd to unsigned max. Note that LAA separately checks that accesses cannot not wrap (52ded672492, https://github.com/llvm/llvm-project/pull/127543), so unsigned max represents an upper bound.
When there is a computable backedge-taken count, we are guaranteed to execute the number of iterations, and if any pointer would wrap it would be UB (or the access will never be executed, so cannot alias). It includes new tests from the previous discussion that show a case we wrap with a BTC, but it is UB due to the pointer after the object wrapping (in `evaluate-at-backedge-taken-count-wrapping.ll`)
Note that an earlier version of the patch was shared as https://github.com/llvm/llvm-project/pull/106530, but I accidentally deleted the branch and now I cannot figure out how to reopen that PR.
>From b985f31dfd2727a784efe58dec7e169edac98ea2 Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Thu, 29 Aug 2024 12:21:20 +0100
Subject: [PATCH] [LAA] Be more careful when evaluating AddRecs at symbolic max
BTC.
Evaluating AR at the symbolic max BTC may wrap and create an expression
that is less than the start of the AddRec due to wrapping (for example
consider MaxBTC = -2).
If that's the case, set ScEnd to -(EltSize + 1). ScEnd will get
incremented by EltSize before returning, so this effectively
sets ScEnd to unsigned max. Note that LAA separately checks that
accesses cannot not wrap, so unsigned max represents an upper bound.
---
.../llvm/Analysis/LoopAccessAnalysis.h | 2 +-
llvm/lib/Analysis/Loads.cpp | 5 +-
llvm/lib/Analysis/LoopAccessAnalysis.cpp | 42 +++++++--
...aluate-at-backedge-taken-count-wrapping.ll | 92 +++++++++++++++++++
...bolic-max-backedge-taken-count-may-wrap.ll | 6 +-
5 files changed, 131 insertions(+), 16 deletions(-)
create mode 100644 llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-backedge-taken-count-wrapping.ll
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index cb6f47e3a76be..91802cc4361ae 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -872,7 +872,7 @@ bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
/// NoConflict = (P2.Start >= P1.End) || (P1.Start >= P2.End)
std::pair<const SCEV *, const SCEV *> getStartAndEndForAccess(
const Loop *Lp, const SCEV *PtrExpr, Type *AccessTy, const SCEV *MaxBECount,
- ScalarEvolution *SE,
+ const SCEV *SymbolicMaxBECount, ScalarEvolution *SE,
DenseMap<std::pair<const SCEV *, Type *>,
std::pair<const SCEV *, const SCEV *>> *PointerBounds);
diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index b461c41d29e84..5a8eedfa261d2 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -319,11 +319,14 @@ bool llvm::isDereferenceableAndAlignedInLoop(
const SCEV *MaxBECount =
Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)
: SE.getConstantMaxBackedgeTakenCount(L);
+ const SCEV *SymbolicMaxBECount =
+ Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)
+ : SE.getConstantMaxBackedgeTakenCount(L);
if (isa<SCEVCouldNotCompute>(MaxBECount))
return false;
const auto &[AccessStart, AccessEnd] = getStartAndEndForAccess(
- L, PtrScev, LI->getType(), MaxBECount, &SE, nullptr);
+ L, PtrScev, LI->getType(), MaxBECount, SymbolicMaxBECount, &SE, nullptr);
if (isa<SCEVCouldNotCompute>(AccessStart) ||
isa<SCEVCouldNotCompute>(AccessEnd))
return false;
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index a1d91de3bb788..cdce1f1941c2f 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -190,7 +190,7 @@ RuntimeCheckingPtrGroup::RuntimeCheckingPtrGroup(
std::pair<const SCEV *, const SCEV *> llvm::getStartAndEndForAccess(
const Loop *Lp, const SCEV *PtrExpr, Type *AccessTy, const SCEV *MaxBECount,
- ScalarEvolution *SE,
+ const SCEV *SymbolicMaxBECount, ScalarEvolution *SE,
DenseMap<std::pair<const SCEV *, Type *>,
std::pair<const SCEV *, const SCEV *>> *PointerBounds) {
std::pair<const SCEV *, const SCEV *> *PtrBoundsPair;
@@ -206,11 +206,31 @@ std::pair<const SCEV *, const SCEV *> llvm::getStartAndEndForAccess(
const SCEV *ScStart;
const SCEV *ScEnd;
+ auto &DL = Lp->getHeader()->getDataLayout();
+ Type *IdxTy = DL.getIndexType(PtrExpr->getType());
+ const SCEV *EltSizeSCEV = SE->getStoreSizeOfExpr(IdxTy, AccessTy);
if (SE->isLoopInvariant(PtrExpr, Lp)) {
ScStart = ScEnd = PtrExpr;
} else if (auto *AR = dyn_cast<SCEVAddRecExpr>(PtrExpr)) {
ScStart = AR->getStart();
- ScEnd = AR->evaluateAtIteration(MaxBECount, *SE);
+ if (!isa<SCEVCouldNotCompute>(MaxBECount))
+ // Evaluating AR at an exact BTC is safe: LAA separately checks that
+ // accesses cannot wrap in the loop. If evaluating AR at BTC wraps, then
+ // the loop either triggers UB when executing a memory access with a
+ // poison pointer or the wrapping/poisoned pointer is not used.
+ ScEnd = AR->evaluateAtIteration(MaxBECount, *SE);
+ else {
+ // Evaluating AR at MaxBTC may wrap and create an expression that is less
+ // than the start of the AddRec due to wrapping (for example consider
+ // MaxBTC = -2). If that's the case, set ScEnd to -(EltSize + 1). ScEnd
+ // will get incremented by EltSize before returning, so this effectively
+ // sets ScEnd to unsigned max. Note that LAA separately checks that
+ // accesses cannot not wrap, so unsigned max represents an upper bound.
+ ScEnd = AR->evaluateAtIteration(SymbolicMaxBECount, *SE);
+ if (!SE->isKnownNonNegative(SE->getMinusSCEV(ScEnd, ScStart)))
+ ScEnd = SE->getNegativeSCEV(
+ SE->getAddExpr(EltSizeSCEV, SE->getOne(EltSizeSCEV->getType())));
+ }
const SCEV *Step = AR->getStepRecurrence(*SE);
// For expressions with negative step, the upper bound is ScStart and the
@@ -232,9 +252,6 @@ std::pair<const SCEV *, const SCEV *> llvm::getStartAndEndForAccess(
assert(SE->isLoopInvariant(ScEnd, Lp) && "ScEnd needs to be invariant");
// Add the size of the pointed element to ScEnd.
- auto &DL = Lp->getHeader()->getDataLayout();
- Type *IdxTy = DL.getIndexType(PtrExpr->getType());
- const SCEV *EltSizeSCEV = SE->getStoreSizeOfExpr(IdxTy, AccessTy);
ScEnd = SE->getAddExpr(ScEnd, EltSizeSCEV);
std::pair<const SCEV *, const SCEV *> Res = {ScStart, ScEnd};
@@ -250,9 +267,11 @@ void RuntimePointerChecking::insert(Loop *Lp, Value *Ptr, const SCEV *PtrExpr,
unsigned DepSetId, unsigned ASId,
PredicatedScalarEvolution &PSE,
bool NeedsFreeze) {
- const SCEV *MaxBECount = PSE.getSymbolicMaxBackedgeTakenCount();
+ const SCEV *SymbolicMaxBECount = PSE.getSymbolicMaxBackedgeTakenCount();
+ const SCEV *MaxBECount = PSE.getBackedgeTakenCount();
const auto &[ScStart, ScEnd] = getStartAndEndForAccess(
- Lp, PtrExpr, AccessTy, MaxBECount, PSE.getSE(), &DC.getPointerBounds());
+ Lp, PtrExpr, AccessTy, MaxBECount, SymbolicMaxBECount, PSE.getSE(),
+ &DC.getPointerBounds());
assert(!isa<SCEVCouldNotCompute>(ScStart) &&
!isa<SCEVCouldNotCompute>(ScEnd) &&
"must be able to compute both start and end expressions");
@@ -1933,11 +1952,14 @@ MemoryDepChecker::getDependenceDistanceStrideAndSize(
// required for correctness.
if (SE.isLoopInvariant(Src, InnermostLoop) ||
SE.isLoopInvariant(Sink, InnermostLoop)) {
- const SCEV *MaxBECount = PSE.getSymbolicMaxBackedgeTakenCount();
+ const SCEV *MaxBECount = PSE.getBackedgeTakenCount();
+ const SCEV *SymbolicMaxBECount = PSE.getSymbolicMaxBackedgeTakenCount();
const auto &[SrcStart_, SrcEnd_] = getStartAndEndForAccess(
- InnermostLoop, Src, ATy, MaxBECount, PSE.getSE(), &PointerBounds);
+ InnermostLoop, Src, ATy, MaxBECount, SymbolicMaxBECount, PSE.getSE(),
+ &PointerBounds);
const auto &[SinkStart_, SinkEnd_] = getStartAndEndForAccess(
- InnermostLoop, Sink, BTy, MaxBECount, PSE.getSE(), &PointerBounds);
+ InnermostLoop, Sink, BTy, MaxBECount, SymbolicMaxBECount, PSE.getSE(),
+ &PointerBounds);
if (!isa<SCEVCouldNotCompute>(SrcStart_) &&
!isa<SCEVCouldNotCompute>(SrcEnd_) &&
!isa<SCEVCouldNotCompute>(SinkStart_) &&
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-backedge-taken-count-wrapping.ll b/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-backedge-taken-count-wrapping.ll
new file mode 100644
index 0000000000000..d58dd38d9fef8
--- /dev/null
+++ b/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-backedge-taken-count-wrapping.ll
@@ -0,0 +1,92 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes='print<access-info>' -disable-output %s 2>&1 | FileCheck %s
+
+target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+
+; Note: The datalayout for the test specifies a 32 bit index type.
+
+; No UB: accessing last valid byte, pointer after the object
+; doesnt wrap (%p + 2147483647).
+define void @pointer_after_object_does_not_wrap(i32 %y, ptr %s, ptr %p) {
+; CHECK-LABEL: 'pointer_after_object_does_not_wrap'
+; CHECK-NEXT: loop:
+; CHECK-NEXT: Memory dependences are safe with run-time checks
+; CHECK-NEXT: Dependences:
+; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP1:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep2.iv = getelementptr inbounds i8, ptr %p, i32 %iv
+; CHECK-NEXT: Against group ([[GRP2:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep1.iv = getelementptr inbounds i8, ptr %s, i32 %iv
+; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP1]]:
+; CHECK-NEXT: (Low: (%y + %p) High: (2147483647 + %p))
+; CHECK-NEXT: Member: {(%y + %p),+,1}<nw><%loop>
+; CHECK-NEXT: Group [[GRP2]]:
+; CHECK-NEXT: (Low: (%y + %s) High: (2147483647 + %s))
+; CHECK-NEXT: Member: {(%y + %s),+,1}<nw><%loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT: SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT: Expressions re-written:
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop ]
+ %gep1.iv = getelementptr inbounds i8 , ptr %s, i32 %iv
+ %load = load i8, ptr %gep1.iv, align 4
+ %gep2.iv = getelementptr inbounds i8, ptr %p, i32 %iv
+ store i8 %load, ptr %gep2.iv, align 4
+ %iv.next = add nsw i32 %iv, 1
+ %c.2 = icmp slt i32 %iv.next, 2147483647
+ br i1 %c.2, label %loop, label %exit
+
+exit:
+ ret void
+}
+
+; UB: accessing %p + 2147483646 and p + 2147483647.
+; Pointer the past the object would wrap in signed.
+define void @pointer_after_object_would_wrap(i32 %y, ptr %s, ptr %p) {
+; CHECK-LABEL: 'pointer_after_object_would_wrap'
+; CHECK-NEXT: loop:
+; CHECK-NEXT: Memory dependences are safe with run-time checks
+; CHECK-NEXT: Dependences:
+; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP3:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep2.iv = getelementptr inbounds i8, ptr %p, i32 %iv
+; CHECK-NEXT: Against group ([[GRP4:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep1.iv = getelementptr inbounds i8, ptr %s, i32 %iv
+; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP3]]:
+; CHECK-NEXT: (Low: (%y + %p) High: (-2147483648 + %p))
+; CHECK-NEXT: Member: {(%y + %p),+,1}<nw><%loop>
+; CHECK-NEXT: Group [[GRP4]]:
+; CHECK-NEXT: (Low: (%y + %s) High: (-2147483648 + %s))
+; CHECK-NEXT: Member: {(%y + %s),+,1}<nw><%loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT: SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT: Expressions re-written:
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop ]
+ %gep1.iv = getelementptr inbounds i8 , ptr %s, i32 %iv
+ %load = load i16, ptr %gep1.iv, align 4
+ %gep2.iv = getelementptr inbounds i8, ptr %p, i32 %iv
+ store i16 %load, ptr %gep2.iv, align 4
+ %iv.next = add nsw i32 %iv, 1
+ %c.2 = icmp slt i32 %iv.next, 2147483647
+ br i1 %c.2, label %loop, label %exit
+
+exit:
+ ret void
+}
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll b/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll
index dd06cab26d095..0aa74c7b6442b 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll
@@ -3,7 +3,6 @@
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
-; FIXME: Start == End for access group with AddRec.
define void @runtime_checks_with_symbolic_max_btc_neg_1(ptr %P, ptr %S, i32 %x, i32 %y) {
; CHECK-LABEL: 'runtime_checks_with_symbolic_max_btc_neg_1'
; CHECK-NEXT: loop:
@@ -17,7 +16,7 @@ define void @runtime_checks_with_symbolic_max_btc_neg_1(ptr %P, ptr %S, i32 %x,
; CHECK-NEXT: ptr %S
; CHECK-NEXT: Grouped accesses:
; CHECK-NEXT: Group [[GRP1]]:
-; CHECK-NEXT: (Low: ((4 * %y) + %P) High: ((4 * %y) + %P))
+; CHECK-NEXT: (Low: ((4 * %y) + %P) High: -1)
; CHECK-NEXT: Member: {((4 * %y) + %P),+,4}<%loop>
; CHECK-NEXT: Group [[GRP2]]:
; CHECK-NEXT: (Low: %S High: (4 + %S))
@@ -44,7 +43,6 @@ exit:
ret void
}
-; FIXME: Start > End for access group with AddRec.
define void @runtime_check_with_symbolic_max_btc_neg_2(ptr %P, ptr %S, i32 %x, i32 %y) {
; CHECK-LABEL: 'runtime_check_with_symbolic_max_btc_neg_2'
; CHECK-NEXT: loop:
@@ -58,7 +56,7 @@ define void @runtime_check_with_symbolic_max_btc_neg_2(ptr %P, ptr %S, i32 %x, i
; CHECK-NEXT: ptr %S
; CHECK-NEXT: Grouped accesses:
; CHECK-NEXT: Group [[GRP3]]:
-; CHECK-NEXT: (Low: ((4 * %y) + %P) High: (-4 + (4 * %y) + %P))
+; CHECK-NEXT: (Low: ((4 * %y) + %P) High: -1)
; CHECK-NEXT: Member: {((4 * %y) + %P),+,4}<%loop>
; CHECK-NEXT: Group [[GRP4]]:
; CHECK-NEXT: (Low: %S High: (4 + %S))
More information about the llvm-commits
mailing list