[llvm] d6394d8 - [cgp] improve robustness of uadd/usub transforms
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 9 11:52:16 PST 2021
Author: Philip Reames
Date: 2021-03-09T11:52:08-08:00
New Revision: d6394d86cadf20f09cba9c013bfccc277f565212
URL: https://github.com/llvm/llvm-project/commit/d6394d86cadf20f09cba9c013bfccc277f565212
DIFF: https://github.com/llvm/llvm-project/commit/d6394d86cadf20f09cba9c013bfccc277f565212.diff
LOG: [cgp] improve robustness of uadd/usub transforms
LSR prefers to schedule IV increments just before the latch. The recent 80511565 broadened this to also moving increments in the original IR. This exposed a robustness problem in the CGP transform.
When we have a use of an induction increment outside of the loop (we canonicalize away from this form, but it still happens, e.g., with unanalyzable loops), we'd refuse to perform the uadd/usub transform. Interestingly, all of these cases involve moving the increment closer to its operands, so there's no concern about it continuing to dominate all of its uses. We can handle that case cheaply, resulting in a more robust transform.
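For context, the transform being hardened here rewrites a decrement-and-test idiom into a single overflow intrinsic. A minimal hand-written sketch of the usub case (illustrative IR, not taken from the patch):

  ; Before: the zero test and the decrement are separate instructions.
  %cond = icmp eq i64 %iv, 0
  %iv.next = add i64 %iv, -1

  ; After: one intrinsic produces both values. The math result replaces
  ; the decrement and the overflow bit replaces the compare, since
  ; usub.with.overflow(%iv, 1) overflows exactly when %iv == 0.
  %res  = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %iv, i64 1)
  %math = extractvalue { i64, i1 } %res, 0
  %ov   = extractvalue { i64, i1 } %res, 1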
Added:
Modified:
llvm/lib/CodeGen/CodeGenPrepare.cpp
llvm/test/CodeGen/X86/usub_inc_iv.ll
Removed:
################################################################################
diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index b6fbfa842133..9de4261f25c6 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -1352,16 +1352,17 @@ bool CodeGenPrepare::replaceMathCmpWithIntrinsic(BinaryOperator *BO,
if (LI->getLoopFor(Cmp->getParent()) != L)
return false;
- // IV increment may have other users than the IV. We do not want to make
- // dominance queries to analyze the legality of moving it towards the cmp,
- // so just check that there is no other users.
- if (!BO->hasOneUse())
- return false;
- // Ultimately, the insertion point must dominate latch. This should be a
- // cheap check because no CFG changes & dom tree recomputation happens
- // during the transform.
- Function *F = BO->getParent()->getParent();
- return getDT(*F).dominates(Cmp->getParent(), L->getLoopLatch());
+ // Finally, we need to ensure that the insert point will dominate all
+ // existing uses of the increment.
+
+ auto &DT = getDT(*BO->getParent()->getParent());
+ if (DT.dominates(Cmp->getParent(), BO->getParent()))
+ // If we're moving up the dom tree, all uses are trivially dominated.
+ // (This is the common case for code produced by LSR.)
+ return true;
+
+ // Otherwise, special case the single use in the phi recurrence.
+ return BO->hasOneUse() && DT.dominates(Cmp->getParent(), L->getLoopLatch());
};
if (BO->getParent() != Cmp->getParent() && !IsReplacableIVIncrement(BO)) {
// We used to use a dominator tree here to allow multi-block optimization.
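To make the new logic concrete, here is a condensed, hand-written variant of the test_06 input from the test diff below (names are illustrative). The compare sits in the loop header, the increment sits in the backedge block, and the increment has a second use outside the loop in %failure. Because the header dominates the backedge block, DT.dominates(Cmp->getParent(), BO->getParent()) holds, so hoisting the increment next to the compare still dominates every existing use, including the one in %failure; previously the hasOneUse() bail-out rejected this case.

  define i32 @sketch(i64 %len) {
  entry:
    br label %loop

  loop:                                       ; header: dominates %backedge
    %iv = phi i64 [ %iv.next, %backedge ], [ %len, %entry ]
    %cond = icmp eq i64 %iv, 0                ; the Cmp lives here
    br i1 %cond, label %exit, label %backedge

  backedge:
    %iv.next = add i64 %iv, -1                ; the IV increment (BO)
    %early = icmp eq i64 %iv.next, 7          ; illustrative exit condition
    br i1 %early, label %failure, label %loop

  exit:
    ret i32 -1

  failure:                                    ; use of %iv.next outside the loop
    %trunc = trunc i64 %iv.next to i32
    ret i32 %trunc
  }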
diff --git a/llvm/test/CodeGen/X86/usub_inc_iv.ll b/llvm/test/CodeGen/X86/usub_inc_iv.ll
index da0fde128890..bc3d55d3b831 100644
--- a/llvm/test/CodeGen/X86/usub_inc_iv.ll
+++ b/llvm/test/CodeGen/X86/usub_inc_iv.ll
@@ -314,23 +314,23 @@ define i32 @test_06(i32* %p, i64 %len, i32 %x) {
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[COND_1:%.*]] = icmp eq i64 [[IV]], 0
-; CHECK-NEXT: br i1 [[COND_1]], label [[EXIT:%.*]], label [[BACKEDGE]]
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[MATH:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[IV]], i64 1)
+; CHECK-NEXT: [[MATH]] = extractvalue { i64, i1 } [[TMP0]], 0
+; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
+; CHECK-NEXT: br i1 [[OV]], label [[EXIT:%.*]], label [[BACKEDGE]]
; CHECK: backedge:
-; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[IV]], 4
-; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P:%.*]] to i8*
-; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP0]], i64 [[SUNKADDR]]
-; CHECK-NEXT: [[SUNKADDR2:%.*]] = getelementptr i8, i8* [[SUNKADDR1]], i64 -4
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[SUNKADDR2]] to i32*
-; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP1]] unordered, align 4
-; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], -1
+; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[MATH]], 4
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[P:%.*]] to i8*
+; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP1]], i64 [[SUNKADDR]]
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SUNKADDR1]] to i32*
+; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP2]] unordered, align 4
; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]
; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]
; CHECK: exit:
; CHECK-NEXT: ret i32 -1
; CHECK: failure:
-; CHECK-NEXT: [[TRUNC:%.*]] = trunc i64 [[IV_NEXT]] to i32
+; CHECK-NEXT: [[TRUNC:%.*]] = trunc i64 [[MATH]] to i32
; CHECK-NEXT: ret i32 [[TRUNC]]
;
entry: