[llvm] [DeadStoreElimination] Optimize tautological assignments (PR #75744)
Shreyansh Chouhan via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 21 07:37:24 PST 2023
================
@@ -1926,6 +1926,43 @@ struct DSEState {
if (InitC && InitC == StoredConstant)
return MSSA.isLiveOnEntryDef(
MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(Def, BatchAA));
+
+ if (!Store)
+ return false;
+
+ // If there is a dominating condition, that ensures that the value
+ // being stored in a memory location is already present at the
+ // memory location, the store is a noop.
+ BasicBlock *StoreBB = DefI->getParent();
+ auto *StorePtr = Store->getOperand(1);
+
+ DomTreeNode *IDom = DT.getNode(StoreBB)->getIDom();
+ if (!IDom)
+ return false;
+
+ auto *TI = IDom->getBlock()->getTerminator();
+ ICmpInst::Predicate Pred;
+ BasicBlock *TrueBB, *FalseBB;
+
+ if (!match(TI, m_Br(m_ICmp(Pred, m_Load(m_Specific(StorePtr)),
+ m_Specific(StoredConstant)),
+ TrueBB, FalseBB)))
+ return false;
+
+ MemoryAccess *LastMod =
+ MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(Def, BatchAA);
+
+ DomTreeNode *CDom = DT.getNode(LastMod->getBlock());
----------------
BK1603 wrote:
What is even more interesting is that, even if we return the value stored at ptr x, the code is still optimized in a similar manner (which I don't think is the current behavior).
Say we take:
```
@x = dso_local local_unnamed_addr global i32 0, align 4
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(readwrite, argmem: none, inaccessiblemem: none) uwtable
define dso_local i32 @foo() local_unnamed_addr #0 {
%1 = load i32, ptr @x, align 4, !tbaa !3
store i32 7, ptr @x, align 4, !tbaa !3
%2 = icmp eq i32 %1, 4
br i1 %2, label %3, label %4
3: ; preds = %0
store i32 4, ptr @x, align 4, !tbaa !3
br label %4
4: ; preds = %3, %0
%5 = load i32, ptr @x, align 4, !tbaa !3
ret i32 %5
}
```
The code is optimized to:
```
@x = dso_local local_unnamed_addr global i32 0, align 4
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(readwrite, argmem: none, inaccessiblemem: none) uwtable
define dso_local i32 @foo() local_unnamed_addr #0 {
%1 = load i32, ptr @x, align 4, !tbaa !3
%2 = icmp eq i32 %1, 4
%spec.store.select = select i1 %2, i32 4, i32 7
store i32 %spec.store.select, ptr @x, align 4
ret i32 %spec.store.select
}
```
These two functions do not behave the same. In the unoptimized version, the function would always return 7 (which is what x would hold by the end of the function); in the optimized version, however, the return value depends on the value of x at function entry.
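For reference, the source-level pattern the hunk above targets (a store guarded by a dominating equality check on the same location) can be sketched roughly in C++ as follows. This is only an illustration: `guarded_store`, `store_removed`, `same_final_state` and `check_examples` are hypothetical names, not part of the patch, and the sketch assumes nothing writes to the location between the comparison and the store, which is exactly what the `store i32 7` in the example above violates.

```cpp
#include <cassert>

// Hypothetical model of the guarded store: write `c` only when the
// location already holds `c`.
inline void guarded_store(int &loc, int c) {
    if (loc == c)   // dominating condition: loc already equals c
        loc = c;    // tautological store, memory is left unchanged
}

// The same function with the store removed, as the transform would leave it.
inline void store_removed(int &loc, int c) {
    (void)loc;
    (void)c;
}

// Returns true when both versions leave the location in the same final state.
inline bool same_final_state(int initial, int c) {
    int a = initial;
    int b = initial;
    guarded_store(a, c);
    store_removed(b, c);
    return a == b;
}

// Small self-check over a matching and a non-matching initial value.
inline void check_examples() {
    assert(same_final_state(4, 4));
    assert(same_final_state(7, 4));
}
```

The equivalence in this sketch relies on the comparison reading the value that is still in memory when the store executes; once another write sits between the load and the branch, that assumption no longer holds.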
https://github.com/llvm/llvm-project/pull/75744