[PATCH] D72631: [DSE] Eliminate stores at the end of the function.

Alina Sbirlea via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 23 12:54:10 PDT 2020


asbirlea accepted this revision.
asbirlea added a comment.
This revision is now accepted and ready to land.

Thank you for the patch!

Minor comments on the tests for keeping the patch clean, but otherwise lgtm.

Please consider the performance comments for future improvements.



================
Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1658
+      Instruction *UseInst = cast<MemoryUseOrDef>(UseAccess)->getMemoryInst();
+      if (isReadClobber(*MaybeLoc, UseInst)) {
+        LLVM_DEBUG(dbgs() << "  ... hit read clobber " << *UseInst << ".\n");
----------------
FWIW, this call will add up a lot on compile-time. Those AA calls are very expensive, and the MemorySSAScanLimit is per call, so dependent on the number of MemoryDefs in `MemDefs` when called in the `eliminateDeadWritesAtEndOfFunction`. 
In the worst case, you'd hit something like:

```
;a, b, c, d noalias
store to a
store to b
store to c
store to d
load/use d
; repeat read/write to d until MemorySSAScanLimit is hit
```

The store to a,b,c cannot be eliminated because they each hit the limit, but the work is done towards checking all accesses to d.
A potential helper would be a caching mechanism similar to what MemorySSA does internally. Along the lines of: if the first `store to d` is found as NoAlias and subsequent access is a load to the same MemoryLocation, this is known NoAlias. 
Also worth considering for reducing compile-time, if it's not a pattern as straight-forward as store/load/mem_cpy to a precise locations to just bail-out and not do the AA call.

This comment is solely focused on performance when trying to turn this on.


================
Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1987
+        SmallVector<const Value *, 4> Pointers;
+        GetUnderlyingObjects(getLocForWriteEx(DefI)->Ptr, Pointers, DL);
+
----------------
Also for performance, it would be great to have some metrics into what's more expensive to perform first: `isWriteAtEndOfFunction` or `GetUnderlyingObjects`+searching the map `InvisibleToCallerAfterRet`.


================
Comment at: llvm/test/Transforms/DeadStoreElimination/MSSA/combined-partial-overwrites.ll:255
 ; CHECK-NEXT:    [[BPTR2:%.*]] = getelementptr inbounds i8, i8* [[BPTR]], i64 2
+; CHECK-NEXT:    [[BPTR3:%.*]] = getelementptr inbounds i8, i8* [[BPTR]], i64 3
 ; CHECK-NEXT:    [[WPTR:%.*]] = bitcast i8* [[BPTR]] to i16*
----------------
Look unrelated to the patch? Perhaps commit separately as NFC?


================
Comment at: llvm/test/Transforms/DeadStoreElimination/MSSA/memintrinsics.ll:38
 ; CHECK-LABEL: @test3(
+; CHECK-NEXT:    [[B:%.*]] = alloca i8, align 1
 ; CHECK-NEXT:    ret void
----------------
Commit first as NFC?
Same for below?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72631/new/

https://reviews.llvm.org/D72631





More information about the llvm-commits mailing list