[PATCH] D153453: [MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas

Mon Jul 10 06:03:46 PDT 2023

khei4 added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1449
+  // PostDominator and we would be able to relax the condition for each blocks.
+  if (SrcAlloca->getParent() != DestAlloca->getParent()) {
+    LLVM_DEBUG(dbgs() << "Stack Move: src and dest allocas are not in the "
----------------
nikic wrote:
> This is implied by isStaticAlloca(), which requires allocas to be in the entry block. The interesting block limitation is the check during the use loop.
> This is implied by isStaticAlloca(), which requires allocas to be in the entry block.

Good catch! I got the point by seeing implementation fo isStaticAlloca()! 

> The interesting block limitation is the check during the use loop.

Sorry, it might be the point I misunderstood. You mean, alloca and memcpy in the loop body are practically common, and to handle only static alloca is too restrictive, right? (Simple loop with loop-local clone types seems to alloc at entry blocks i.e. https://rust.godbolt.org/z/bjx68djvY)

================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1501
+      if (GEP->isInBounds())
+        return true;
+    bool CanBeNull, CanBeFreed;
----------------
nikic wrote:
> Drop the inbounds handling code, it's no longer correct. Actually, it would be fine to just pass nullptr instead of IsDereferenceableOrNull, we don't really care about this case.
Oh, thanks! I took it from https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp#L1483-L1497. I believe the same goes for it.

================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1505
+  };
+  SmallVector<Instruction *, 8> Worklist;
+  Worklist.push_back(DestAlloca);
----------------
nikic wrote:
> There should also be a Visited set, otherwise we can go into an infinite loop if phi nodes are involved.
> 
> We can then also limit the Visited set size using getDefaultMaxUsesToExploreForCaptureTracking().
> We can then also limit the Visited set size using getDefaultMaxUsesToExploreForCaptureTracking().

Oh, it's good to have it already! Thanks, I'll use it!

================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1589
+    return true;
+  };
+
----------------
nikic wrote:
> When checking dest, you allow ref before the store. Here you allow any mod/ref before the store. That means you could have ref of dest and mod of src. After merging, the ref of dst might read the mod of src now. If I'm following the logic right, this wouldn't be correct.
> 
> TBH I don't think there's a need to allow ref of dst before the store. As there can't be any mod, this means the ref will always read uninitialized memory. This doesn't seem like a useful case to handle, as those refs can be optimized away.
> When checking dest, you allow ref before the store. Here you allow any mod/ref before the store. That means you could have ref of dest and mod of src. After merging, the ref of dst might read the mod of src now. If I'm following the logic right, this wouldn't be correct.

You seem right. 

> TBH I don't think there's a need to allow ref of dst before the store. As there can't be any mod, this means the ref will always read uninitialized memory. This doesn't seem like a useful case to handle, as those refs can be optimized away.
I agreed, let this bailout if any ModRef on dest before the copy.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153453/new/

https://reviews.llvm.org/D153453