[PATCH] D140089: [MemCpyOpt] Add a stack-move optimization to opportunistically merge allocas together.

Nikita Popov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 6 08:17:15 PST 2023


nikic added a comment.

Some mostly stylistic notes; I didn't get to the core logic yet.

High level question: Do we need to do anything about debug info during this transform? I suspect "yes", but I have absolutely no clue about debug info.



================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:77
+                                         "stack-move optimization may examine"),
+                                cl::init(250));
 
----------------
This limit is pretty large. Do you have any data that shows that there is any substantial increase in optimization opportunities over a more conservative value, such as 8 or 16?


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:819
+      // memcpys from an alloca to an alloca.
+      if (AllocaInst *DestAlloca =
+              dyn_cast<AllocaInst>(SI->getPointerOperand())) {
----------------
`auto *` with dyn_cast
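
i.e., something like this (a sketch of the suggested style; per the LLVM coding standards, the type is already spelled out in the `dyn_cast`):

```cpp
if (auto *DestAlloca = dyn_cast<AllocaInst>(SI->getPointerOperand())) {
  ...
}
```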


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:824
+          if (performStackMoveOptzn(LI, SI, DestAlloca, SrcAlloca,
+                                    DL.getTypeStoreSize(T))) {
+            // Avoid invalidating the iterator.
----------------
This casts the TypeSize to int, which will assert for scalable vectors. Please add a test with scalable load/store.

(I think if you pass in TypeSize rather than uint64_t and use that, you can probably support scalable vectors without much effort, though just bailing out is also fine.)
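
The bail-out variant might look roughly like this (a sketch, assuming `T` is the stored type as in the surrounding code):

```cpp
TypeSize Size = DL.getTypeStoreSize(T);
// Conservatively bail out on scalable vectors instead of asserting
// when the TypeSize is converted to a fixed integer.
if (Size.isScalable())
  return false;
if (performStackMoveOptzn(LI, SI, DestAlloca, SrcAlloca, Size.getFixedValue()))
  ...
```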


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:826
+            // Avoid invalidating the iterator.
+            BBI = SI->getNextNonDebugInstruction()->getIterator();
+            eraseInstruction(SI);
----------------
The iterator is moved forward before the process functions are called, so it already points to the instruction after the store. I think what you want to do here is just `++BBI;`, which will get undone by the step back that returning true from here performs. (Ugh, the iterator handling in memcpyopt is a mess...)
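
i.e., roughly:

```cpp
// BBI already points past the store; one extra step is undone by the
// step back performed when this function returns true.
++BBI;
eraseInstruction(SI);
```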


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1861
+  const DataLayout &DL = DestAlloca->getModule()->getDataLayout();
+  std::optional<TypeSize> SrcSize = SrcAlloca->getAllocationSizeInBits(DL);
+  if (!SrcSize || SrcSize->isScalable() ||
----------------
I've added a getAllocationSize() method in https://github.com/llvm/llvm-project/commit/a6a526ec5465d712db11fdbf5ed5fce8da0722cf to avoid the *8 back and forth here.
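
With that helper the check becomes something like (a sketch):

```cpp
std::optional<TypeSize> SrcSize = SrcAlloca->getAllocationSize(DL);
if (!SrcSize || SrcSize->isScalable())
  return false;
```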


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1877
+  // uses, as well as problems related to llvm.stacksave and llvm.stackrestore
+  // intrinsics.
+  if (!allocaIsAtStartOfEntryBlock(DestAlloca) ||
----------------
For stacksave/stackrestore it would be fine to check just isStaticAlloca() (in fact, you might want to do that anyway for inalloca allocas, though those might get excluded by other checks already). Even if the alloca is not at the start of the entry block, it won't be affected by stacksave/stackrestore.

For domination, as you skip past GEP instructions (which might be based on the allocas), I'm not sure this would actually avoid all possible domination issues. What I'd do is check isStaticAlloca here, and then in the transform move the retained alloca up if necessary.
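
A sketch of what I mean (the move-up step would happen later, in the transform itself):

```cpp
// In the legality check: static allocas are unaffected by
// stacksave/stackrestore and can be safely relocated.
if (!SrcAlloca->isStaticAlloca() || !DestAlloca->isStaticAlloca())
  return false;

// In the transform: hoist the retained alloca to the start of the
// entry block so it dominates all uses of either alloca.
BasicBlock *EntryBB = &SrcAlloca->getFunction()->getEntryBlock();
SrcAlloca->moveBefore(*EntryBB, EntryBB->getFirstInsertionPt());
```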


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1914
+  if (!computeLiveness(DestTracker.BBLiveness) ||
+      !computeLiveness(SrcTracker.BBLiveness)) {
+    return false;
----------------
nit: Omit braces.


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1923
+  // worthwhile.
+  for (auto DestPair : DestTracker.BBLiveness) {
+    BasicBlock *BB = DestPair.first;
----------------
`for (const auto &[BB, Liveness] : DestTracker.BBLiveness)` or so


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1938
+  bool SrcLive = SrcTracker.BBLiveness.lookup(StoreBB).isLiveOut();
+  for (auto &BI : reverse(*StoreBB)) {
+    if (DestLive && SrcLive && &BI != Load && &BI != Store) {
----------------
This inspects a potentially large number of instructions.
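
One way to bound the cost would be a scan limit, e.g. (a hypothetical sketch; `MaxStackMoveScanInsts` is a made-up cl::opt, not an existing option):

```cpp
unsigned NumScanned = 0;
for (Instruction &I : reverse(*StoreBB)) {
  if (++NumScanned > MaxStackMoveScanInsts) // hypothetical limit
    return false;
  ...
}
```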


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1976
+  Type *IntPtrTy =
+      Type::getIntNTy(SrcAlloca->getContext(), DL.getPointerSizeInBits());
+  ConstantInt *CI = cast<ConstantInt>(ConstantInt::get(IntPtrTy, Size));
----------------
You want the index size of the alloca address space here, not the pointer size of the default address space.
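
i.e., something like (a sketch):

```cpp
unsigned AS = SrcAlloca->getAddressSpace();
Type *IntPtrTy =
    Type::getIntNTy(SrcAlloca->getContext(), DL.getIndexSizeInBits(AS));
```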


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1982
+  if (Dom == SrcAlloca->getParent() && InsertionPt != Dom->end() &&
+      InsertionPt->comesBefore(SrcAlloca)) {
+    // Make sure that the alloca dominates the lifetime start intrinsic.
----------------
This looks a bit odd; it might be better to use the instruction-level common dominator instead. I've landed https://github.com/llvm/llvm-project/commit/7f0de9573f758f5f9108795850337a5acbd17eef to add a convenience API for this.
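
With that API the computation reduces to roughly (a sketch; `I1`/`I2` stand for the two instructions whose common dominator is needed here):

```cpp
Instruction *CommonDom = DT->findNearestCommonDominator(I1, I2);
```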


================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:2150
+  // performStackMoveOptzn() for more details.
+  AllocaInst *DestAlloca = dyn_cast<AllocaInst>(M->getDest());
+  if (DestAlloca == nullptr)
----------------
nit: `auto *` for dyn_cast


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140089/new/

https://reviews.llvm.org/D140089


