[PATCH] D138751: [MemCpyOpt] Expand two memcpy's with clobber inbetween (PR59116)

Wed Dec 7 07:18:23 PST 2022

nikic added a comment.

At a high level, I'd say that this transform would be a better fit for SROA. The profitability is clearer if we can actually eliminate the alloca and spill from the first memcpy, making this a single load and store.

================
Comment at: llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp:1195
+  const unsigned NeededRegs = divideCeil(8 * NumBytes, RegBitWidth);
+  if (NeededRegs > NumRegs)
+    return false;
----------------
So we want to use up *all* vector registers for the copy? With AVX-512 that is 32 registers * 64 bytes = 2048 bytes. That seems *way* too aggressive.
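
To make the arithmetic concrete, here is a rough standalone sketch of the check. It is not the patch code: `fitsInRegisters` and `maxAcceptedBytes` are hypothetical names, `divideCeil` is a local stand-in for the LLVM helper so it builds without LLVM headers, and the AVX-512 figures (512-bit registers, 32 of them) are assumptions I plugged in.

```cpp
#include <cstdint>

// Local stand-in for llvm::divideCeil: integer division rounding up.
constexpr uint64_t divideCeil(uint64_t Numerator, uint64_t Denominator) {
  return (Numerator + Denominator - 1) / Denominator;
}

// Hypothetical helper mirroring the quoted check: a copy of NumBytes is
// accepted when it fits in NumRegs registers of RegBitWidth bits each.
constexpr bool fitsInRegisters(uint64_t NumBytes, unsigned RegBitWidth,
                               unsigned NumRegs) {
  const uint64_t NeededRegs = divideCeil(8 * NumBytes, RegBitWidth);
  return NeededRegs <= NumRegs;
}

// Hypothetical helper: the largest copy size the check above accepts,
// assuming RegBitWidth is a multiple of 8.
constexpr uint64_t maxAcceptedBytes(unsigned RegBitWidth, unsigned NumRegs) {
  return static_cast<uint64_t>(RegBitWidth) / 8 * NumRegs;
}
```

With 512-bit registers and 32 of them, `maxAcceptedBytes(512, 32)` works out to 2048, so the check only starts rejecting copies at 2049 bytes.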

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138751