[PATCH] D102834: [SLPVectorizer] Implement initial memory versioning.

Mon Dec 6 07:27:03 PST 2021

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:189
+static cl::opt<bool> EnableMemoryVersioning(
+    "slp-memory-versioning", cl::init(false), cl::Hidden,
+    cl::desc("Enable memory versioning for SLP vectorization."));
----------------
SjoerdMeijer wrote:
> This is a nitpick, bikeshedding names... but I was wondering about the terminology here (i.e. memory versioning).
> We emit runtime memory checks, and do versioning of a block. So was wondering if this should be BlockVersioning. 
My intention with the naming here was to be clear about why/what we version; the versioning is based on the memory accesses, we will create one version with the original memory access and a second version where memory accesses to different underlying objects won't alias.

I'm not sure about tying this directly to the scope we support at the moment (single block). We might want to extend the scope in the future. For example, consider a loop which contains a block to version, where each access also depends on an induction variable. We might want to try to push the checks outside the loop, by widening them to encompass the whole iteration space, so we only need to check once.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:692

+struct AccessInfo {
+  Value *UnderlyingObj;
----------------
ABataev wrote:
> Comments? Also, put it to a namespace
put into an anonymous namespace and added comments

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:693-697
+  Value *UnderlyingObj;
+  const SCEV *PtrSCEV;
+  Type *AccessTy;
+
+  AccessInfo() : UnderlyingObj(nullptr), PtrSCEV(nullptr), AccessTy(nullptr) {}
----------------
ABataev wrote:
> Default initializers and defaulted constructor?
remove the `AccessInfo()` constructor, added default values instead

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:706
+      if (auto *L = dyn_cast<LoadInst>(I)) {
+        if (!L->getType()->isVectorTy())
+          return {L->getPointerOperand(), L->getType()};
----------------
ABataev wrote:
> `isValidElementType`?
updated, thanks

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6999
+              if (!DestBundle->IsScheduled && SLP->CollectMemAccess) {
+                // FIXME Naming
+                auto GetPtr = [](Instruction *I) -> Value * {
----------------
SjoerdMeijer wrote:
> Remove this?
it's gone now

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7611
+    Tail = SplitBlock(BB, LastTrackedInst->getNextNode(), &DTU, LI, nullptr,
+                      OriginalBBName + ".tail");
+  auto *CheckBlock = BB;
----------------
SjoerdMeijer wrote:
> Don't think I have seen a block name with the .tail suffix in the tests.
This split is mainly there to make it easier to split off only the region containing from first tracked to last tracked. It is later folded back into the merge block.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7650
+  MDBuilder MDB(Ctx);
+  MDNode *Domain = MDB.createAnonymousAliasScopeDomain("SLPVerDomain");
+
----------------
SjoerdMeijer wrote:
> I am not that familiar with this, but do we expect this in one of the tests?
The domain? It is in the metadata nodes created, but the tests only check the `!noalias` attachments, not the definitions of the metadata at the moment.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7657-7663
+    auto GetPtr = [](Instruction *I) -> Value * {
+      if (auto *L = dyn_cast<LoadInst>(I))
+        return L->getPointerOperand();
+      if (auto *S = dyn_cast<StoreInst>(I))
+        return S->getPointerOperand();
+      return nullptr;
+    };
----------------
ABataev wrote:
> Move out of the loop. Also, similar lambda is used in another place, worth creating a static function instead of lambdas.
updated to use `getLoadStorePointerOperand` from `Instructions.h`.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7734
+  // checks.
+  SLPCost += 5 * PointerChecks.size() + MemBounds.size();
+  if (SLPCost >= ScalarCost) {
----------------
ABataev wrote:
> Do you consider the cost of all new check instructions as `1` here?
Yes, it assumes each instruction costs `1`.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7735
+  SLPCost += 5 * PointerChecks.size() + MemBounds.size();
+  if (SLPCost >= ScalarCost) {
+    // Vectorization not beneficial or possible. Restore original state by
----------------
SjoerdMeijer wrote:
> Perhaps create a local helper function to handle this case.
I moved it to a function.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7736
+  if (SLPCost >= ScalarCost) {
+    // Vectorization not beneficial or possible. Restore original state by
+    // removing the introduced blocks.
----------------
SjoerdMeijer wrote:
> Re: "or possible", I would guess it's possible but not profitable?
I guess saying not beneficial is enough. When we reach here, it is always possible. Originally the `possible` bit was intended to refer to the case where we version but nothing gets vectorized because SLP fails due to other reasons. That's not really clear though, I'll remove it.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7763
+    NumVersioningFailed++;
+  } else {
+    R.getORE()->emit(
----------------
SjoerdMeijer wrote:
> And another helper for this case too.
I think moving this to a separate function would require a lot of arguments and the code here is quite compact.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6843

-bool SLPVectorizerPass::runImpl(Function &F, ScalarEvolution *SE_,
-                                TargetTransformInfo *TTI_,
-                                TargetLibraryInfo *TLI_, AAResults *AA_,
-                                LoopInfo *LI_, DominatorTree *DT_,
-                                AssumptionCache *AC_, DemandedBits *DB_,
-                                OptimizationRemarkEmitter *ORE_) {
+SLPVectorizerResult SLPVectorizerPass::vectorizeBlockWithVersioning(
+    BasicBlock *BB, const SmallPtrSetImpl<Value *> &TrackedObjects,
----------------
SjoerdMeijer wrote:
> Somewhere a high-level idea and description of the algorithm would be nice, I guess that would be here.
I added an outline at the top of the function.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6870
+
+    unsigned AS = Obj->getType()->getPointerAddressSpace();
+
----------------
SjoerdMeijer wrote:
> Unused?
I think this is out of date, AS and WrittenObjs should be used.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6872
+
+    // We know that the Start is dereferenced, hence adding one should not
+    // overflow: https://alive2.llvm.org/ce/z/dTuGLx
----------------
SjoerdMeijer wrote:
> That wasn't obvious to me, why is Start dereferenced?
We get the `Start` expression from the pointer operand of either a store or load, so it should always be dereference by those instructions.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6873
+    // We know that the Start is dereferenced, hence adding one should not
+    // overflow: https://alive2.llvm.org/ce/z/dTuGLx
+    auto &DL = BB->getModule()->getDataLayout();
----------------
SjoerdMeijer wrote:
> Do we want to link to alive (not sure how persistent the data is and if it's common), or just expand the example here?
I'll remove it.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6900
+    return all_of(make_range(FirstTrackedInst->getIterator(),
+                             std::next(LastTrackedInst->getIterator())),
+                  [LastTrackedInst, BB](Instruction &I) {
----------------
SjoerdMeijer wrote:
> Could `std::next` return null (or end)?
I don't think so; both `LastTracked` and `FirstTracked` should point to loads or stores, so there should always be a next instruction (the terminator). If they would be 0, there shouldn't be any bounds (MemBounds should be empty).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102834/new/

https://reviews.llvm.org/D102834