[PATCH] D132892: [LV] Explicily lower uniform load as single load

Wed Aug 31 03:16:39 PDT 2022

david-arm added a comment.

I think this looks like a nice, logical improvement. Having a new uniform widening decision seems sensible. I have a few mostly minor comments, but I was a little surprised by a couple of test changes.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4262
         return WideningDecision == CM_Scalarize;
+
     assert(Ptr == getLoadStorePointerOperand(MemAccess) &&
----------------
nit: White space change.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4712

+      // Only try to scalarize the uniform memop itself if legal and we're not
+      // using the direct lowering strategy (which is strictly better).
----------------
I'm not sure what the "if legal" bit refers to here? If we're treating this as uniform then it's because Legal->isUniformMemOp has returned true. I think you could just say:

  // Only try to scalarize the uniform memop itself if we're not using the direct lowering ...

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9746

+    // FLAGIT
+    assert(!isUniformMemOp() &&
----------------
Not sure what this comment means?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9783
+  if (isUniformMemOp()) {
+    Value *Addr = State.get(getAddr(), {0, 0});
+    auto *NewLI = Builder.CreateAlignedLoad(ScalarDataTy, Addr,
----------------
I think we normally write this as 

  Value *Addr = State.get(getAddr(), VPIteration(0, 0));

which is what we've done elsewhere.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9785
+    auto *NewLI = Builder.CreateAlignedLoad(ScalarDataTy, Addr,
+                                              Alignment);
+    // Add metadata to the load, but setVectorValue to the reverse shuffle.
----------------
Looks like this can fit on the end of the line above?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9786
+                                              Alignment);
+    // Add metadata to the load, but setVectorValue to the reverse shuffle.
+    State.addMetadata(NewLI, LI);
----------------
Not sure I understand this comment? Why does this involve a reverse shuffle?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1774

+  bool isUniformMemOp() const { return UniformMemOp; }
+
----------------
nit: I think this should have a simple comment like the functions above.

================
Comment at: llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll:26
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 8
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 4
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
----------------
This looks a bit unexpected. Do you know why the minimum iteration check has changed here from 8 to 4? It sounds like we've made a different choice of VF and/or interleave count (IC). I can only assume that the cost of some instructions has changed?

================
Comment at: llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll:476
 ; CHECK-NEXT:   loop.0:
-; CHECK-NEXT:     CLONE ir<[[L]]> = load ir<%src>
+; CHECK-NEXT:     REPLICATE ir<[[L]]> = load ir<%src>
 ; CHECK-NEXT:     EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%.pn> ir<[[L]]>
----------------
Do you know why this has changed? As far as I understand it, 'CLONE' literally means clone the scalar instruction once, whereas REPLICATE means insert copies into each lane of the vector. So in a sense we're now doing more loads than before I think?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132892/new/

https://reviews.llvm.org/D132892