[PATCH] D110235: [LoopVectorize] Support reductions that store intermediary result

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 5 01:53:32 PST 2022


fhahn added a comment.

Thanks for the latest update! I left some more additional comments. It might be good to pre-commit the tests and only include the diff in the patch. Another thing to consider is moving the reduction-store tests to a separate file, as `reduction.ll` is already quite large.

One thing to note on the overall approach is that it's a bit of a workaround some current limitations in our modeling, but I don't think there's an alternative in the short term and this is clearly a desirable case to support, so that seems fine to me in general. There's work in progress to model the pre-header and exit block in VPlan, which allows us to do the sinking of the store more easily, so we can improve things further then :)



================
Comment at: llvm/include/llvm/Analysis/IVDescriptors.h:181
   /// computed.
-  static bool isReductionPHI(PHINode *Phi, Loop *TheLoop,
+  static bool isReductionPHI(PHINode *Phi, Loop *TheLoop, ScalarEvolution *SE,
                              RecurrenceDescriptor &RedDes,
----------------
>From an interface perspective, I think it would be better to make `SE` an optional argument and only perform the the store analysis when it is passed. This would also require users to explicitly opt-in, which would at least partially guarding against people ignoring `IntermediateStore`.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:914
 
+  if (!LAI->getStoresToInvariantAddresses().empty()) {
+    // For each invariant address, check its last stored value is unconditional.
----------------
It would probably be good to make clear what is handled here exactly and why we can handle those stores. IIUC this applies only to invariant stores that store reduction results and is safe because runtime checks guarantee that it won't alias with other objects. The store won't get vectorized, but sunk to the exit block during codegen.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9333
 
+      // Invariant stores will be either deleted or go outside of loop so there
+      // is no need to create a recipe for them.
----------------
`either deleted or go outside the loop` sounds a bit unclear. Aren't they moved to the vector exit block and store the final reduction value?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9335
+      // is no need to create a recipe for them.
+      StoreInst *SI;
+      if ((SI = dyn_cast<StoreInst>(&I)) &&
----------------
I think it would also be good to include at least some information in the VPlan that the store is handled as part of the reduction. Perhaps `VPReductionRecipe` should print if the result is stored after the loop? Please add a test case to `vplan-printing.ll`


================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:469
+; int sum = 0;
+; for(i=0..N) {
+;   sum += src[i];
----------------
FWIW for such compact IR test cases, the pseudo code doesn't add much value in my personal opinion. Better to strive to make the tests as readable/compact as possible and have a comment explaining what it tests when needed.


================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:546
+  store i32 0, i32* %gep.dst, align 4
+  br label %for.body
+for.body:
----------------
nit: newline.


================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:582
+  %cmp1 = icmp sgt i32 %n, 0
+  br i1 %cmp1, label %for.body.preheader, label %for.end
+
----------------
not needed?


================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:585
+for.body.preheader:                               ; preds = %entry
+  %wide.trip.count = zext i32 %n to i64
+  br label %for.body
----------------
nit: is this needed? Can just pass `%n` as `i64`


================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:628
+; CHECK: middle.block:
+; CHECK: store i32 %{{[0-9]+}}, i32* %gep.dst
+; CHECK: ret void
----------------
Here (and for the other tests it would be good to check at least the vector reduction sequence and that the correct value is stored.


================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:672
+
+for.cond.cleanup:
+  ret void
----------------
move exit block to end of function?


================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:711
+
+for.cond.cleanup:                                 ; preds = %latch
+  ret void
----------------
move exit block to end of function?



================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:753
+
+for.cond.cleanup:                                 ; preds = %latch
+  ret void
----------------
move exit block to end of function?



================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:803
+
+for.cond.cleanup:
+  %sum.lcssa = phi i32 [ %sum.1, %for.body ]
----------------
move exit block to end of function?



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110235/new/

https://reviews.llvm.org/D110235



More information about the llvm-commits mailing list