[PATCH] D142894: [LoopVectorize] Use overflow-check analysis to improve tail-folding.

David Sherwood via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 14 09:29:58 PST 2023


david-arm added a comment.

Nice! I had a few minor comments, except for a possible issue with the use of `getSmallBestKnownTC`.



================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1560
 
+  bool isIndvarOverflowCheckKnownFalse(
+      ElementCount VF, std::optional<unsigned> UF = std::nullopt) const;
----------------
Can you add some `///` comments here please?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3040
+  bool IVUpdateCannotOverflow = Cost->isIndvarOverflowCheckKnownFalse(VF, UF);
+  TailFoldingStyle Style = Cost->getTailFoldingStyle(IVUpdateCannotOverflow);
   if (Style == TailFoldingStyle::None)
----------------
This is just a thought, but I think you can probably simplify this to just:

  TailFoldingStyle Style = Cost->getTailFoldingStyle();
  if (Style == TailFoldingStyle::None)
    CheckMinIters =
        Builder.CreateICmp(P, Count, CreateStep(), "min.iters.check");
  else if (VF.isScalable() &&
             Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck &&
             !Cost->isIndvarOverflowCheckKnownFalse(VF, UF)) {
    ...

That way you only need to call `isIndvarOverflowCheckKnownFalse` just before you're about to actually create the checks. What do you think?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3497
+  // vector loop induction variable.
+  if (std::optional<unsigned> TC = getSmallBestKnownTC(*PSE.getSE(), TheLoop)) {
+    unsigned MaxVFElts = VF.getKnownMinValue();
----------------
I wonder if this is safe because `getSmallBestKnownTC` also returns a value from profiling for the expected trip count, which is more like a hint rather than a definite?


================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-overflow-checks.ll:45
+;
+entry:
+  %cmp6.not = icmp eq i32 %N, 0
----------------
nit: Maybe the entry blocks in both tests can be removed to simplify the IR and CHECK lines? I think the only thing that matters here is the `zext` in the `for.body.preheader`, right?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142894/new/

https://reviews.llvm.org/D142894



More information about the llvm-commits mailing list