[PATCH] D67905: [LV] Vectorizer should adjust trip count in profile information

Evgeniy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 21 05:02:37 PST 2019


ebrevnov marked an inline comment as done.
ebrevnov added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3989
+                                           OrigFallThroughCount) ||
+      OrigFallThroughCount == 0)
+    return;
----------------
Ayal wrote:
> OrigFallThroughCount can still be either the exit count or the continue-to-next-iteration count, according to the code below.  Wait to test if its zero until we know what it stands for?
Good catch. Thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4001
+  if (!IsTrueBackEdgeOrigLoop)
+    std::swap(OrigTakenCount, OrigFallThroughCount);
+
----------------
Ayal wrote:
> Better use distinct names, e.g., OrigExitCount and OrigBackedgeTakenCount, than continue to call them Taken and FallThrough. Perhaps use Weight instead of Count, to denote total profile frequencies, as the latter is used elsewhere to denote the actual per-invocation TripCount.
I fixed names. But I don't see reasons to use different variables here (if this is what you meant)


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4004
+  // Calculate number of iterations in original scalar loop.
+  // Note: Uses of OrigIterCount bellow should not be simplified as it will
+  // produce a different value. In other words: (A mod N) * B != (A*B) mod N
----------------
Ayal wrote:
> bel[l]ow
fixed


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4006
+  // produce a different value. In other words: (A mod N) * B != (A*B) mod N
+  const uint64_t OrigIterCount = OrigTakenCount / OrigFallThroughCount + 1;
+  // Calculate number of iterations in vector loop.
----------------
Ayal wrote:
> How about "OrigAverageTripCount"?
> 
> Explanation about its computation:
> OrigAverageTripCount = (number of times header block was executed) / (number of times header was reached from pre-header == number of times latch exited)
>  == (OrigTakenCount + OrigFallThroughCount) / OrigFallThroughCount
>  == OrigTakenCount / OrigFallThroughCount + 1.
Ok. Turned your explanation to a comment.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4016
+    PEIterCount = 0;
+  }
+
----------------
Ayal wrote:
> There's also the special case of requiresScalarEpiloque() where 0 < PEIterCount <= VF*UF for each invocation of the loop, and hence the average is also strictly positive FWIW. But best keep the approximation general instead of trying to improve it, given general lack of information.
Agree. Let me remove this special case then.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4023
+    VecTakenCount = (VecIterCount - 1) * OrigFallThroughCount;
+    VecFallThrough = OrigFallThroughCount;
+  }
----------------
Ayal wrote:
> This assumes the number of times the vector loop will be reached is equal to the number of times the original scalar loop was reached (OrigFallThrougCount). This holds is Cost->foldTailByMasking(), but otherwise invocations whose trip count < VF*UF will bypass the vector loop (and also == VF*UF if requireScalarEpilogue()), plus other run time guards.
Please note that VecFallThrough  is zero initially and set to OrigFallThrougCount only if vector loop is expected to be executed (VecIterCount > 0)


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4032
+    PEFallThroughCount = OrigFallThroughCount;
+  }
+
----------------
Ayal wrote:
> Similar to above comment, invocations whose trip count divides VF*UF will bypass the scalar remainder loop (w/o foldTailByMasking nor requireScalarEpilogue), so in general PEFallThroughCount <= OrigFallThroughCount.
Same explanation as for the above.


================
Comment at: llvm/test/Transforms/LoopVectorize/check-prof-info.ll:3
+; RUN: opt  -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=1 -S < %s |  FileCheck %s
+; RUN: opt  -passes="print<block-freq>,loop-vectorize" -prefer-predicate-over-epilog=true -force-vector-width=4 -force-vector-interleave=1 -S < %s |  FileCheck %s -check-prefix=CHECK-MASKED
+
----------------
Ayal wrote:
> May want to also check with UF>1.
I replaced masked case since we don't do anything special for it now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67905/new/

https://reviews.llvm.org/D67905





More information about the llvm-commits mailing list