[PATCH] D115713: [LV] Don't apply "TinyTripCountVectorThreshold" for loops with compile time known TC.

Thu Nov 3 01:09:44 PDT 2022

ebrevnov added a comment.

In D115713#3898587 <https://reviews.llvm.org/D115713#3898587>, @dmgreen wrote:

> I tried running the original benchmarks again with this patch series, it still sees the same decrease in performance I'm afraid. If it was a small change it would be understandable, we accidentally end up on the wrong side of the scalar cost and choose to vectorize where we shouldn't. But the new code is 40% slower than the scalar version, so it's quite a difference.  I haven't had a chance to look into the costs it produces, there is a chance we are underestimating the cost of predication or overestimating the cost of scalar. At worst, providing getMinTripCountTailFoldingThreshold works like it should we can always put a limit on the tripcount, providing we can find a decent minimum that works in general for MVE.

It sounds like current cost model works really bad for this case and it's essentially the reason why getMinTripCountTailFoldingThreshold had been introduced. If this is the case would you mind trying updated version and see if it helps. Otherwise, I'd probably need a test case I could analyze.

> The vectorizer doesn't really do any cost modelling for the setup cost, past some obvious things like runtime checks. Due the the pass pipeline, it is usually expected that many low-trip-count loops are unrolled prior to vectorization, and we SLP them instead. D135971 <https://reviews.llvm.org/D135971> looks like it changes a lot of code in an area that has caused an amount of trouble in the past. We know that the Vectorizer current needs to commit to tail folding vs not tail folding too early. It would be better to have that option as part of the vplan, allowing predicated and non-predicated patters to be costed against one another. I'm not sure I follow D135971 <https://reviews.llvm.org/D135971> enough to understand the ramifications of those changes.

I understand the problem you describe here. D135971 <https://reviews.llvm.org/D135971> has nothing to do with it. It's an NFC. None of tests change their behavior. Main motivation is to simplify unnecessary complicated implementation because I struggled how to push new functionality in it.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115713/new/

https://reviews.llvm.org/D115713