[PATCH] D115713: [LV] Don't apply "TinyTripCountVectorThreshold" for loops with compile time known TC.

Evgeniy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 7 04:32:47 PST 2022


ebrevnov added a comment.

In D115713#3910792 <https://reviews.llvm.org/D115713#3910792>, @dmgreen wrote:

> I've been taking a look at the example that is getting a lot worse. There are certainly some issues with the code generation being non-optimal, but even after a lot of optimizations it looks like it will always be worse than the scalar version. There is a lot of predication and fairly efficient scalar instructions like BFI, which makes accurate cost modelling difficult.

I would like to understand your case better. Let's set aside the current non-optimality of the code generation for a moment. Is the problem inaccurate cost estimation for some particular instructions, or is it that the overhead which comes with vectorization is not taken into account?
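
Just to make the distinction concrete, here is a rough standalone C++ sketch of the two components I have in mind (the names and the exact split are mine, not LLVM's cost model):

  // A made-up model of the two cost components in question: per-instruction
  // costs inside the loop body vs. the one-off overhead that vectorization
  // itself introduces (setup, spill/reload, predication masks, ...).
  struct LoopCost {
    unsigned BodyCostPerIter; // sum of instruction costs for one iteration
    unsigned OneOffOverhead;  // setup and spilling outside the loop body
  };

  // The vector version only wins if its total cost over the whole trip
  // count, including the one-off overhead, beats the scalar total.
  bool vectorIsProfitable(LoopCost Scalar, LoopCost Vector,
                          unsigned TripCount, unsigned VF) {
    unsigned ScalarTotal = Scalar.BodyCostPerIter * TripCount;
    unsigned VectorIters = (TripCount + VF - 1) / VF;
    unsigned VectorTotal =
        Vector.BodyCostPerIter * VectorIters + Vector.OneOffOverhead;
    return VectorTotal < ScalarTotal;
  }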

> There's a lot of setup and spilling too, which is going to hurt for small trip counts.

Does this setup/spilling happen inside the main vector loop or outside of it? Is this reflected at the IR level or at a lower level (maybe even hardware-specific)?

> I think for MVE it would make sense to have a way for the target to put a limit on the minimum trip count.

IMHO, this may be an option only if we find out that this is something specific to MVE. Anyway, it looks like a workaround (hopefully a temporary one :-)) until better support is in place.
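
If it does turn out to be MVE-specific, I would expect the limit to look roughly like this; a minimal standalone sketch, assuming a new per-target hook whose name and wiring are hypothetical rather than existing LLVM APIs:

  #include <optional>

  // Hypothetical per-target knob (not an existing TTI hook): the smallest
  // trip count for which the target still wants the loop considered for
  // vectorization; 1 means "no target-specific limit".
  struct TargetVecInfo {
    unsigned MinTripCountForVectorization;
  };

  // With this patch, a compile-time-known trip count below the generic
  // TinyTripCountVectorThreshold is no longer an automatic bail-out, but a
  // target could still veto very short loops through such a knob.
  bool worthConsidering(std::optional<unsigned> KnownTripCount,
                        const TargetVecInfo &TVI) {
    if (!KnownTripCount)
      return true; // unknown TC: fall through to the usual heuristics
    return *KnownTripCount >= TVI.MinTripCountForVectorization;
  }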

> I think @fhahn also mentioned that he had some AArch64 examples where the same is true. I'm not sure in general where this would be useful. The vectorizer's handling of small trip count loops is not amazing, considering that many such loops will already have been fully unrolled.

Unless I'm missing something, it looks like LV currently runs before the unroller and SLP vectorizer, which makes perfect sense to me. Anyway, I wouldn't rely on any specific pass ordering, as LLVM is an infrastructure for building custom compilers. LV should do its job as well as it can: if it can prove that vectorization is beneficial, it should vectorize (until we have infrastructure to take dependencies between different passes into account).
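
As a reminder, the kind of loop this patch is about is roughly the following (an illustrative example of mine, not one of the benchmarks discussed here):

  // The trip count is a compile-time constant below the generic
  // TinyTripCountVectorThreshold (16 by default), yet vectorizing can still
  // be profitable on targets where the vector setup is cheap.
  void saxpy8(float *__restrict a, const float *__restrict b, float s) {
    for (int i = 0; i < 8; ++i)
      a[i] += s * b[i];
  }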

> It doesn't come up a huge amount and a lot of the cost modelling currently assumes any extra setup costs will be dominated by the loop.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115713/new/

https://reviews.llvm.org/D115713


