[PATCH] D124284: [SLP]Try partial store vectorization if supported by target.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 3 11:19:22 PDT 2022


ABataev added a comment.

In D124284#3485220 <https://reviews.llvm.org/D124284#3485220>, @dmgreen wrote:

>> I tried something similar already; it won't work. Also, a trunc store is not the case we're looking at here; it is different. This function just tells the vectorizer that it might be worth trying this vector factor; the cost model should later report that it is not profitable. If something is not correct in TTI, it should be fixed in TTI.
>
> Oh OK, that's a shame. There may be something a little off with the f16 cost model, which is not always perfect, but I don't see anything obvious from what it is printing. There are only v2f16 values, which can't go too wrong.
> The issue isn't that SLP vectorization is worse than scalar; it's that in that particular case runtime unrolling is better. The SLP vectorization that happens can get in the way of something more profitable, and no pass in LLVM operates in a vacuum.
>
> That issue isn't too important, though. My worry is that this is currently an expensive way of saying "return 2". I have no strong objection if you want to go with the current method, but perhaps the default should be more "correct" and we could override it for the targets that want something different/more aggressive? They can then choose to spend the extra compile time on factors that might not be expected to be very profitable on other architectures.
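A minimal sketch of the default-versus-override split being suggested above, assuming a virtual TTI-style query; the names TargetStoreInfo, AggressiveTarget, and getMinStoreVectorFactor are illustrative only and are not the interface this patch actually adds:

  #include <algorithm>

  // Conservative default: do not request any extra, smaller store factors.
  struct TargetStoreInfo {
    virtual ~TargetStoreInfo() = default;
    virtual unsigned getMinStoreVectorFactor(unsigned FullVF) const {
      return FullVF;
    }
  };

  // An aggressive target effectively says "return 2": the SLP vectorizer may
  // then try partial store factors down to 2, and the cost model is expected
  // to reject the ones that are not profitable.
  struct AggressiveTarget : TargetStoreInfo {
    unsigned getMinStoreVectorFactor(unsigned FullVF) const override {
      return std::min(FullVF, 2u);
    }
  };

With a split like that, only the targets that opt in pay the extra compile time of exploring the smaller factors.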



In D124284#3489002 <https://reviews.llvm.org/D124284#3489002>, @asbirlea wrote:

> I'm seeing some fairly big regressions with this patch, specifically on the Rome (AMD) architecture.
> A couple of examples that are public in the test suite: in SingleSource/Benchmarks/Shootout, sieve shows a 20% performance regression in both the opt and xfdo builds, and MicroBenchmarks/ImageProcessing/Dither shows a 10% regression (opt, thinlto, and xfdo).
> I'm also seeing a couple on Skylake, opt build, in the range of 5-13%, an example being eigen with a 13% regression; this one may be harder to track down as it's in a specific configuration, but let me know if you want to reproduce it.
>
> As for performance improvements, I see a few on Skylake in the range of 3-6%. An example here is MicroBenchmarks/ImageProcessing/Blur, which shows a 4-5% improvement.
>
> Overall, the regressions outnumber the gains in the testing I've done so far and would likely block our compiler release.

I'm trying to improve it, but if we have performance regressions, there is something wrong with the cost model.
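For context, a rough sketch (purely illustrative, not taken from the regressing benchmarks) of the kind of scalar store chain this transform targets; if the target's cost model reports the narrow vector store as cheaper than it really is, SLP will form it here and the result regresses:

  void store4(float *p, float a, float b, float c, float d) {
    // Four consecutive stores. With partial store vectorization, SLP may pick
    // a smaller factor, e.g. one <2 x float> store covering p[0] and p[1],
    // when the full <4 x float> tree is unavailable or judged unprofitable.
    p[0] = a;
    p[1] = b;
    p[2] = c * c;    // differing operations can make the full-width tree
    p[3] = d + 1.0f; // less attractive than a partial one
  }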


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124284/new/

https://reviews.llvm.org/D124284


