[PATCH] D53865: [LoopVectorizer] Improve computation of scalarization overhead.

Jonas Paulsson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 12 02:02:29 PST 2018


jonpa added a comment.

Thank you for your feedback!

> I think you are either 1) arm-twisting the vectorizer to emit vector code which you know will be scalar or 2) arm-twisting vectorizer's cost model to believe what you are emitting as "vector" to be really scalar. I certainly do not see the reason why "you have to" do that, because letting vectorizer emit scalar IR instructions in those cases should be "equivalent". So, why "do you WANT to" do that? IR going out of vectorizer may be more compact, but what that'll accomplish is cheating all downstream optimizers and their cost models.

I am just trying to keep it simple: I am not changing how LV generates code, but merely improving the cost computations. Changing the output of a vectorized loop seems like a much bigger project, which I did not attempt.

> So there are basically two possible ways to model sequences like this:
> 
> 1. The vectorizer models/emits the instructions as "vector" instructions, but gives a discount to back-to-back instructions which will be scalarized.
> 2. The vectorizer compares vector and scalar instructions based on the cost model, and explicitly scalarizes instruction sequences if they would be cheaper.

I can see the benefit of (2) in cases where a vectorizable instruction sits between two scalarized instructions, like Scal -> Vec -> Scal.  In such a case it may be better to also scalarize the vectorizable (Vec) instruction, since that avoids packing its operands into a vector only to extract the result again.
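
To make that concrete, here is a minimal sketch (not LLVM code; the unit costs and the VF are made up for illustration) that only counts the cost attributable to the middle "Vec" instruction plus the packing/unpacking glue, since the scalarized neighbours cost the same either way:

  #include <iostream>

  int main() {
    const unsigned VF = 4;           // assumed vectorization factor
    const unsigned ScalarOpCost = 1; // assumed cost of one scalar copy of "Vec"
    const unsigned VectorOpCost = 1; // assumed cost of the vector form of "Vec"
    const unsigned InsExtCost = 1;   // assumed cost of one insert/extract

    // Strategy A: keep "Vec" vectorized.  The scalarized neighbours force
    // VF inserts to build its operand vector and VF extracts to consume
    // its result.
    unsigned KeepVector = VF * InsExtCost + VectorOpCost + VF * InsExtCost;

    // Strategy B: scalarize "Vec" as well.  No packing/unpacking is needed
    // between the three instructions.
    unsigned ScalarizeAll = VF * ScalarOpCost;

    std::cout << "keep vector: " << KeepVector
              << ", scalarize: " << ScalarizeAll << "\n";
  }

With these (made-up) numbers the fully scalarized chain wins, which is the situation (2) would catch.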

Are you proposing some kind of search over instruction sequences with some limited lookahead? For example, comparing the cost of scalarizing I1, I2, I3 against the cost of vectorizing them? So {Extr + Scal1 + Scal2 + Scal3 + Ins} would be compared to {Vec1 + Vec2 + Vec3}, where a target-expanded vector instruction would automatically (as it currently does) include the scalarization cost...
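
Something along these lines, perhaps (a rough sketch only; the struct and function names are hypothetical and not an actual LLVM interface):

  #include <vector>

  struct InstCosts {
    unsigned Scalar; // cost of one scalar copy of the instruction
    unsigned Vector; // cost of its vector form (target-expanded if needed)
  };

  // Returns true if scalarizing the whole sequence I1..In looks cheaper for
  // the given VF, charging one extract per live-in lane and one insert per
  // live-out lane at the sequence boundaries.
  bool shouldScalarizeSequence(const std::vector<InstCosts> &Seq,
                               unsigned VF, unsigned InsExtCost) {
    unsigned ScalarCost = 2 * VF * InsExtCost; // Extr ... Ins glue
    unsigned VectorCost = 0;
    for (const InstCosts &C : Seq) {
      ScalarCost += VF * C.Scalar;
      VectorCost += C.Vector;
    }
    return ScalarCost < VectorCost;
  }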

I suspect that an algorithm that makes these decisions would benefit from knowing which instructions the target *must* scalarize (no vector instruction available). Wouldn't those be the natural starting points for these searches? After all, if all instructions are vectorizable, this step could simply be skipped. In that sense, might my patch be a first step in this direction?
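
For instance, a hypothetical seeding step could look something like this (the Instr type and the targetHasVectorForm() query are invented stand-ins, not LLVM APIs):

  #include <vector>

  struct Instr {}; // stand-in for an IR instruction

  // Stand-in for a TTI-style query: does the target have a vector form?
  bool targetHasVectorForm(const Instr &I) { return false; /* stub */ }

  std::vector<const Instr *>
  collectScalarizationSeeds(const std::vector<Instr> &Block) {
    std::vector<const Instr *> Seeds;
    for (const Instr &I : Block)
      if (!targetHasVectorForm(I)) // must be scalarized anyway
        Seeds.push_back(&I);
    return Seeds; // empty => everything is vectorizable; skip the search
  }

The search would then grow candidate sequences outward from these seeds.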


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53865/new/

https://reviews.llvm.org/D53865
