[PATCH] D28975: [LV] Introducing VPlan to model the vectorized code and drive its transformation

Renato Golin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 23 04:07:56 PST 2017


rengolin added a comment.

In https://reviews.llvm.org/D28975#684539, @jonpa wrote:

> The main thought I had at the moment was that, if the scalarization costs were modeled in a better way, the LoopVectorizer should be able to, for example, hold the scalarization costs for each instruction as a tuple of {inserts, extracts}, and then get a more accurate final cost estimate by checking the interdependencies of the scalarized/vectorized instructions when summing. It should only add inserts if the user was vectorized, and so on. I was hoping VPlan might build a model with the individual instruction costs and sum them once they are all in place.
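
A minimal sketch of that {inserts, extracts} bookkeeping, with hypothetical names and unit costs rather than anything from the actual LoopVectorizer, might look like:

  #include <cstdint>
  #include <vector>

  // One entry per scalarized instruction: how many element inserts and
  // extracts it would need, and whether its user stays vectorized.
  struct ScalarizationCost {
    unsigned Inserts = 0;        // scalar element -> vector (insertelement)
    unsigned Extracts = 0;       // vector -> scalar element (extractelement)
    bool UserIsVectorized = false;
  };

  // Made-up per-element costs, for illustration only.
  constexpr unsigned InsertCost = 1;
  constexpr unsigned ExtractCost = 1;

  uint64_t totalScalarizationCost(const std::vector<ScalarizationCost> &Costs) {
    uint64_t Total = 0;
    for (const ScalarizationCost &C : Costs) {
      // Extracts are paid to feed the scalarized instruction (a fuller model
      // would make these conditional on the operands being vectorized, too).
      Total += C.Extracts * ExtractCost;
      // Inserts are only needed if the consumer was actually vectorized.
      if (C.UserIsVectorized)
        Total += C.Inserts * InsertCost;
    }
    return Total;
  }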


This was my first thought, since the scalarisation costs are the biggest unknowns in vectorisation. But then you may have extending adds but not extending saturating adds, so a three-instruction pattern (ext+add+max) cannot be matched; the cost model will match the (ext+add) and remove the ext cost, when there is actually an extra instruction for that sequence.
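
To make that mismatch concrete, with completely made-up unit costs (not the real TTI numbers):

  #include <cstdio>

  // Made-up unit costs, for illustration only.
  constexpr unsigned ExtCost = 1, AddCost = 1, MaxCost = 1;

  // Greedy rule: an ext feeding an add matches the extending-add pattern,
  // so the ext cost is dropped from the estimate.
  unsigned greedyEstimate() { return AddCost + MaxCost; }          // 2

  // On a target with extending adds but no extending saturating add, the
  // ext+add+max sequence still ends up paying for the extra extension.
  unsigned actualCost() { return ExtCost + AddCost + MaxCost; }    // 3

  int main() {
    std::printf("estimate=%u actual=%u\n", greedyEstimate(), actualCost());
    return 0;
  }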

There are also costs related to moving to and from vector registers. For example, on NEON, GPR->NEON is free, but NEON->GPR has a ~10-cycle stall. That cannot be modelled without understanding the surrounding instructions (say, a scalar reduction in every loop iteration would make a 2-lane vector add useless).
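
As a back-of-the-envelope illustration (made-up cycle counts, apart from the rough ~10-cycle stall figure above; real numbers depend on the core):

  #include <cstdio>

  constexpr unsigned ScalarAddCost = 1;   // one scalar add, per element
  constexpr unsigned VectorAddCost = 1;   // one 2-lane vector add
  constexpr unsigned NeonToGprStall = 10; // moving the result back to a GPR

  int main() {
    const unsigned NumLanes = 2;
    unsigned ScalarCost = NumLanes * ScalarAddCost;        // 2 cycles
    // A scalar reduction pulls the vector result back into a GPR every
    // iteration, so the transfer stall sits on the hot path.
    unsigned VectorCost = VectorAddCost + NeonToGprStall;  // 11 cycles
    std::printf("scalar=%u vector=%u\n", ScalarCost, VectorCost);
    return 0;
  }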

We could probably go slowly and fiddle with the scalar costs now (with lots of benchmark results on the affected arches), and maybe have a half-baked solution for shuffles, since those are the most obvious problems, but it would be good not to abandon the plan of being able to look at the context, hopefully based on TableGen pattern matches.

cheers,
--renato


https://reviews.llvm.org/D28975




