[PATCH] D32533: [SLPVectorizer] Limit the number of block chain instructions to max register size

Wed Apr 26 10:38:00 PDT 2017

mkuper added a comment.

In https://reviews.llvm.org/D32533#738239, @anna wrote:

> okay, that explains why we do limit to the target vector register *only* for the store chain in SLP (compile time benefit). However, if we look at the Loop Vectorizer, we consider the maximum vector register size when generating the code in the IR. This also gives a more accurate cost model for LV.
>
> Not considering the physical vector register size is limiting the SLP cost model, right? For example, in the target, we would have 4 shuffles instead of a single shuffle.

SLP and LV, unfortunately, have different approaches here.

SLP, except for the store chain case, ignores register sizes. The assumption is that (a) the legalizer will do a good job, and (b) the cost model accurately reflects legalization costs. LV is more conservative and will not create vectors wider than the register size (it has a flag that enables it to do so, but it's off by default).

The direction we want to move in is to *not* limit the vector size in either SLP or LV, that is, the opposite of what this patch does. This is important for vectorizing code that mixes types of different sizes.

That hasn't happened yet, for a couple of reasons. One is that the assumptions SLP makes about the cost model and legalization don't necessarily hold. :-) The other is that doing it correctly also requires modeling register pressure. We already do that in LV for the interleaving (unrolling) factor, to an extent, but it needs to be integrated with the vectorization factor heuristic as well.
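To make assumption (b) a bit more concrete, here is a minimal sketch of how a wider-than-register vector operation might be costed once legalization splits it into register-sized parts. The helper names `numLegalizedParts` and `legalizedOpCost` are hypothetical and are not LLVM APIs, and the linear per-part cost is a simplifying assumption; real shuffles that cross part boundaries can cost more than this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): number of register-sized pieces
// a vector of VecBits bits is split into when the widest vector register
// holds RegBits bits.
inline std::uint64_t numLegalizedParts(std::uint64_t VecBits,
                                       std::uint64_t RegBits) {
  assert(RegBits > 0 && "register width must be nonzero");
  return (VecBits + RegBits - 1) / RegBits; // ceiling division
}

// Hypothetical, deliberately simplified cost: assume one logical vector
// operation costs PerPartCost for every legalized part.
inline std::uint64_t legalizedOpCost(std::uint64_t VecBits,
                                     std::uint64_t RegBits,
                                     std::uint64_t PerPartCost) {
  return numLegalizedParts(VecBits, RegBits) * PerPartCost;
}
```

Under these assumptions, a 512-bit vector (say, 16 x i32) on a target with 128-bit registers gives `numLegalizedParts(512, 128) == 4`, so one IR-level shuffle would be charged as four register-level ones. A cost model that charges it as one is the kind of mismatch described above.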

So, overall, I'd say the right solution here is not to stop SLP from creating wide vectors, but to fix the backend/cost model issues.

https://reviews.llvm.org/D32533