[PATCH] D82227: SLP: honor requested max vector size merging PHIs

Mon Jun 22 13:59:44 PDT 2020

ABataev added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll:167-168
+; MAX128-NEXT:    [[I37:%.*]] = fadd float 0.000000e+00, [[I36]]
+; MAX128-NEXT:    [[I38:%.*]] = fmul float [[I3]], [[FVAL]]
+; MAX128-NEXT:    [[I39:%.*]] = fadd float 0.000000e+00, [[I38]]
+; MAX128-NEXT:    [[I40:%.*]] = fmul float [[I6]], [[FVAL]]
----------------
rampitec wrote:
> ABataev wrote:
> > It really makes the vectorization worse, in general. Most of these inserts/extracts will be transformed into the simple shuffles by the instcombiner. And if there is really a problem with target-specific limitations, it is better to adapt the cost model rather than introduce changes that may affect all targets. Maybe need to fix `TTI::getTypeLegalizationCost`?
> We certainly working on improving our cost model, although it will not help here. I have experimented and we really need to lie a lot to avoid vectorization for a case like this. Note though that 128 bit case was suppressed by the cost model of a generic processor which believes it is not profitable, so I have updated test to 256 bit.
> 
> The main issue is a PHI of a wide vector type, we do not need anything else to run into the problem, and BoUpSLP::getEntryCost() does not even check a cost of a PHI:
> 
> 
> ```
>   switch (ShuffleOrOp) {
>     case Instruction::PHI:
>       return 0;
> ```
> 
> That said, I also do not believe it can be correctly solved by a cost model. This is not a cost problem, this is a question of ability to correctly generate code. Cost model covers instruction size, throughput, and latency, but it does not cover register pressure.
> 
> If we believe there are targets out there which may benefit from an unlimitly wide vectorization I can expose yet another TTI callback. We have TTI::getRegisterBitWidth() and option -slp-max-reg-size, I can add TTI::getRealRegisterBitWidth() and -slp-real-max-reg-size. Alternatively targets believing in an unconditionally good wide vectors may return 1024 from getRegisterBitWidth(), right?
Yes, I thought about it, that may be a good alternative solution. Maybe, just return `0` and in this case do not check for size at all?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82227/new/

https://reviews.llvm.org/D82227