[PATCH] D82227: SLP: honor requested max vector size merging PHIs
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 22 13:59:44 PDT 2020
ABataev added inline comments.
================
Comment at: llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll:167-168
+; MAX128-NEXT: [[I37:%.*]] = fadd float 0.000000e+00, [[I36]]
+; MAX128-NEXT: [[I38:%.*]] = fmul float [[I3]], [[FVAL]]
+; MAX128-NEXT: [[I39:%.*]] = fadd float 0.000000e+00, [[I38]]
+; MAX128-NEXT: [[I40:%.*]] = fmul float [[I6]], [[FVAL]]
----------------
rampitec wrote:
> ABataev wrote:
> > It really makes the vectorization worse, in general. Most of these inserts/extracts will be transformed into the simple shuffles by the instcombiner. And if there is really a problem with target-specific limitations, it is better to adapt the cost model rather than introduce changes that may affect all targets. Maybe need to fix `TTI::getTypeLegalizationCost`?
> We certainly working on improving our cost model, although it will not help here. I have experimented and we really need to lie a lot to avoid vectorization for a case like this. Note though that 128 bit case was suppressed by the cost model of a generic processor which believes it is not profitable, so I have updated test to 256 bit.
>
> The main issue is a PHI of a wide vector type, we do not need anything else to run into the problem, and BoUpSLP::getEntryCost() does not even check a cost of a PHI:
>
>
> ```
> switch (ShuffleOrOp) {
> case Instruction::PHI:
> return 0;
> ```
>
> That said, I also do not believe it can be correctly solved by a cost model. This is not a cost problem, this is a question of ability to correctly generate code. Cost model covers instruction size, throughput, and latency, but it does not cover register pressure.
>
> If we believe there are targets out there which may benefit from an unlimitly wide vectorization I can expose yet another TTI callback. We have TTI::getRegisterBitWidth() and option -slp-max-reg-size, I can add TTI::getRealRegisterBitWidth() and -slp-real-max-reg-size. Alternatively targets believing in an unconditionally good wide vectors may return 1024 from getRegisterBitWidth(), right?
Yes, I thought about it, that may be a good alternative solution. Maybe, just return `0` and in this case do not check for size at all?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82227/new/
https://reviews.llvm.org/D82227
More information about the llvm-commits
mailing list