[PATCH] D82227: SLP: honor requested max vector size merging PHIs

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 22 13:26:51 PDT 2020


rampitec marked an inline comment as done.
rampitec added inline comments.


================
Comment at: llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll:167-168
+; MAX128-NEXT:    [[I37:%.*]] = fadd float 0.000000e+00, [[I36]]
+; MAX128-NEXT:    [[I38:%.*]] = fmul float [[I3]], [[FVAL]]
+; MAX128-NEXT:    [[I39:%.*]] = fadd float 0.000000e+00, [[I38]]
+; MAX128-NEXT:    [[I40:%.*]] = fmul float [[I6]], [[FVAL]]
----------------
ABataev wrote:
> It really makes the vectorization worse, in general. Most of these inserts/extracts will be transformed into the simple shuffles by the instcombiner. And if there is really a problem with target-specific limitations, it is better to adapt the cost model rather than introduce changes that may affect all targets. Maybe need to fix `TTI::getTypeLegalizationCost`?
We certainly working on improving our cost model, although it will not help here. I have experimented and we really need to lie a lot to avoid vectorization for a case like this. Note though that 128 bit case was suppressed by the cost model of a generic processor which believes it is not profitable, so I have updated test to 256 bit.

The main issue is a PHI of a wide vector type, we do not need anything else to run into the problem, and BoUpSLP::getEntryCost() does not even check a cost of a PHI:


```
  switch (ShuffleOrOp) {
    case Instruction::PHI:
      return 0;
```

That said, I also do not believe it can be correctly solved by a cost model. This is not a cost problem, this is a question of ability to correctly generate code. Cost model covers instruction size, throughput, and latency, but it does not cover register pressure.

If we believe there are targets out there which may benefit from an unlimitly wide vectorization I can expose yet another TTI callback. We have TTI::getRegisterBitWidth() and option -slp-max-reg-size, I can add TTI::getRealRegisterBitWidth() and -slp-real-max-reg-size. Alternatively targets believing in an unconditionally good wide vectors may return 1024 from getRegisterBitWidth(), right?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82227/new/

https://reviews.llvm.org/D82227





More information about the llvm-commits mailing list