[PATCH] D49491: [RFC][VPlan, SLP] Add simple SLP analysis on top of VPlan.

Renato Golin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 18 11:58:18 PDT 2018


rengolin added a comment.

My tuppence...

In https://reviews.llvm.org/D49491#1166927, @fhahn wrote:

> The initial motivation is to improve vectorization in cases where currently the loop vectorizer only considers interleaving memory accesses, which either results in sub-optimal code or no vectorization in case the interleaved accesses are too expensive on the target. It should also come in handy when vectorizing for architectures with scalable vector registers, where SLP-style vectorization of an unrolled loop is harder, for example.


This was part of the VPlan effort from the beginning, but we never discussed in detail how the implementation would work, especially considering that the SLP vectorizer will still be around, as is, for quite a while.
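To make the motivating case concrete, here is a toy example of my own (not from the patch): each iteration touches a[2*i] and a[2*i+1], so across iterations the loop vectorizer sees interleaved accesses, while an SLP-style combine *within* the iteration could treat the pair as one consecutive 2-wide load/add/store.

```cpp
// Toy illustration (mine, not from the patch). The two statements in the
// loop body access adjacent elements, so an SLP-style combine inside the
// iteration can form a single <2 x float>-style operation, instead of the
// loop vectorizer having to handle an interleaved access group.
#include <cassert>

static void scalarPairs(const float *B, float *A, int N) {
  for (int I = 0; I < N; ++I) {
    A[2 * I]     = B[2 * I]     + 1.0f; // pair of adjacent,
    A[2 * I + 1] = B[2 * I + 1] + 1.0f; // consecutive accesses
  }
}

// What the combine would conceptually produce: one 2-lane operation per
// iteration (the inner loop stands in for a single <2 x float> op).
static void combinedPairs(const float *B, float *A, int N) {
  for (int I = 0; I < N; ++I)
    for (int Lane = 0; Lane < 2; ++Lane)
      A[2 * I + Lane] = B[2 * I + Lane] + 1.0f;
}
```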

> Overall I think this should fit nicely into the emerging VPlan infrastructure: applying SLP style vectorization is one strategy/plan among others, that are evaluated against each other before choosing the most profitable over all. We can add the SLP analysis and tooling to create a VPlan with the SLP combinations applied, and just need small tweaks to VPlan-based cost modelling and code generation to make it aware of the new combined load/store instructions.

This is indeed the biggest benefit of having SLP analysis in VPlan, especially with the VPlan-to-VPlan transformations.

> In the long term, I think there is potential to share infrastructure with the SLPVectorizer on different levels: for example, we could potentially re-use VPlan based code generation, if the SLPVectorizer would emit VPInstructions; share the infrastructure to discover consecutive loads/stores; share different combination algorithms between VPlan and SLPVectorizer; potentially re-use part of the VPlan based cost model. One thing I want to try out is how easy/hard it would be to make the analysis work on VPInstruction and Instruction based basic blocks.

This is why we haven't gone too deep in the analysis, yet. Sharing code between SLPVec, which operates on IR, and VPlanSLP, which operates on VPInstructions, can be confusing, limiting, or even impossible. The analysis part mostly works because VPInstruction is similar enough to Instruction, but the implementation and, worse, the heuristics can go wrong very quickly.
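As a rough sketch of what "generic over Inst/VPInst" could mean (all types and names here are hypothetical, nothing from LLVM's actual API): the shared part would be analyses like a consecutive-access check, written as a template over the instruction type, with each hierarchy supplying the small accessors the analysis needs.

```cpp
// Hypothetical sketch, not LLVM's actual API: a shared SLP-style analysis
// written generically, so the same code could in principle serve both
// IR Instructions and VPInstructions.
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for the two instruction hierarchies.
struct ToyIRLoad { int Base; int Offset; }; // models a load of Base[Offset]
struct ToyVPLoad { int Base; int Offset; };

// Accessors the shared analysis needs from either type.
template <typename LoadT> int base(const LoadT &L)   { return L.Base; }
template <typename LoadT> int offset(const LoadT &L) { return L.Offset; }

// Shared analysis: same base, stride-1 offsets => combinable into one
// wide load, regardless of which instruction hierarchy we are given.
template <typename LoadT>
bool areConsecutiveLoads(const std::vector<LoadT> &Loads) {
  for (std::size_t I = 1; I < Loads.size(); ++I)
    if (base(Loads[I]) != base(Loads[0]) ||
        offset(Loads[I]) != offset(Loads[I - 1]) + 1)
      return false;
  return true;
}
```

The heuristics (cost, when to combine) are exactly the part this sketch leaves out, which is where the two sides would most likely diverge.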

I think long term we basically have two options:

1. We cannibalise SLPVec, hoisting its analyses and transformations, make them generic (Inst/VPInst), and make sure we never hurt performance or compile time. This is hard, slow and painful, but it's the most stable solution going forward.

2. We implement VPlanSLP in parallel, create a flag to flip between the two (never both at the same time), and once VPlanSLP does all that and more, we flip the default. This is much easier short term, but risks the default never truly getting flipped and dividing the usage.

I don't have a good view on which would be better right now. VPlan is still largely monolithic, VP-to-VP is too fresh, and SLP needs to understand loop and outer loop boundaries to operate correctly in VPlan.

> That being said, some of the VPlan infrastructure is still emerging: initial VPInstruction based codegeneration and cost modelling is currently worked on for example. However I think considering SLP style vectorization as a VPlan2VPlan transformation (and others) early on would help to make sure the design of the VPlan infrastructure is general enough to cover a wide range of use cases.

I can see the appeal in a proof-of-concept, and I don't oppose having it. But I'm not strongly in favour either.

If more people think that option #2 above is the way to go, then this could turn out fine.

But if more people prefer option #1, then we would want to see what gets hoisted and how the local implementations will look before we add VPlanSLP.

cheers,
--renato


https://reviews.llvm.org/D49491

