[llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

Fri Jan 5 15:38:25 PST 2018

> On 5 Jan 2018, at 21:01, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> 
> All,
> 
> I'm trying to refactor LoopVectorize so that it conforms better to the VPlan vision going forward
> (http://www.llvm.org/docs/Proposals/VectorizationPlan.html). All VP*Recipe class definitions are now
> moved to VPlan.h, and I have a patch under review to move the LoopVectorizationPlanner class out of
> LoopVectorize.cpp (https://reviews.llvm.org/D41420).
> 
> The next thing I'm working on is LoopVectorizationLegality, and I noticed that it has a component of
> CostModel and optimization, which doesn't seem right from the vectorizer's architectural perspective.
> It appears that we are currently abusing Legal as an attic to throw a lot of things into, in order to avoid
> passing many pointers around. From the vectorizer's architectural point of view, we should distinguish Legal from
> "Vectorization Context Information" (I'd call it LoopVectorizationAnalysisInfo), some of which (such as
> induction, reduction, etc.) is populated during the Legal step. InterleaveInfo shouldn't even be
> a member of Legal; it has nothing to do with legality. It would be a good member of LoopVectorizationAnalysisInfo.
> Eventually, I'd like to see these under the Analysis subtree (instead of Transform), since they are indeed analyses.
> 
> As a first step of this LoopVectorizationLegality cleanup, I propose to move the following checks
> (and member functions) to LoopVectorizationCostModel.
> 	isLegalMaskedStore
> 	isLegalMaskedLoad
> 	isLegalMaskedScatter
> 	isLegalMaskedGather
> My assumption is that all SIMD architectures should support serialization of those operations
> at some cost (e.g., lowering in CG prepare), and thus failing to vectorize due to "false" return values
> of those calls is incorrect behavior. I'll make sure to use a very high initial cost so that this cleanup is
> NFC for all practical purposes, and leave the tuning work as a TODO.
> 
> The downside I can think of is that this will end up running more parts of the vectorizer for those kinds
> of loops: it can expose pre-existing bugs, and compile time would be a bit longer since we are bailing
> out later. The upside is that we can tune the cost model: if other parts of the loop have enough
> speedup, we don't have to give up the entire vectorization simply because masked load/store/gather/scatter
> aren't supported on the target.
> 
> If anyone still thinks "early bailout" is valuable, splitting into a separate HWLegal class would be
> a cleaner approach than what we have today. We should be able to disable/enable it under an option.
> 
> Let me know what you think.
> 
> Thanks,
> Hideki Saito
> 
> 
I support this direction, and agree in principle that these are strictly cost model queries, but are there actually any real-world workloads where gather/scatter scalarisation would be worth it, on any micro-architecture? If we don’t have examples and the compile time cost is non-negligible then I think we’d still like to keep the early bailouts in some form.
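
To make the question concrete, here is a minimal sketch of the two policies. It is a toy model, not the LoopVectorize code: `ToyTarget`, its fields, the three function names and all cost numbers are hypothetical, and the scalarised cost is simply assumed to be one scalar load plus some per-lane overhead for every lane.

```cpp
#include <cassert>

// Hypothetical stand-in for the target hooks; not the real TargetTransformInfo.
struct ToyTarget {
  bool LegalMaskedGather;    // what an isLegalMaskedGather-style query would report
  unsigned ScalarLoadCost;   // cost of one scalar load
  unsigned VectorGatherCost; // cost of one native masked gather
  unsigned PerLaneOverhead;  // assumed cost to test the mask bit and insert the lane
};

// Early-bailout policy: an unsupported masked gather rejects the whole loop.
bool legalityAllowsVectorization(const ToyTarget &T) {
  return T.LegalMaskedGather;
}

// Cost-model policy: an unsupported gather is priced as VF scalarised lanes.
unsigned maskedGatherCost(const ToyTarget &T, unsigned VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  if (T.LegalMaskedGather)
    return T.VectorGatherCost;
  return VF * (T.ScalarLoadCost + T.PerLaneOverhead);
}

// Compare one vector iteration against the VF scalar iterations it replaces.
// OtherScalarCost is the rest of the loop body per scalar iteration,
// OtherVectorCost is the rest of the loop body per vector iteration.
bool costModelPrefersVector(const ToyTarget &T, unsigned VF,
                            unsigned OtherScalarCost,
                            unsigned OtherVectorCost) {
  unsigned ScalarCost = VF * (T.ScalarLoadCost + OtherScalarCost);
  unsigned VectorCost = maskedGatherCost(T, VF) + OtherVectorCost;
  return VectorCost < ScalarCost;
}
```

In this toy model the scalarised gather only pays off when the rest of the loop body vectorises well enough to absorb the per-lane overhead. Whether real loops like that exist is the question above.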

Amara