[PATCH] D142015: [LV] Plan with and without FoldTailByMasking

Mon Jan 30 04:34:42 PST 2023

dmgreen added a comment.

In D142015#4066159 <https://reviews.llvm.org/D142015#4066159>, @SjoerdMeijer wrote:

> Thanks for working on this. General direction looks good to me, this is exactly what we want.
>
>> VectorizationScheme was used to describe the pair of (FoldTailByMasking, VF). Alternative suggestions welcome
>
> I was going to suggest PredicationScheme. But the VF is also part of it, so PredicationScheme does not covert it. Perhaps VectorizationScheme is just fine. :)

Thanks. There are still some MVE performance issues I need to work through, where it now picks a higher unpredicated trip count over tail predication, and a combination of it being inside a nested loop and low cheap counts make it unprofitable. There are some good improvements too, but that one is a little too large. I think with epilog vectorization and a few other tricks it can get the the point where it too is an improvement.

In D142015#4082011 <https://reviews.llvm.org/D142015#4082011>, @david-arm wrote:

> Hi @dmgreen, I'm sorry I've not looked into this in more detail, but from the description you gave (which is very detailed - thanks!) I do have one concern about choosing tail-folding by default in a tie. One major problem we have at the moment is that the active lane mask call is effectively free because, unless I'm mistaken, we don't add the cost of this intrinsic to the loop. A simple IV increment (an add instruction) is likely to be cheaper than the codegen for the loop predicate for many targets? However, I do appreciate that targets that can't efficiently generate loop predicates probably also can't do masked loads and stores, so the tail-folded loop cost is likely to be high anyway. I wonder if it's better to be conservative and choose the non-tail-folded version in a tie until we've got a fairer comparison between different vectorisation styles?

Yep - that's hopefully step 3. Once we have this and epilog vectorization (which should be a fairly simple addition) we can then have the option on SVE of choosing between unpredicated body + predicated remainder vs a full predicated body, depending on the target. Targets where the while instructions are not a bottleneck can get the benefits of full predication. I'm not sure yet whether it will just be a hard limit or an added cost. That's the idea at least, there are plenty of details to sort out first and it we will have to see how it performs in practice.

In D142015#4084191 <https://reviews.llvm.org/D142015#4084191>, @fhahn wrote:

> I think the overall direction is very desirable! However I am not sure if adding a FoldTail argument to the cost functions is an ideal way to get there as it requires a large number of changes and is not really scalable (e.g. adding support to generate a 3rd variant of plans may mean adding another argument to all functions)
>
> I am not sure if I am missing something, but would it be possible to instead run the planner multiple times with different cost-model configurations? I tried to sketch an option for more general support for that in D142669 <https://reviews.llvm.org/D142669>. Then we could generate the plans for tail-folding by just instantiating another instance of the cost model, sketched in D142670 <https://reviews.llvm.org/D142670>. Maybe that could help to simplify the patch?

Multiple cost models was something I considered, but dismissed fairly quickly. I didn't think it would be something that anyone would agree to in a review - that we invent a new hierarchy of multiple cost-models each containing multiple vplans. The vplans should be flatter than that, and there is more in the cost-model that isn't dependent on FoldTail than is. I think we should be trying to reduce the number of cost-models, not increase it! The same argument about new variants could equally apply, just to an explosion in the number of cost-models needed.

This patch only pushed the FoldTailByMasking to the places that need it, just treating FoldTailByMasking like the VF already is. How married are you to the idea of multiple cost models? (Or how anti this are you?) I know this patch is fairly large but it feels like a cleaner end result once it is done. Let me know and I can try and get everything working the other way if necessary.

> Another thing I noticed that the patch changes skeleton creation to take a VPlan as argument. I think it would be preferable to do this the other way around if possible, so instead model the difference between plans with and without tail-folding explicitly in the pre-header blocks in the VPlan. I've not looked at the details here to see how difficult that would be yet unfortunately though. But more than happy to iterate on that aspect together!

I was hoping for one step at a time. Firstly we can break the dependency on specifying FoldTailByMasking early in the costmodel, whilst keeping then number of other changes needed minimal. Bigger changes can come later, but might require more changes. (It is useful, for example, to know that a plan is FoldTailByMasking when picking epilog plans).

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142015/new/

https://reviews.llvm.org/D142015