[llvm-dev] [RFC] Goals for VPlan-based cost modelling

Fri Oct 30 06:37:36 PDT 2020

Hi Anna,

This sounds like a great topic for a master's thesis!

The main difference with VPlan (versus the original vectoriser) is that we
can have multiple paths in parallel and compare them to each other via some
cost function. The problem, as you have stated, is that the cost model is
really naive and has a lot of heuristics that were fudged to suit the old
vectoriser. Transitioning to the new model will likely produce a cost model
that has nothing to do with the previous one. At all.

I see the problem from two angles. Both need to be thought about in
parallel, as their designs will need to be in sync.

1. How to build the cost model

The cost model structure needs to be target independent but with heavy
target-specific knowledge. The kind of specialisation we created for the
previous vectoriser is too naive, and assumes the concepts are the same,
just not necessarily the costs. That's not true at all for different
hardware.

Another big flaw is that the cost is per instruction, not per pattern. So
casts have zero cost because we expect the targets to optimise them away,
but in some cases they don't, so the heuristic ends up being an average of
how efficient the back-end is at getting rid of them. When you're choosing
between two VPlan paths, this will generate noise beyond usefulness.

So we need to find the cost of instructions that are actually generated and
not those that aren't. One idea is to ignore most instructions and focus on
key ones (memory, arithmetic, etc) and then follow the use-def chain to see
the patterns, and then, *per target*, know what the costs are for the
pattern, not each individual instruction. Different targets can have
different key instructions, so the walk through the basic block can be
target-dependent (e.g. `for (auto inst: TTI->AllKeyInsts(bb))`). Once you
try to take the cost of a key instruction, it will follow all of the others
that you have ignored, to see if they would make a difference in the
pattern, stopping short at other key instructions, for example, to avoid
counting the same cost twice.
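
As a rough sketch of that walk (self-contained C++, not LLVM code: `Inst`,
`TargetModel`, `isKey`, `baseCost`, `contextCost` and the two example targets
are all names made up for illustration), something like this would charge
each ignored instruction in the context of its user, and stop at the next
key instruction:

```cpp
#include <cassert>
#include <unordered_set>
#include <utility>
#include <vector>

// Simplified stand-in for an IR instruction (hypothetical, not llvm::Instruction).
enum class Op { Load, Store, Add, ZExt, Trunc };

struct Inst {
  Op op;
  std::vector<const Inst *> operands; // use-def edges: the values this one uses
};

// Hypothetical target hook; every name here is an assumption.
struct TargetModel {
  virtual ~TargetModel() = default;
  // Does this opcode anchor a pattern on this target?
  virtual bool isKey(Op op) const = 0;
  // Cost of a key instruction on its own.
  virtual int baseCost(Op op) const = 0;
  // Cost of a non-key instruction, given the instruction that uses it.
  virtual int contextCost(const Inst &inst, const Inst &user) const = 0;
};

// Cost of the pattern rooted at `key`: walk up the use-def chain through
// non-key instructions and stop at other key instructions. `counted` makes
// sure a shared non-key instruction is charged only once per block (to the
// first pattern that reaches it, which is a simplification).
int patternCost(const Inst &key, const TargetModel &tm,
                std::unordered_set<const Inst *> &counted) {
  assert(tm.isKey(key.op) && "patterns are rooted at key instructions");
  int cost = tm.baseCost(key.op);
  std::vector<std::pair<const Inst *, const Inst *>> work; // (inst, its user)
  for (const Inst *o : key.operands)
    work.push_back({o, &key});
  while (!work.empty()) {
    std::pair<const Inst *, const Inst *> item = work.back();
    work.pop_back();
    const Inst *inst = item.first;
    if (tm.isKey(inst->op))
      continue; // costed as the root of its own pattern
    if (!counted.insert(inst).second)
      continue; // already charged to an earlier pattern
    cost += tm.contextCost(*inst, *item.second);
    for (const Inst *o : inst->operands)
      work.push_back({o, inst});
  }
  return cost;
}

// Sum the pattern costs of all key instructions in a block.
int blockCost(const std::vector<const Inst *> &block, const TargetModel &tm) {
  std::unordered_set<const Inst *> counted;
  int total = 0;
  for (const Inst *i : block)
    if (tm.isKey(i->op))
      total += patternCost(*i, tm, counted);
  return total;
}

// Example target: loads can extend and stores can truncate for free.
struct FoldingTarget : TargetModel {
  bool isKey(Op op) const override {
    return op == Op::Load || op == Op::Store || op == Op::Add;
  }
  int baseCost(Op) const override { return 1; }
  int contextCost(const Inst &inst, const Inst &user) const override {
    bool extOfLoad = inst.op == Op::ZExt && inst.operands.size() == 1 &&
                     inst.operands[0]->op == Op::Load;
    bool truncIntoStore = inst.op == Op::Trunc && user.op == Op::Store;
    return (extOfLoad || truncIntoStore) ? 0 : 1;
  }
};

// Example target with the same key instructions but no folding at all.
struct PerInstTarget : FoldingTarget {
  int contextCost(const Inst &, const Inst &) const override { return 1; }
};
```

For a block like load -> zext -> add -> trunc -> store, `FoldingTarget`
charges only the three key instructions, while `PerInstTarget` also charges
the two casts. The unit costs are placeholders, not measured numbers.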

There are probably many other ways to do this, I'm just thinking out loud
to give an idea of the problems we faced earlier. I'm sure you'll find
better ways to do this yourself once you start looking at it more.

Regardless of how we do it, having the ability to say "this shuffle is free
on this instruction" but "not with that other one" will make the cost model
a lot more precise and need far fewer fudged heuristics.

2. How to use the cost model

Once we have costs of a VPlan, we need to traverse the problem space
efficiently. It would be awesome if we could use some smart traversal
(like RLO) but that also brings highly uncertain execution times and the
need for training and generalisation, so it's not generally feasible. But we
can do something simpler while still having the same idea. What we really
don't want is to be as greedy as the original vectoriser. We must have a way
to occasionally increase costs without giving up, but for that, we need a
policy that tells us that it's ok to go that way and not some other (less
optimal) way.

This policy has to be target-specific. Given its more deterministic nature
(more than RLO at least), this will probably be the place where we fudge
most of our heuristics. Things like "keep adding shuffles, they're likely
to disappear": even if they don't all the time, it's worth pushing a bit
more, in case they do. So we'll also need to have hard limits, possibly
per target, and possibly benchmark-driven heuristics.

Once we have the policy and the cost, we can traverse one or many paths
(depending on the budget we give the compiler, which could be a command
line option), and then push as hard as we can through all paths within the
budget. When we stop, we take the lowest-cost path and apply its series of
transformations. How we traverse this will depend on the implementation of
the cost model and the policy, but a simple heuristic search would be fine
as a first approach.
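
To make the budget/policy idea a bit more concrete, here is a minimal
sketch in self-contained C++ (again not LLVM code: `Plan`, `Policy`,
`maxRegression` and `search` are hypothetical names, and a real VPlan is
obviously far more than a cost and a successor list). It does a best-first
search that is allowed to follow a more expensive plan as long as the
policy says the regression is acceptable, and stops when the budget runs out:

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a candidate plan: its cost, plus the plans
// reachable from it by one further transformation (as indices).
struct Plan {
  int cost;
  std::vector<std::size_t> next;
};

// Hypothetical target policy: how far above the best cost seen so far
// a candidate may be and still be worth exploring.
struct Policy {
  int maxRegression;
  bool allow(int bestCost, int candidateCost) const {
    return candidateCost <= bestCost + maxRegression;
  }
};

// Best-first search over the plan graph. Expands at most `budget` plans
// and returns the index of the cheapest plan that was expanded. A plan
// only becomes a candidate result once budget has been spent on it.
std::size_t search(const std::vector<Plan> &plans, std::size_t start,
                   const Policy &policy, unsigned budget) {
  // Min-heap on cost: the cheapest plan on the frontier is expanded first.
  auto costlier = [&plans](std::size_t a, std::size_t b) {
    return plans[a].cost > plans[b].cost;
  };
  std::priority_queue<std::size_t, std::vector<std::size_t>,
                      decltype(costlier)>
      frontier(costlier);
  std::vector<bool> queued(plans.size(), false);

  frontier.push(start);
  queued[start] = true;
  std::size_t best = start;

  while (!frontier.empty() && budget > 0) {
    std::size_t cur = frontier.top();
    frontier.pop();
    --budget;
    if (plans[cur].cost < plans[best].cost)
      best = cur;
    for (std::size_t n : plans[cur].next) {
      if (queued[n])
        continue;
      // The policy decides whether a (possibly worse) plan is worth a try.
      if (!policy.allow(plans[best].cost, plans[n].cost))
        continue;
      queued[n] = true;
      frontier.push(n);
    }
  }
  return best;
}
```

With `maxRegression` set to 0 this never follows a plan that is worse than
the best one seen so far, so it can get stuck in a local minimum; a small
positive value lets it go through a temporarily worse plan (say, one with
extra shuffles) to reach a cheaper one behind it, provided the budget
allows.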

Folks working on VPlan are creating a really nice structure to work with
the different concepts in vectorisation strategies (predication, scalable
vectors, use-def chain costs), so I'm sure you'll have at least a few good
ways to proceed, and hopefully help define the interfaces between the
different parts of the vectoriser.

cheers,
--renato

On Fri, 30 Oct 2020 at 10:08, Anna Sophia Welker via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi all,
>
> I am looking into the benefits of a VPlan-based cost model, and into how
> such a cost model should be implemented to make the most out of these
> benefits.
>
> Over the last year I have been working with LLVM, mainly focused on the
> ARM backend, in the course of a one-year internship at Arm Ltd. My main
> project from December 2019 to August 2020 was to introduce gather/scatters
> for MVE auto-vectorization. One of the recurring challenges during this
> work was to get things right in the cost model.
> For example, gathers can extend the data they load, while scatters can
> truncate their input, meaning that an extend following the load, or a
> truncate preceding the store, is free if it meets certain type
> conditions. As the current cost model is not designed for context-based
> analysis, this was a pain to model.
>
> I have done some research and found out that one of the proposed benefits
> of VPlan is that a new cost model making use of it would be able to better
> support context-dependent decisions like this.
> However, there does not exist much specification about what such a cost
> model should look like.
>
> Also, I have read through the respective code to understand how loop
> vectorization is currently done and how far the introduction of VPlan has
> progressed and have realised that the only recipe that actually handles
> more than one instruction from the input IR is the one for interleaved
> groups. When the VPlan is generated on the VPlan-native path, every IR
> instruction is considered and transformed into a recipe separately,
> ignoring its context (to give a code location, I am looking at
> VPlanTransforms::VPInstructionsToVPRecipes).
> And maybe there are architectures that for some cases do not have the same
> vector instructions, so a pattern that works great for one could be useless
> for others. So I am wondering: Is there any plan to have target-dependent
> flavours of recipes, or how will those things be handled?
> Right now it makes sense that nothing like this has been implemented yet,
> as there is no cost model that could guide the transformation. But if
> recipes are a general thing, should the cost model be the component
> actually providing the target-specific pattern for a recipe, together with
> its cost?
>
> I am considering choosing a topic related to VPlan, possibly cost
> modelling, for my Master's thesis, with the goal to present a solution and
> implement a prototype.
>
> What I would like to ask the community is
>
> (1) what goals there are or should be for VPlan-based cost modelling,
> (2) whether there have been any discussions on target-dependent patterns
> yet, and
> (3) which examples of inefficiencies and shortcomings of the current cost
> model they have come across.
>
> I am looking forward to your feedback.
>
> Many thanks,
> Anna Welker