<div dir="ltr">Hi Anna,<div><br></div><div>This sounds like a great work for a master!</div><div><br></div><div>The main difference with VPlan (versus the original vectoriser) is that we can have multiple paths in parallel and compare them to each other via some cost function. The problem, as you have stated, is that the cost model is really naive and has a lot of heuristics that were fudged to suit the old vectoriser. Transitioning to the new model will likely produce a cost model that has nothing to do with the previous one. At all.</div><div><br></div><div>I see the problem from two angles. Both need to be thought about in parallel, as their designs will need to be in sync.</div><div><br></div><div>1. How to build the cost model</div><div><br></div><div>The cost model structure needs to be target independent but with heavy target-specific knowledge. The kind of specialisation we created for the previous vectoriser is too naive, and assumes the concepts are the same, just not necessarily the costs. That's not true at all for different hardware.</div><div><br></div><div>Another big flaw is that the cost is per instruction, not per pattern. So casts have zero cost because we expect the targets to optimise them away, but in some cases they don't, so the heuristics is an average of how efficient the back-end is at getting rid of those. When you're choosing between two VPlan paths, this will generate noise beyond usefulness.</div><div><br></div><div>So we need to find the cost of instructions that are actually generated and not those that aren't. One idea is to ignore most instructions and focus on key ones (memory, arithmetic, etc) and then follow the use-def chain to see the patterns, and then, *per target*, know what the costs are for the pattern, not each individual instruction. Different targets can have different key instructions, so the walk through the basic-block can be target-dependent (ex: `for (auto inst: TTI->AllKeyInsts(bb))`). Once you try to take the cost of a key instruction, it will follow all of the others that you have ignored, to see if they would make a difference in the pattern, stopping short at other key instructions, for example, to avoid counting twice the same cost.</div><div><br></div><div>There are probably many other ways to do this, I'm just thinking out loud to give an idea of the problems we faced earlier. I'm sure you'll find better ways to do this yourself once you start looking at it more.</div><div><br></div><div>Regardless of how we do it, having the ability to say "this shuffle is free on this instruction" but "not with that other one" will make the cost model a lot more precise and need a lot less fudged heuristics.</div><div><br></div><div>2. How to use the cost model</div><div><br></div><div>Once we have costs of a VPlan, we need to traverse the problem space efficiently. It would be awesome if we could use some smart traversing (like RLO) but that also brings highly uncertain execution times and the need for training and genralisation, so not generally feasible. But we can do something simpler while still having the same idea. What we really don't want is to be as greedy as the original vectoriser. We must have a way to occasionally increase costs without giving up, but for that, we need a policy that tells us that it's ok to go that way and not some other (less optimal) way.</div><div><br></div><div>This policy has to be target-specific. Given it's more deterministic nature (more than RLO at least), this will probably be the place where we fudge most of our heuristics. Things like "keep adding shuffles that it's likely they'll disappear", even if they don't all the time, it's worth pushing a bit more, in case they do. So we'll also need to have hard limits, possibly per target, and possibly benchmark-driven heuristics.</div><div><br></div><div>Once we have the policy and the cost, we can traverse one or many paths (depending on the budget we give the compiler, which could be a command line option), and then push as hard as we can through all paths within the budget and when we stop, we take the lowest cost and apply the series. How we traverse this will depend on the implementation of the cost model and the policy, but some simple heuristics search would be fine as a first approach.</div><div><br></div><div>Folks working on VPlan are creating a really nice structure to work with the different concepts in vectorisation strategies (predication, scalable vectors, use-def chain costs), so I'm sure you'll have at least a few good ways to proceed, and hopefully help define the interfaces between the different parts of the vectoriser.</div><div><br></div><div>cheers,</div><div>--renato</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 30 Oct 2020 at 10:08, Anna Sophia Welker via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <div style="font-family:-moz-fixed;font-size:12px" lang="x-unicode">Hi all,

      <br>

      <br>

      I am looking into the benefits of a VPlan-based cost model, and

      into how such a cost model should be implemented to make the most

      out of these benefits.

      <br>

      <br>

      Over the last year I have been working with LLVM, mainly focused

      on the ARM backend, in the course of a one-year internship at Arm

      Ltd. My main project from December 2019 to August 2020 was to

      introduce gather/scatters for MVE auto-vectorization. One of the

      recurring challenges during this work was to get things right in

      the cost model.

      <br>

      For example, gathers can extend the data they load, while scatters

      can truncate their input, meaning that an extend following the

      load, or a truncate preceding the store, is for free if it meets

      certain type conditions. As the current cost model is not designed

      for context-based analysis, this was a pain to model.

      <br>

      <br>

      I have done some research and found out that one of the proposed

      benefits of VPlan is that a new cost model making use of it would

      be able to better support context-dependent decisions like this.

      <br>

      However, there does not exist much specification about what such a

      cost model should look like.

      <br>

      <br>

      Also, I have read through the respective code to understand how

      loop vectorization is currently done and how far the introduction

      of VPlan has progressed and have realised that the only recipe

      that actually handles more than one instruction from the input IR

      is the one for interleaved groups. When the VPlan is generated on

      the VPlan-native path, every IR instruction is considered and

      transformed into a recipe separately, ignoring its context (to

      give a code location, I am looking at

      VPlanTransforms::VPInstructionsToVPRecipes).

      <br>

      And maybe there are architectures that for some cases do not have

      the same vector instructions, so a pattern that works great for

      one could be useless for others. So I am wondering: Is there any

      plan to have target-dependent flavours of recipes, or how will

      those things be handled?

      <br>

      Right now it makes sense that nothing like this has been

      implemented yet, as there is no cost model that could guide the

      transformation. But if recipes are a general thing, should the

      cost model be the component actually providing the target-specific

      pattern for a recipe, together with its cost?

      <br>

      <br>

      I am considering choosing a topic related to VPlan, possibly cost

      modelling, for my Master thesis, with the goal to present a

      solution and implement a prototype.

      <br>

      <br>

      What I would like to ask the community is

      <br>

      <br>

      (1) what goals there are or should be for VPlan-based cost

      modelling,

      <br>

      (2) whether there have been any discussions on target-dependent

      patterns yet, and

      <br>

      (3) which examples of inefficiencies and shortcomings of the

      current cost model they have come across.

      <br>

      <br>

      I am looking forward to your feedback.

      <br>

      <br>

      Many thanks,

      <br>

      Anna Welker

      <br>

      <br>

    </div>

  </div>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

</blockquote></div>