<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 31, 2021 at 9:01 AM Sameer Sahasrabuddhe <<a href="mailto:sameer@sbuddhe.net" target="_blank">sameer@sbuddhe.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>

David Blaikie writes:<br>

<br>

>> I am currently involved in the activity towards decoupling AMDGPU's<br>

>> dependency on the abstractions being discussed here, and the one<br>

>> thing that immediately jumps out is the ability to traverse the<br>

>> predecessors and successors of a basic block. From all the emails so<br>

>> far and the resulting proposal, it seems that both David and Nicolai<br>

>> think that having a common non-abstract base class for basic blocks<br>

>> is a good idea.<br>

><br>

> I don't /think/ that's what I was advocating for (but it's been a<br>

> while, and I might be misremembering - if you've got pointers to<br>

> particular parts of the conversation to refresh my memory/frame this<br>

> discussion, that'd be handy) - I'd worry about the cost of trying to<br>

> introduce a common base class between these different APIs. I think<br>

> what I was advocating for was for them to have syntactically matching<br>

> APIs so that templated code could abstract over them without the need<br>

> for traits/adapters.<br>

<br>

You're right. Turns out that you didn't specifically advocate what I<br>

said above. The following is what was actually said, but I misunderstood<br>

something else from the thread:<br>

<br>

<a href="https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html" rel="noreferrer" target="_blank">https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html</a><br>

<br>

  "I meant for a different direction - if we got buy-in for making LLVM IR and<br>

   MIR implement the same C++ concept (& wrote down what that concept was -<br>

   which APIs, how they look textually) then I think it would be relatively<br>

   easy to get the small/mechanical renaming patches reviewed. Incremental<br>

   from day 1, not big new abstraction, then incremental."<br>

<br>

>> So if we create a common non-abstract base class for basic blocks,<br>

>> then MLIR and LLVM IR will incur the overhead of two vectors, one<br>

>> each for the preds and succs. Nicolai had reported a 3.1% to 4.1%<br>

>> increase in the size LLVM IR in some typical programs:<br>

>><br>

>> <a href="https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing" rel="noreferrer" target="_blank">https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing</a><br>

><br>

> That's overhead in the bitcode file size, yes?<br>

<br>

It's the memory overhead of storing those pointers in a vector. I don't<br>

think it needs to affect the bitcode, since it's just redundant<br>

information that can be populated after reading the whole function.<br>

<br>

>> Is this overhead the main issue in deciding whether we should<br>

>> introduce the common base class for traversing a CFG?<br>

><br>

> I expect that would be a pretty significant concern, but also memory<br>

> usage and any time overhead keeping these new data structures up to<br>

> date, or slowing down compilation due to memory paging in and out,<br>

> etc.<br>

<br>

Right. If the discussion progressed to those details, then we will be in<br>

a good place :)<br>

<br>

Everyone seems to agree that we need an abstraction that makes it easier<br>

to write analyses, and Nicolai's document captures all the options. But<br>

in particular the document makes a case against templates, and instead<br>

advocates a way to write analyses without referring to the actual types<br>

in the underlying IR. This is achieved with a combination of opaque<br>

handles and dynamic polymorphism. The document also strongly recommends<br>

minimizing the dynamic polymorphism by designing "common non-abstract<br>

base classes for SSA basic block and instructions concepts".<br>

<br>

My focus right now is on the base class for basic blocks. It represents<br>

a frequently used property of the CFG, and the problem is concise, and<br>

already familiar to anyone writing an analysis. Our current experience<br>

with writing new analyses in AMDGPU provides a motivation for moving<br>

away from templates. But this is only a specific part of the bigger<br>

problem. In particular, it only represents the "CFG" part of the<br>

problem, but not the "SSA IR" part.<br>

<br>

The actual use-case can be seen in the branch below, which implements<br>

"CycleInfo" as a template over LLVM IR and MIR. It's a work in progress.<br>

<br>

<a href="https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template" rel="noreferrer" target="_blank">https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template</a><br>

<br>

For now, we can consider a simpler analysis "CfgInfo" that reports<br>

various properties such as outgoing edges per node.<br>

<br>

Example A: CfgInfo over LLVM IR<br>

<br>

    CfgInfo::compute(BasicBlock *entry) {<br>

      worklist.enqueue(entry);<br>

      while (!worklist.empty) {<br>

        auto B = worklist.dequeue();<br>

        for (auto S : successors(B)) {<br>

           worklist.enqueue_if_new(S);<br>

           collectStats(B, S);<br>

        }<br>

      }<br>

    }<br>

<br>

Example B: CfgInfo over Machine IR<br>

<br>

    CfgInfo::compute(MachineBasicBlock *entry) {<br>

      worklist.enqueue(entry);<br>

      while (!worklist.empty) {<br>

        auto B = worklist.dequeue();<br>

        for (auto S : B->successors()) { // <-- diff<br>

           worklist.enqueue_if_new(S);<br>

           collectStats(B, S);<br>

        }<br>

      }<br>

    }<br>

<br>

Example C: Template over both IRs<br>

<br>

    template <typename BlockType><br>

    CfgInfo::compute(BlockType *entry) {<br>

      worklist.enqueue(entry);<br>

      while (!worklist.empty) {<br>

        auto B = worklist.dequeue();<br>

        for (auto S : successors(B)) { // <-- needs an overload<br>

           worklist.enqueue_if_new(S);<br>

           collectStats(B, S);<br>

        }<br>

      }<br>

    }<br>

<br>

In this example, any instance of BlockType needs to provide a global<br>

function "successors()". The following is sufficient with Machine IR in<br>

this particular case, but in general, this would be provided by traits:<br>

<br>

namespace llvm {<br>

    auto successors(MachineBasicBlock *B) { return B->successors(); }<br>

}<br>

<br>

Example D: CfgInfo over a common base class<br>

<br>

    CfgInfo::compute(BasicBlockBase *entry) {<br>

      worklist.enqueue(entry);<br>

      while (!worklist.empty) {<br>

        auto B = worklist.dequeue();<br>

        for (auto S : B->successors()) { // <-- not virtual<br>

           worklist.enqueue_if_new(S);<br>

           collectStats(B, S);<br>

        }<br>

      }<br>

    }<br>

<br>

This last version is not a template. It's an implementation that gets<br>

reused for both LLVM IR and Machine IR by simply casting the respective<br>

basic block type to the base type.<br>

<br>

There are two main problems with the template approach:<br>

<br>

1. Just for that one function "successors()", the type of the block has<br>

   to be passed around as a template parameter. Starting from the entry<br>

   point of the analysis, this has to go all the way to every leaf<br>

   function that wants to call "successors()". This can especially be a<br>

   problem if the template has other dependent types such as iterator<br>

   adaptors.<br>

<br>

2. The actual analysis can be quite complicated, but because it's a<br>

   template, it has to be in a header file so that it can be<br>

   instantiated at the point where the analysis is actually computed.<br>

<br>

The CycleInfo patch demonstrates both these problems. There are multiple<br>

files as follows:<br>

<br>

GenericCycleInfo.h<br>

<br>

This file defines the templates GenericCycleInfo<BlockType> and<br>

GenericCycle<BlockType>. It also contains the entire implementation of<br>

the analysis.<br>

<br>

CycleInfo.h<br>

<br>

- This file declares a thin wrapper CycleInfo around<br>

  GenericCycleInfo<BasicBlock> and another wrapper Cycle around<br>

  GenericCycle<BasicBlock>. These are the LLVM IR instances of the<br>

  templates.<br>

- The wrappers avoid including GenericCycleInfo.h by only needing the<br>

  pointer type to the wrapped template instances.<br>

- In particular, notice how the child_iterator explicitly exposes the<br>

  details of the wrapped type, but then returns the wrapper with the `*'<br>

  operator.<br>

<br>

  using const_child_iterator_base =<br>

    typename std::vector<std::unique_ptr<GenericCycle<BasicBlock>>>::const_iterator;<br>

  struct const_child_iterator<br>

      : iterator_adaptor_base<const_child_iterator, const_child_iterator_base> {<br>

    using Base =<br>

        iterator_adaptor_base<const_child_iterator, const_child_iterator_base>;<br>

<br>

    Cycle operator *() const;<br>

    const_child_iterator(const_child_iterator_base I) : Base(I) {}<br>

  };<br>

<br>

CycleInfo.cpp<br>

<br>

This file is where the template is actually instantiated, along with the<br>

implementation of the wrapper classes.<br>

<br>

A similar pair of files will be needed for MachineCycleInfo later. These<br>

will be almost identical to the CycleInfo files, except for the name of<br>

the block type and the overload for "succesors()" and "predecessors()".<br>

<br>

All these files are just boilerplate designed to hide a template and<br>

separately expose two instances of that template. Every type introduced<br>

by the template needs to be carefully wrapped for use in other<br>

files. Instead, if there was an interface (and not just a concept) that<br>

exposed CFG edges, there would be just one CycleInfo.h file to declare<br>

the analysis classes and one CycleInfo.cpp file to implement them.<br>

<br>

So the argument goes somewhat like this:<br>

<br>

1. Static polymorphism through templates creates a development overhead<br>

   in the form of boilerplate, and also some runtime overhead of indirecting<br>

   through wrapper types.<br>

<br>

2. Dynamic polymorphism through an abstract base class or an abstract<br>

   context class removes the boilerplate, but it introduces the runtime<br>

   overhead of indirect calls. It also retains the runtime overhead of<br>

   indirecting through opaque wrappers.<br>

<br>

3. A non-abstract base class eliminates all the above overheads (for<br>

   this specific problem of CFG traversal), but introduces a memory<br>

   overhead because some information from the derived classes has to be<br>

   duplicated in the base class.<br>

<br>

The CycleInfo analysis exemplifies our motivation for moving away from<br>

templates and traits. Assuming virtual calls are considered strongly<br>

undesirable, that leaves us with the option of a non-abstract base<br>

class.</blockquote><div><br></div><div><br></div><div>Thanks, that was an excellent summary and the example was really helpful (at least to me)!</div><div>I'm fairly old school here I guess, as I'd have naively written an abstract base class with virtual methods (or function_ref callback into the implementation), but there are concerns with the indirection compared to a templated implementation.</div><div>But aren't we trading the cost of this indirection with even more cost if we have to duplicate information in a base class?</div><div><br></div><div>Best,</div><div><br></div><div>-- </div><div>Mehdi</div><div><br></div></div></div>