[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

Wed Mar 31 09:01:37 PDT 2021

David Blaikie writes:

>> I am currently involved in the activity towards decoupling AMDGPU's
>> dependency on the abstractions being discussed here, and the one
>> thing that immediately jumps out is the ability to traverse the
>> predecessors and successors of a basic block. From all the emails so
>> far and the resulting proposal, it seems that both David and Nicolai
>> think that having a common non-abstract base class for basic blocks
>> is a good idea.
>
> I don't /think/ that's what I was advocating for (but it's been a
> while, and I might be misremembering - if you've got pointers to
> particular parts of the conversation to refresh my memory/frame this
> discussion, that'd be handy) - I'd worry about the cost of trying to
> introduce a common base class between these different APIs. I think
> what I was advocating for was for them to have syntactically matching
> APIs so that templated code could abstract over them without the need
> for traits/adapters.

You're right. Turns out that you didn't specifically advocate what I
said above. The following is what was actually said, but I misunderstood
something else from the thread:

https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html

  "I meant for a different direction - if we got buy-in for making LLVM IR and
   MIR implement the same C++ concept (& wrote down what that concept was -
   which APIs, how they look textually) then I think it would be relatively
   easy to get the small/mechanical renaming patches reviewed. Incremental
   from day 1, not big new abstraction, then incremental."

>> So if we create a common non-abstract base class for basic blocks,
>> then MLIR and LLVM IR will incur the overhead of two vectors, one
>> each for the preds and succs. Nicolai had reported a 3.1% to 4.1%
>> increase in the size LLVM IR in some typical programs:
>>
>> https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing
>
> That's overhead in the bitcode file size, yes?

It's the memory overhead of storing those pointers in a vector. I don't
think it needs to affect the bitcode, since it's just redundant
information that can be populated after reading the whole function.

>> Is this overhead the main issue in deciding whether we should
>> introduce the common base class for traversing a CFG?
>
> I expect that would be a pretty significant concern, but also memory
> usage and any time overhead keeping these new data structures up to
> date, or slowing down compilation due to memory paging in and out,
> etc.

Right. If the discussion progressed to those details, then we will be in
a good place :)

Everyone seems to agree that we need an abstraction that makes it easier
to write analyses, and Nicolai's document captures all the options. But
in particular the document makes a case against templates, and instead
advocates a way to write analyses without referring to the actual types
in the underlying IR. This is achieved with a combination of opaque
handles and dynamic polymorphism. The document also strongly recommends
minimizing the dynamic polymorphism by designing "common non-abstract
base classes for SSA basic block and instructions concepts".

My focus right now is on the base class for basic blocks. It represents
a frequently used property of the CFG, and the problem is concise, and
already familiar to anyone writing an analysis. Our current experience
with writing new analyses in AMDGPU provides a motivation for moving
away from templates. But this is only a specific part of the bigger
problem. In particular, it only represents the "CFG" part of the
problem, but not the "SSA IR" part.

The actual use-case can be seen in the branch below, which implements
"CycleInfo" as a template over LLVM IR and MIR. It's a work in progress.

https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template

For now, we can consider a simpler analysis "CfgInfo" that reports
various properties such as outgoing edges per node.

Example A: CfgInfo over LLVM IR

    CfgInfo::compute(BasicBlock *entry) {
      worklist.enqueue(entry);
      while (!worklist.empty) {
        auto B = worklist.dequeue();
        for (auto S : successors(B)) {
           worklist.enqueue_if_new(S);
           collectStats(B, S);
        }
      }
    }

Example B: CfgInfo over Machine IR

    CfgInfo::compute(MachineBasicBlock *entry) {
      worklist.enqueue(entry);
      while (!worklist.empty) {
        auto B = worklist.dequeue();
        for (auto S : B->successors()) { // <-- diff
           worklist.enqueue_if_new(S);
           collectStats(B, S);
        }
      }
    }

Example C: Template over both IRs

    template <typename BlockType>
    CfgInfo::compute(BlockType *entry) {
      worklist.enqueue(entry);
      while (!worklist.empty) {
        auto B = worklist.dequeue();
        for (auto S : successors(B)) { // <-- needs an overload
           worklist.enqueue_if_new(S);
           collectStats(B, S);
        }
      }
    }

In this example, any instance of BlockType needs to provide a global
function "successors()". The following is sufficient with Machine IR in
this particular case, but in general, this would be provided by traits:

namespace llvm {
    auto successors(MachineBasicBlock *B) { return B->successors(); }
}

Example D: CfgInfo over a common base class

    CfgInfo::compute(BasicBlockBase *entry) {
      worklist.enqueue(entry);
      while (!worklist.empty) {
        auto B = worklist.dequeue();
        for (auto S : B->successors()) { // <-- not virtual
           worklist.enqueue_if_new(S);
           collectStats(B, S);
        }
      }
    }

This last version is not a template. It's an implementation that gets
reused for both LLVM IR and Machine IR by simply casting the respective
basic block type to the base type.

There are two main problems with the template approach:

1. Just for that one function "successors()", the type of the block has
   to be passed around as a template parameter. Starting from the entry
   point of the analysis, this has to go all the way to every leaf
   function that wants to call "successors()". This can especially be a
   problem if the template has other dependent types such as iterator
   adaptors.

2. The actual analysis can be quite complicated, but because it's a
   template, it has to be in a header file so that it can be
   instantiated at the point where the analysis is actually computed.

The CycleInfo patch demonstrates both these problems. There are multiple
files as follows:

GenericCycleInfo.h

This file defines the templates GenericCycleInfo<BlockType> and
GenericCycle<BlockType>. It also contains the entire implementation of
the analysis.

CycleInfo.h

- This file declares a thin wrapper CycleInfo around
  GenericCycleInfo<BasicBlock> and another wrapper Cycle around
  GenericCycle<BasicBlock>. These are the LLVM IR instances of the
  templates.
- The wrappers avoid including GenericCycleInfo.h by only needing the
  pointer type to the wrapped template instances.
- In particular, notice how the child_iterator explicitly exposes the
  details of the wrapped type, but then returns the wrapper with the `*'
  operator.

  using const_child_iterator_base =
    typename std::vector<std::unique_ptr<GenericCycle<BasicBlock>>>::const_iterator;
  struct const_child_iterator
      : iterator_adaptor_base<const_child_iterator, const_child_iterator_base> {
    using Base =
        iterator_adaptor_base<const_child_iterator, const_child_iterator_base>;

    Cycle operator *() const;
    const_child_iterator(const_child_iterator_base I) : Base(I) {}
  };

CycleInfo.cpp

This file is where the template is actually instantiated, along with the
implementation of the wrapper classes.

A similar pair of files will be needed for MachineCycleInfo later. These
will be almost identical to the CycleInfo files, except for the name of
the block type and the overload for "succesors()" and "predecessors()".

All these files are just boilerplate designed to hide a template and
separately expose two instances of that template. Every type introduced
by the template needs to be carefully wrapped for use in other
files. Instead, if there was an interface (and not just a concept) that
exposed CFG edges, there would be just one CycleInfo.h file to declare
the analysis classes and one CycleInfo.cpp file to implement them.

So the argument goes somewhat like this:

1. Static polymorphism through templates creates a development overhead
   in the form of boilerplate, and also some runtime overhead of indirecting
   through wrapper types.

2. Dynamic polymorphism through an abstract base class or an abstract
   context class removes the boilerplate, but it introduces the runtime
   overhead of indirect calls. It also retains the runtime overhead of
   indirecting through opaque wrappers.

3. A non-abstract base class eliminates all the above overheads (for
   this specific problem of CFG traversal), but introduces a memory
   overhead because some information from the derived classes has to be
   duplicated in the base class.

The CycleInfo analysis exemplifies our motivation for moving away from
templates and traits. Assuming virtual calls are considered strongly
undesirable, that leaves us with the option of a non-abstract base
class.

Sameer.