[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

Tue Mar 30 09:20:34 PDT 2021

---- On Thu, 11 Feb 2021 06:07:17 +0530 David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote ----

 > Owen - perhaps you've got some relevant perspective here given the GPU
 > context/background, and general middle end optimization design
 > questions. Could you take a look?
 > (& perhaps Johannes?)
 > 
 > On Mon, Feb 1, 2021 at 11:36 AM David Blaikie <dblaikie at gmail.com> wrote:
 > >
 > > Thanks for sending this out!
 > >
 > > +Mehdi - if you have a chance to look, I'd appreciate your thoughts on this, partly as a general LLVM contributor, but also with an eye to abstractions over multiple IRs given your work on MLIR
 > > +Lang - you mentioned maybe some colleagues of yours might be able to take a look at the design here?
 > >
 > > I'd like to try writing up maybe alternatives to a couple of the patches in the series to see how they compare, but haven't done that as yet.
 > >
 > > My brief take is that I think the opaque handles make a fair bit of sense/seem high-value/low-cost (if it turned out the handle for one of these IR types needed to become a bit heavier, I don't think it'd be super difficult to revisit this code - make the generic handle large enough to accommodate whatever the largest concrete handle happens to be, etc). Though I'm still a bit less certain about the full runtime polymorphic abstractions.

For context, this is about the options laid out in the following document from Nicolai:

https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?usp=sharing

The above document was first introduced in the following email, which is the starting point of the current thread:

https://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html

The main issue is the need for an abstraction that greatly improves the experience of writing analyses that work on LLVM IR and MIR (and potentially also on MLIR). The way I understand it, Nicolai proposed an abstraction that involves dynamic polymorphism to abstract out the details of the underlying IR. David was more in favour of first unifying the IRs to the point that the dynamic polymorphism is limited to only occasional corner cases instead.

I am currently involved in the activity towards decoupling AMDGPU's dependency on the abstractions being discussed here, and the one thing that immediately jumps out is the ability to traverse the predecessors and successors of a basic block. From all the emails so far and the resulting proposal, it seems that both David and Nicolai think that having a common non-abstract base class for basic blocks is a good idea. Such a base class will provide a vector of predecessors and successors that can be used to traverse the CFG independent of the actual IR.

The current situation is that the basic blocks in LLVM IR and MLIR do not explicitly track their preds and succs. Preds are determined from the uses of the block, while succs are determined from the terminator instruction. MIR has explicit pred and succ vectors, but the succ pointers are duplicated by their presence as operands to the terminator instructions. The MachineBasicBlock is not a value, so there is no notion of traversing its uses for predecessors.

So if we create a common non-abstract base class for basic blocks, then MLIR and LLVM IR will incur the overhead of two vectors, one each for the preds and succs. Nicolai had reported a 3.1% to 4.1% increase in the size LLVM IR in some typical programs:

https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing

Is this overhead the main issue in deciding whether we should introduce the common base class for traversing a CFG? Do note that MIR already duplicates successors. One could argue that duplicating the successor pointers in LLVM IR and MLIR merely brings them on the same level as MIR. If that argument sticks, then the "real" overhead is only half of what is reported to account for the duplicate predecessor pointers.

The duplication of predecessor pointers stems from the fact that basic blocks in LLVM IR and MLIR are also values that are (primarily) used as operands to the terminator and PHI instructions. We could redefine these instructions to use non-value basic blocks as operands. CFG edges are not the typical use-def relation that other values represent, and it's reasonable to say that PHINodes and terminators are special in this one way.  I have not managed to see how far that rabbit hole goes, but I am not really convinced that this is attractive in any way. Is this even an option?

Sameer.