[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

Wed Jun 9 01:02:08 PDT 2021

Sameer Sahasrabuddhe writes:

> That is related to what Mehdi asked in a sibling email:
>
>> I'm fairly old school here I guess, as I'd have naively written an
>> abstract base class with virtual methods (or function_ref callback
>> into the implementation), but there are concerns with the indirection
>> compared to a templated implementation.
>> But aren't we trading the cost of this indirection with even more cost
>> if we have to duplicate information in a base class?
>
> Yes, we are avoiding indirect calls (not to be confused with just
> indirection through a data pointer). It's useful to ask whether it is
> okay to have indirect calls (through virtual functions) when computing
> an analysis instead of doing something that affects the IR
> itself. Is it an option to let each analysis worry about its own
> performance, perhaps by caching internal results of the virtual calls?
> Presumably, using the results of an analysis or updating the analysis
> will only have an incremental cost from indirect calls?

I did a little experiment with the bitcode sample files provided by
Jakub Kuderski [1], and mentioned in Nicolai's original proposal [2]:

[1] https://drive.google.com/drive/folders/1VJpym19cW-8BVgdtl2MsD3zB4CoEQ93O
[2] https://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html

The experiment measures the impact of virtual function calls on the
cycleinfo analysis:

Static Polymporphism: Version that uses templates to implement the same
analysis on LLVM IR and Machine IR.
https://github.com/ssahasra/llvm-project/tree/cycleinfo-static-template

Dynamic Polymorphism: Version that uses the SsaContext with virtual functions.
https://github.com/ssahasra/llvm-project/tree/cycleinfo-dynamic-polymorphism

Total times for cycleinfo on LLVM IR (tab-separated) are as follows:

Name	Static	Dynamic	%-change
asm2wasm.bc	0.0725	0.0758	4.5517%
clang-9.bc	1.1039	1.1608	5.1545%
lld.bc	0.5209	0.5442	4.4730%
llvm-as.bc	0.1165	0.122	4.7210%
llvm-dis.bc	0.0529	0.0554	4.7259%
opt.bc	0.4562	0.4789	4.9759%
rippled.bc	0.4007	0.4188	4.5171%
sqlite3.bc	0.0072	0.0077	6.9444%
wasm-as.bc	0.0683	0.0713	4.3924%
wasm-ctor-eval.bc	0.0687	0.0723	5.2402%
wasm-dis.bc	0.0681	0.071	4.2584%
wasm-emscripten-finalize.bc	0.068	0.0718	5.5882%
wasm-metadce.bc	0.0686	0.0724	5.5394%
wasm-opt.bc	0.072	0.0759	5.4167%
wasm-reduce.bc	0.0686	0.0725	5.6851%
wasm-shell.bc	0.0694	0.0728	4.8991%
wasm2js.bc	0.0714	0.0751	5.1821%

On average, the virtual calls are slower by 5%.

>> An observation about the performance implications of common base classes
>> for basic blocks. I'm taking it as a given that a common base class would
>> have predecessor and successor vectors. In addition to making predecessor
>> and successor iteration simpler, this could actually end up _saving_ memory
>> for LLVM IR in many cases if we make the following additional changes:
>>
>> 1. Change the definition of phi nodes so that they only contain the
>> predecessor _values_ in their argument lists (removing all the arguments
>> referring to the predecessor basic blocks).
>> 2. Change the definition of terminator instructions so that they no longer
>> contain the successor basic block(s) in their argument lists and instead
>> implicitly refer to the basic blocks successor list. For example, a
>> conditional branch would be defined such that its only argument is the i1
>> condition. If the condition is true, the program branches to
>> brInst->getParent()->getSuccessors()[0]; if the condition is false, it
>> branches to brInst->getParent()->getSuccessors()[1].
>>
>> This is an option that I hadn't taken into account when I calculated those
>> 3-4% of memory overhead, but it should end up saving memory whenever a
>> basic block has >= 2 phi nodes (the vectors themselves have an overhead, so
>> it depends). My experiment would have to be re-done to get a better idea of
>> the final balance.
>>
>> The changes mentioned above are admittedly very invasive and complicated
>> changes. It would be good to get an idea of what sort of acceptance
>> criterion the LLVM community has here.

I made an initial attempt at scratching the surface of this. As an
experiment, I changed IRBuilder::CreateBr() to assert that a direct
branch can only be created at the end of a known basic block:

https://github.com/ssahasra/llvm-project/commit/2ea1deca2df736c91c1d00aa26528d7f14336bad

The entire patch is a set of fairly independent changes to various uses
of the IRBuilder. The most interesting changes are in SimplifyCFG.cpp,
but even there, some changes are independent. My overall impression is
that it is feasible to change how CFG edges are stored in LLVM IR.

The main problem with the BasicBlock is:
- Successors are stored in the terminator instruction.
- Predecessors are stored indirectly because the BasicBlock is also a
  Value, and the terminator in each predecessor is a Use.
- Predecessors are also stored in each PHI node.

All these copies will need to be eliminated as follows:

1. Require a parent basic block when accessing or modifying the CFG
   edges on a terminator or a PHI node.
2. Any constructor that takes a target block will require one of the
   following:
   a. An InsertAtEnd argument (parent block).
   b. Or, in the case of PHI, an InsertBefore argument. Inserting a
      terminator before another terminator will not be supported.
3. Disallow the removeFromParent() method for terminators. It may still
   be supported for a PHI, where the input Values can still be accessed
   without any knowledge of the incoming edges.
4. Potentially add a moveToNewParent() method.
5. Corresponding changes to IRBuilder. For example, CreateBr() will
   assert that the InsertPt is at the end of a valid basic block which
   does not already have a terminator.

The BasicBlock will still be Use'd in a BlockAddress constant, but
that's a separate kind of use anyway.

This should not change bitcode in any way; it's only intended as a
change to in-memory IR. There may be some impact on the bitcode reader.

There's a good chance that I missed something important. But if all goes
well, most of this is just a long list of shallow changes, with some
changes being more involved than others.

Sameer.