[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
Sameer Sahasrabuddhe via llvm-dev
llvm-dev at lists.llvm.org
Wed Jun 9 01:02:08 PDT 2021
Sameer Sahasrabuddhe writes:
> That is related to what Mehdi asked in a sibling email:
>> I'm fairly old school here I guess, as I'd have naively written an
>> abstract base class with virtual methods (or function_ref callback
>> into the implementation), but there are concerns with the indirection
>> compared to a templated implementation.
>> But aren't we trading the cost of this indirection with even more cost
>> if we have to duplicate information in a base class?
> Yes, we are avoiding indirect calls (not to be confused with just
> indirection through a data pointer). It's useful to ask whether it is
> okay to have indirect calls (through virtual functions) when computing
> an analysis instead of doing something that affects the IR
> itself. Is it an option to let each analysis worry about its own
> performance, perhaps by caching internal results of the virtual calls?
> Presumably, using the results of an analysis or updating the analysis
> will only have an incremental cost from indirect calls?
I did a little experiment with the bitcode sample files provided by
Jakub Kuderski and mentioned in Nicolai's original proposal.
The experiment measures the impact of virtual function calls on the
analysis by comparing two versions:
Static Polymorphism: a version that uses templates to implement the same
analysis on LLVM IR and Machine IR.
Dynamic Polymorphism: a version that uses the SsaContext with virtual functions.
Total times for CycleInfo on LLVM IR are as follows:
Name                          Static   Dynamic  %-change
asm2wasm.bc                   0.0725   0.0758   4.5517%
clang-9.bc                    1.1039   1.1608   5.1545%
lld.bc                        0.5209   0.5442   4.4730%
llvm-as.bc                    0.1165   0.1220   4.7210%
llvm-dis.bc                   0.0529   0.0554   4.7259%
opt.bc                        0.4562   0.4789   4.9759%
rippled.bc                    0.4007   0.4188   4.5171%
sqlite3.bc                    0.0072   0.0077   6.9444%
wasm-as.bc                    0.0683   0.0713   4.3924%
wasm-ctor-eval.bc             0.0687   0.0723   5.2402%
wasm-dis.bc                   0.0681   0.0710   4.2584%
wasm-emscripten-finalize.bc   0.0680   0.0718   5.5882%
wasm-metadce.bc               0.0686   0.0724   5.5394%
wasm-opt.bc                   0.0720   0.0759   5.4167%
wasm-reduce.bc                0.0686   0.0725   5.6851%
wasm-shell.bc                 0.0694   0.0728   4.8991%
wasm2js.bc                    0.0714   0.0751   5.1821%
On average, the virtual calls are slower by about 5%.
>> An observation about the performance implications of common base classes
>> for basic blocks. I'm taking it as a given that a common base class would
>> have predecessor and successor vectors. In addition to making predecessor
>> and successor iteration simpler, this could actually end up _saving_ memory
>> for LLVM IR in many cases if we make the following additional changes:
>> 1. Change the definition of phi nodes so that they only contain the
>> predecessor _values_ in their argument lists (removing all the arguments
>> referring to the predecessor basic blocks).
>> 2. Change the definition of terminator instructions so that they no longer
>> contain the successor basic block(s) in their argument lists and instead
>> implicitly refer to the basic blocks successor list. For example, a
>> conditional branch would be defined such that its only argument is the i1
>> condition. If the condition is true, the program branches to
>> brInst->getParent()->getSuccessors()[0]; if the condition is false, it
>> branches to brInst->getParent()->getSuccessors()[1].
>> This is an option that I hadn't taken into account when I calculated those
>> 3-4% of memory overhead, but it should end up saving memory whenever a
>> basic block has >= 2 phi nodes (the vectors themselves have an overhead, so
>> it depends). My experiment would have to be re-done to get a better idea of
>> the final balance.
>> The changes mentioned above are admittedly very invasive and complicated
>> changes. It would be good to get an idea of what sort of acceptance
>> criterion the LLVM community has here.
I made an initial attempt at scratching the surface of this. As an
experiment, I changed IRBuilder::CreateBr() to assert that a direct
branch can only be created at the end of a known basic block.
The entire patch is a set of fairly independent changes to various uses
of the IRBuilder. The most interesting changes are in SimplifyCFG.cpp,
but even there, some changes are independent. My overall impression is
that it is feasible to change how CFG edges are stored in LLVM IR.
The main problem is that CFG edge information around a BasicBlock is
stored redundantly:
- Successors are stored in the terminator instruction.
- Predecessors are stored indirectly: the BasicBlock is also a Value,
  and each predecessor's terminator holds a Use of it.
- Predecessors are also stored in each PHI node.
These redundant copies would be eliminated as follows:
1. Require a parent basic block when accessing or modifying the CFG
edges on a terminator or a PHI node.
2. Any constructor that takes a target block will require one of the
   following:
a. An InsertAtEnd argument (parent block).
b. Or, in the case of PHI, an InsertBefore argument. Inserting a
terminator before another terminator will not be supported.
3. Disallow the removeFromParent() method for terminators. It may still
be supported for a PHI, where the input Values can still be accessed
without any knowledge of the incoming edges.
4. Potentially add a moveToNewParent() method.
5. Corresponding changes to IRBuilder. For example, CreateBr() will
assert that the InsertPt is at the end of a valid basic block which
does not already have a terminator.
The BasicBlock will still be Use'd in a BlockAddress constant, but
that's a separate kind of use anyway.
This should not change bitcode in any way; it's only intended as a
change to in-memory IR. There may be some impact on the bitcode reader.
There's a good chance that I missed something important. But if all goes
well, most of this is just a long list of shallow changes, with some
changes being more involved than others.