[LLVMdev] Implementing devirtualization
nicholas at mxc.ca
Thu Dec 8 22:03:43 PST 2011
Vitor Luis Menezes wrote:
> Hello all,
> Our compilers class has been using LLVM, and a partner and I decided to
> implement devirtualization of virtual C++ calls in LLVM as a class
> project. We quickly realized that existing debug metadata generated by
> Clang didn't give us enough info to (precisely) implement this, and as
> such have already begun modifying Clang to insert such metadata.
> However, for devirtualization we also need to reconstruct the class
> hierarchy, which we also seek to do through metadata. There appears to
> be sufficient metadata to do this, but we can't seem to figure out how
> to actually access this metadata and successfully reconstruct the class
> hierarchy. Can anyone help with this?
> We're also open to alternative approaches, but we'd like to stay in LLVM
> IR as much as possible.
Implement field-sensitive interprocedural sparse conditional constant
propagation and it will devirtualize.
The reason I don't like having an explicit devirtualization pass is that
it requires a large amount of analysis (interprocedural, no less!) and
is strictly single-purpose. The way that LLVM does devirtualization now
is through a series of tons of tiny steps:
* the very last stage in devirt is simply converting the load from a
constant to the loaded constant:
- but sometimes the frontend wouldn't normally emit the vtable at
all (the ABI wouldn't require it to), so we wouldn't see the constant to
propagate. We added a new linkage type to LLVM, "available_externally"
designed to let us capture this extra data that's useful for the
optimizer but without generating any additional symbols in the object file.
* GVN folds loads against stores, using alias analysis.
* alias analysis is helped by two interprocedural function attributes:
- noalias return values. This indicates that the pointer returned
does not alias any other pointer existing in the program. We will find
functions that return these and propagate noalias up the call graph.
- nocapture arguments. If the pointer doesn't escape the callee,
then the function gets a 'nocapture' annotation on it. This is also
propagated through the call graph.
- standard library functions have their noalias and nocapture parts
- together, this means that code like:
FILE *f = fopen(...);
fwrite(data, size, 1, f);
has a pointer 'f' which can never possibly alias anything else in
the program, and that's trivially provable. We also assume that "char *p
= (char*)rand();" can never guess the address of 'f'.
* the pass manager runs bottom-up over the callgraph, running the
CallGraphSCCPasses over each strongly connected component, and all the
function passes over each function. If we inline, we expose more code to
GVN which may make it possible to fold a load against its store.
- once we do so, we've changed the apparent call graph, while
iterating over the call graph. We use this to refine the SCCs in our and
re-run the optimization passes again, causing more refinement
iteratively, allowing us to devirtualize more.
- we can only inline functions defined inside a class body because
of C++'s ODR rule. The actual linkage type is "linkonce" which would
indicate that they are weak and can be discarded if never called, so we
added "linkonce_odr" which indicates that the symbol is weak but that
any replacement must be equivalent.
What's great about using small steps like this is that each of these
pieces is a useful optimization in its own right, even if we don't end
up devirtualizing. A devirtualization pass doesn't "smell right" to me
because it will have to do lots of analysis regardless of whether it
uses it to optimize or throws it away at the end. In LLVM, the
optimization passes try hard to be inexpensive when they don't transform
the code at all. It's also language specific, while LLVM's existing
approach works even with other languages or hand-rolled type hierarchies
written with casts and no language feature support.
The piece that's missing is devirtualization when we don't end up
inlining. I don't know of a way to break that down into smaller pieces;
llvm already has SCCP.cpp which does sparse conditional constant
propagation and that includes an interprocedural variant, IPSCCP, but
it's not field-sensitive. Propagating the stores from the vptr field to
the relevant loads in the functions that aren't inlined would give us
the last major missing piece.
Noalias returns, nocapture, SCC refinement, linkonce_odr and
available_externally were added with the goal of making devirtualization
in LLVM happen, but as orthogonal independent optimizations. I think
LLVM should continue with this design. If you want to implement a single
substantial optimization pass, I would suggest FSIPSCCP as the largest
thing you should write.
More information about the llvm-dev