[LLVMdev] selection dag speedups / llc speedups

Mon May 17 13:53:31 PDT 2010

On May 14, 2010, at 11:24 AM, Jan Voung wrote:

> Hello folks,
> 
> I'm sure this has been asked many times, but is there current work on decreasing the time taken by the DAG-based instruction selector, or the other phases of llc? I am just beginning to dive into LLVM, and I am interested in compile-time reductions that do not reduce code quality dramatically. For example, simply switching on "-fast-isel" (roughly 17% drop in code quality for some of my benchmarks) or "-regalloc=local/fast" (roughly 2x slower for some of my benchmarks) is, unfortunately, not an acceptable option.
> 
> I noticed that a semi-recent commit added the "STATISTIC(NumDAGIselRetries,"Number of times dag isel has to try another path");" as well as a sorting change in DAGISelEmitter.cpp. This suggests to me that there is work being done. Are there other similar efforts, but perhaps aiming for bigger gains, that I should be aware of? From the other end, are there efforts to make the fast-path produce better quality code?\

The specific work you mention was aimed at reducing isel's own code
size. Previously, isel used a huge mountain of generated code which took
a long time to compile, and consumed a large amount of memory.  The
new isel fixes these problems, but underneath it's using the same
conceptual algorithm, and I believe is about the same speed.

SelectionDAG performance is important. It hasn't seen a lot of
attention recently, as the "-O0" use case has been where most of
the compile-time attention has gone. But speeding up SelectionDAG
isel would also be useful for many different purposes.

> 
> Some small things I've noticed:
> 
>  - LegalizeVectors might not make a single change over an entire input bc file, yet it can take about 6% of the ISel time. Identifying that this can be skipped ahead of time may help?

Sounds reasonable. Perhaps there could be a SelectionDAG-wide flag which
indicates whether any vectors are present.

Another thing that could be done is avoiding running DAGCombine when
it isn't needed. I believe there are a few cases where the code just
reruns DAGCombine even if the preceding Legalize pass doesn't make
any changes.

Longer term, I've also heard some ideas tossed around of merging
LegalizeDAG.cpp and now also LegalizeVectorsOps.cpp with the new
LegalizeTypes infrastructure so that everything would run as a single
pass, which would greatly reduce the amount of "Walk and hash all the
nodes" work that these passes currently do.

Another thing to look at is the underlying FoldingSet data structure.
Right now, node lookup requires a lot of recomputation of hash data.

>  - We can take the Changed flag returned by LegalizeVectors, and pass it on to Legalize so that it can decide not to AssignTopologicalOrder again, since it was already done in the LegalizeVectors pass. However, this is a negligible reduction: 0.2 seconds shaved off a 30+ second compile =).

Ok.

>  - BuildSchedGraph() can take around 30% of the X% of time taken by Instruction Scheduling. How different is the scheduler graph from the selection graph?

The scheduling graph is constructed using the selection graph information,
but it does different things. Constants and other nodes which don't correspond
to instructions are excluded, nodes which must be scheduled together are
merged, extra edges can be added, the schedule graph doesn't do automatic
CSE on edge changes, etc.  The schedule graph also has data to track
node height and depth, and other information used by the scheduler.

Dan