[llvm-commits] Speeding up instruction selection

Wed Mar 5 08:48:51 PST 2008

Hi Evan,

2008/3/4, Evan Cheng <evan.cheng at apple.com>:
>  >> There's make_heap/push_heap/etc. in <algorithm> that let a
>  >> plain std::vector (or a SmallVector I guess) be used as a heap.
>  >
>  > Yes, this is possible but produces much more overhead than std::set on
>  > my tests. BTW, this approach is used in DAGISel.inc files generated by
>  > tablegen. I tried to change it to std::set as well and, again, it
>  > works much (25%-30%) faster on BBs with a few hundred or a few
>  > thousand instructions.
>
> If you give me a patch, I'll test it on my end. Thanks.

Here is a patch for the DAGISel.inc. It is generated as a diff against
the X86GenDAGISel.inc generated by tablegen. It is a bit ugly, but
gives you the idea and enables testing.

As a test, I used big4.bc, which is one huge MBB. You can find it here:
http://llvm.org/bugs/attachment.cgi?id=1275&action=edit

I would be very interested if you could review it, test it, and
provide some feedback.

There are two things I do not quite understand about the instruction
selector:
1) Can there be more than one SDNode with the same NodeId in the ISelQueue?
    I have the impression that it is possible, but I'm not sure.
2) Can _the same_ SDNode occur more than once in the ISelQueue?

These two questions matter if std::set is to be used. The set uses
the NodeId of a given SDNode as its key, and std::set ensures the
uniqueness of the elements in the ISelQueue. If (1) is true, then
std::multiset should probably be used instead of std::set. I tried
both set implementations and performance was roughly the same
between them.
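
To make the std::set vs. std::multiset concern concrete, here is a
minimal sketch. Node and ByNodeId are hypothetical stand-ins (not the
real SDNode or the comparator used in the generated code); the only
assumption is that the container is ordered by NodeId alone.

```cpp
#include <cstddef>
#include <set>

// Hypothetical stand-in for an SDNode; only the id matters here.
struct Node {
  int NodeId;
};

// Orders nodes by NodeId only, so two distinct nodes that share an id
// compare as equivalent.
struct ByNodeId {
  bool operator()(const Node *A, const Node *B) const {
    return A->NodeId < B->NodeId;
  }
};

typedef std::set<const Node *, ByNodeId> NodeSet;
typedef std::multiset<const Node *, ByNodeId> NodeMultiSet;

// Inserts every node in [Begin, End) into a fresh Queue and returns how
// many entries the container actually holds afterwards.
template <typename Queue>
std::size_t queuedCount(const Node *const *Begin, const Node *const *End) {
  Queue Q;
  for (const Node *const *I = Begin; I != End; ++I)
    Q.insert(*I);
  return Q.size();
}
```

If two distinct nodes with NodeId 1, one node with NodeId 2, and a
repeat of the first node are pushed, NodeSet ends up with 2 entries
(the second node with id 1 is silently dropped), while NodeMultiSet
keeps all 4, including the repeated node. So std::multiset would cover
case (1), but case (2) would still need a separate "already queued"
check.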

I also have one more question regarding the ISelQueue:

What exactly does it represent and how is it built? My understanding
is that we start with the root element and then all of its
dependencies are pushed into the queue as instruction selection
proceeds, then their dependencies, and so on. But is it somehow
related or similar to the scheduler's dependencies? Would it be
possible to do some sort of topological sort on the DAG first and
then do the selection? For the big4.bc test case mentioned above, the
ISelQueue sometimes holds up to 2000 SDNodes, which makes make_heap()
very inefficient. Is it normal that the queue becomes so long? Could
it be that some dependencies have already been selected and could be
safely removed?
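
For reference, this is roughly the comparison I have in mind, as a
simplified sketch. HeapQueue, SetQueue and drain are hypothetical
names, plain int ids stand in for SDNodes, and popping the largest id
first is an arbitrary choice for illustration, not a claim about what
the generated code does.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Queue kept as a binary heap inside a vector. push_heap/pop_heap cost
// O(log n) per call, but make_heap rebuilds the whole heap in O(n).
struct HeapQueue {
  std::vector<int> Ids;
  void push(int Id) {
    Ids.push_back(Id);
    std::push_heap(Ids.begin(), Ids.end());
  }
  int popMax() {
    std::pop_heap(Ids.begin(), Ids.end()); // moves the largest id to the back
    int Id = Ids.back();
    Ids.pop_back();
    return Id;
  }
  // Removing an arbitrary element breaks the heap property, so the heap
  // has to be rebuilt from scratch afterwards.
  void erase(int Id) {
    Ids.erase(std::remove(Ids.begin(), Ids.end(), Id), Ids.end());
    std::make_heap(Ids.begin(), Ids.end());
  }
  bool empty() const { return Ids.empty(); }
};

// Same interface on top of std::set: every operation is O(log n),
// including removal of an arbitrary element.
struct SetQueue {
  std::set<int> Ids;
  void push(int Id) { Ids.insert(Id); }
  int popMax() {
    std::set<int>::iterator Last = --Ids.end();
    int Id = *Last;
    Ids.erase(Last);
    return Id;
  }
  void erase(int Id) { Ids.erase(Id); }
  bool empty() const { return Ids.empty(); }
};

// Pops everything and returns the ids in the order they came out.
template <typename Queue> std::vector<int> drain(Queue Q) {
  std::vector<int> Order;
  while (!Q.empty())
    Order.push_back(Q.popMax());
  return Order;
}
```

As long as the ids are distinct, both queues drain in the same order.
The difference is the cost of erase(): with around 2000 queued nodes,
every rebuild of the heap touches all of them, while the set only
walks one path in its tree.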

I cannot fully explain it yet, but it seems to me that a more
efficient data structure than a priority queue could be used during
instruction selection.
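
One candidate would be a precomputed topological order stored in a
plain vector. The sketch below uses Kahn's algorithm on a hypothetical
adjacency-list DAG (Operands[i] lists the nodes that node i depends
on); it is not based on the real SelectionDAG data structures, and
whether the selector could simply walk such an order is exactly what I
am unsure about.

```cpp
#include <cstddef>
#include <vector>

// Returns the node indices in an order where every node appears after
// all of its operands. If the graph has a cycle, the result contains
// fewer than Operands.size() entries.
std::vector<int> topoOrder(const std::vector<std::vector<int> > &Operands) {
  const std::size_t N = Operands.size();
  std::vector<std::vector<int> > Users(N); // reverse edges
  std::vector<std::size_t> Pending(N);     // operands not yet emitted
  for (std::size_t I = 0; I < N; ++I) {
    Pending[I] = Operands[I].size();
    for (std::size_t J = 0; J < Operands[I].size(); ++J)
      Users[Operands[I][J]].push_back(static_cast<int>(I));
  }

  std::vector<int> Order;
  Order.reserve(N);
  // Nodes without operands can be emitted immediately.
  for (std::size_t I = 0; I < N; ++I)
    if (Pending[I] == 0)
      Order.push_back(static_cast<int>(I));

  // Order doubles as the worklist: emitting a node may make its users ready.
  for (std::size_t Next = 0; Next < Order.size(); ++Next) {
    const std::vector<int> &Ready = Users[Order[Next]];
    for (std::size_t J = 0; J < Ready.size(); ++J)
      if (--Pending[Ready[J]] == 0)
        Order.push_back(Ready[J]);
  }
  return Order;
}
```

The order is computed once in time linear in the number of nodes and
edges, and it can be walked front to back or back to front depending
on which direction selection needs, with no heap maintenance at all.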

-Roman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: X86GenDAGISel.inc.patch
Type: text/x-diff
Size: 5615 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20080305/07ddc900/attachment.patch>