[llvm-commits] Speeding up instruction selection

Mon Apr 21 08:48:54 PDT 2008

Hi,

2008/4/19, Roman Levenstein <romix.llvm at googlemail.com>:
> Hi Dan, Hi Evan,
>
>  2008/4/19, Dan Gohman <gohman at apple.com>:
>
> >
>  >  On Apr 16, 2008, at 9:19 AM, Roman Levenstein wrote:
>  >  >
>  >
>  > > BTW, Dan, have you had any time to look at the patch introducing
>  >  > queues instead of heaps in the target specific instruction selectors?
>  >
>  >
>  > Is this the ScheduleDAGRRList patch that changes std::priority_queue
>  >  to std::set? I know you checked in part of those changes already;
>  >  could you send me an updated patch with just the outstanding changes?
>
>
> No. I don't mean the ScheduleDAGRRList.pacth. For this one, I proposed
>  the remaining patch just with the outstanding patches yesterday (see
>  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080414/061236.html).
>  And it is also about using the std::set :-)
>
>  The patch for instruction selectors was proposed by me here:
>  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080303/059215.html
>  It is related to the Select() and SelectRoot implementations inside
>  the target-specific code selectors generated by tablegen.
>
>  Regarding my questions about std::set vs std;;multiset in the mentioned patch:
>  Having looked into it again, I have the impression that std::set could
>  be used and there is no need for std::multiset. Basically, the this
>  code:
>   void AddToISelQueue(SDOperand N) DISABLE_INLINE {
>    int Id = N.Val->getNodeId();
>    if (Id != -1 && !isQueued(Id)) {
>    ...
>  }
>  ensures that only one node with a given NodeId can be inserted into
>  the ISelQueue.
>
>  One possible further improvement, I'm thinking about is the following:
>  Why do we use a sorted data structure (e.g. sorted vector, set) for
>  the ISelQueue at all?
>
>  1) If we look into the SelectRoot logic, it basically takes the node
>  with the SMALLEST NodeId from the pending selection queue.
>  2) More over, it seems to me (please, confirm if it is true!) that the
>  next node taken for selection in the loop inside the SelectRoot, at
>  any point in time, has a NodeId that is GREATER than the NodeId of the
>  last selected node.
>  3) Even if the selection of the node puts new nodes into the selection
>  queue, they ALWAYS have NodeIds GREATER than that of this node.
>  Therefore the property (2) is preserved.
>
>  I checked (2) and (3) by adding some debug output to the SelectRoot.
>  Only one test of all llvm/test test-cases does not fulfill this
>  assumptions and even there it is not quite obvious if it is a
>  contradiction to (2) and (3) or a bug.
>
>  So, assuming that properties (2) and (3) do hold, the following can be
>  probably done:
>  - We iterate over the NodeId numbers from 0 till the DAGSize.
>  - We check if a Node with a given NodeId (by looking into the
>  TopOrder???), if isQueued(NodeId) is true, i.e. this is in the queue
>  - We then check if isSelected(NodeId) is true or false and call
>  Select() for this node, if it is not selected yet.
>  - If a new node needs to be inserted into the queue, only
>  setQueued(NodeId) is called
>
>  With these changes, no online sorting would be required at all, which
>  is a great improvement over the current approach.
>
>  What do you think about it?  Is it a correct analysis?

I created a working implementation based on these I ideas. Now all
llvm/test test-cases pass it. Please find it in  the attached patch.
The patch is rather ugly and uses a lot of #ifdefs, but I think it is
OK for a proof of concept.

Short explanation for the macros:
OLD - enables old-style selector queue, using heaps. Can be used to
have a step-by-step comparison with std::set based approach.
USE_QUEUE - enables std::set based approach
NO_QUEUE - enables the implementation that does not use any special
sorted queue at all, as outlined in the previous mail.
(Note: USE_QUEUE should not be used at the same time as NO_QUEUE)

Both USE_QUEUE and NO_QUEUE modes speed-up the instruction selection
significantly (on my test compilation time goes down from 12 to 9
seconds).

I'm very interested in your comments about this approach.

-Roman