[llvm-commits] Speeding up instruction selection

Tue Apr 22 19:43:20 PDT 2008

On Apr 21, 2008, at 8:51 AM, Roman Levenstein wrote:
> Sorry, forgot to attach the patch ;-)
>
> 2008/4/21, Roman Levenstein <romix.llvm at googlemail.com>:
>> Hi,
>>
>> 2008/4/19, Roman Levenstein <romix.llvm at googlemail.com>:
>>
>>> Hi Dan, Hi Evan,
>>>
>>> 2008/4/19, Dan Gohman <gohman at apple.com>:
>>>
>>>>
>>>> On Apr 16, 2008, at 9:19 AM, Roman Levenstein wrote:
>>>>>
>>>>
>>>>> BTW, Dan, have you had any time to look at the patch introducing
>>>>> queues instead of heaps in the target specific instruction  
>>>>> selectors?
>>>>
>>>>
>>>> Is this the ScheduleDAGRRList patch that changes  
>>>> std::priority_queue
>>>> to std::set? I know you checked in part of those changes already;
>>>> could you send me an updated patch with just the outstanding  
>>>> changes?
>>>
>>>
>>> No. I don't mean the ScheduleDAGRRList.pacth. For this one, I  
>>> proposed
>>> the remaining patch just with the outstanding patches yesterday (see
>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080414/061236.html) 
>>> .
>>> And it is also about using the std::set :-)
>>>
>>> The patch for instruction selectors was proposed by me here:
>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080303/059215.html
>>> It is related to the Select() and SelectRoot implementations inside
>>> the target-specific code selectors generated by tablegen.
>>>
>>> Regarding my questions about std::set vs std;;multiset in the  
>>> mentioned patch:
>>> Having looked into it again, I have the impression that std::set  
>>> could
>>> be used and there is no need for std::multiset. Basically, the this
>>> code:
>>>  void AddToISelQueue(SDOperand N) DISABLE_INLINE {
>>>   int Id = N.Val->getNodeId();
>>>   if (Id != -1 && !isQueued(Id)) {
>>>   ...
>>> }
>>> ensures that only one node with a given NodeId can be inserted into
>>> the ISelQueue.
>>>
>>> One possible further improvement, I'm thinking about is the  
>>> following:
>>> Why do we use a sorted data structure (e.g. sorted vector, set) for
>>> the ISelQueue at all?
>>>
>>> 1) If we look into the SelectRoot logic, it basically takes the node
>>> with the SMALLEST NodeId from the pending selection queue.
>>> 2) More over, it seems to me (please, confirm if it is true!) that  
>>> the
>>> next node taken for selection in the loop inside the SelectRoot, at
>>> any point in time, has a NodeId that is GREATER than the NodeId of  
>>> the
>>> last selected node.
>>> 3) Even if the selection of the node puts new nodes into the  
>>> selection
>>> queue, they ALWAYS have NodeIds GREATER than that of this node.
>>> Therefore the property (2) is preserved.
>>>
>>> I checked (2) and (3) by adding some debug output to the SelectRoot.
>>> Only one test of all llvm/test test-cases does not fulfill this
>>> assumptions and even there it is not quite obvious if it is a
>>> contradiction to (2) and (3) or a bug.
>>>
>>> So, assuming that properties (2) and (3) do hold, the following  
>>> can be
>>> probably done:
>>> - We iterate over the NodeId numbers from 0 till the DAGSize.
>>> - We check if a Node with a given NodeId (by looking into the
>>> TopOrder???), if isQueued(NodeId) is true, i.e. this is in the queue
>>> - We then check if isSelected(NodeId) is true or false and call
>>> Select() for this node, if it is not selected yet.
>>> - If a new node needs to be inserted into the queue, only
>>> setQueued(NodeId) is called
>>>
>>> With these changes, no online sorting would be required at all,  
>>> which
>>> is a great improvement over the current approach.
>>>
>>> What do you think about it?  Is it a correct analysis?
>>
>>
>> I created a working implementation based on these I ideas. Now all
>> llvm/test test-cases pass it. Please find it in  the attached patch.
>> The patch is rather ugly and uses a lot of #ifdefs, but I think it is
>> OK for a proof of concept.
>>
>> Short explanation for the macros:
>> OLD - enables old-style selector queue, using heaps. Can be used to
>> have a step-by-step comparison with std::set based approach.
>> USE_QUEUE - enables std::set based approach
>> NO_QUEUE - enables the implementation that does not use any special
>> sorted queue at all, as outlined in the previous mail.
>> (Note: USE_QUEUE should not be used at the same time as NO_QUEUE)
>>
>> Both USE_QUEUE and NO_QUEUE modes speed-up the instruction selection
>> significantly (on my test compilation time goes down from 12 to 9
>> seconds).
>>
>> I'm very interested in your comments about this approach.

This looks like a nice approach. Re-using and updating TopOrder
instead of building up a std::set that ends up with the same contents
is neat. I was a little concerned about the priority-queue code being
removed since I don't know if there were future plans for it, however
it looks like doing anything with ordering more specific than
simple topological ordering would require a fair amount of work
anyway, so I'm not worrying about it.

Do you see any compile-time difference between using a std::set
queue and the NO_QUEUE approach, out of curiosity?

In addition to regular testing, since this patch doesn't modify any
heuristics, it should be possible to compare assembly output
between an unmodified compiler and one with this patch applied.
Can you verify that this patch doesn't change any output on some
interesting testcases?

Thanks,

Dan