[llvm-commits] Speeding up instruction selection

Evan Cheng evan.cheng at apple.com
Wed Mar 19 09:44:08 PDT 2008


On Mar 19, 2008, at 4:12 AM, Roman Levenstein wrote:

> Hi Evan,
>
> 2008/3/12, Evan Cheng <evan.cheng at apple.com>:
>> Thanks. Some comments:
>
> Thanks for your comments. All tabs violations are fixed. All "if("
> without spaces are fixed.
>
>> +  typedef uses_iterator use_iterator;
>> This seems silly, why not just change all references to use_iterator
>> to uses_iterator?
>
> Done.
>
>> +    //NodesUsed = NULL;
>>
>>      // no cycles in the graph.
>>      for (SDNode::op_iterator I = N->op_begin(), E = N->op_end(); I !
>> = E; ++I) {
>>        SDNode *Operand = I->Val;
>> -      Operand->removeUser(N);
>> +      Operand->removeUser(std::distance(N->op_begin(), I), N);
>>
>> Can you iterate from end to begin to avoid calling std::distance()  
>> for
>> every user?
>
> Well, I'm afraid it is the only reliable way to compute the index,
> because it does not depend on the knowledge about how operands of
> SDNode are stored.  The old code was assuming the SDOperand[], which
> is not exactly the case now and will probably change in the future.
> BTW, I thought that std::distance() is actually very cheap and is
> probably expanded into a subtraction. Am I wrong?

Ok, it's not a performance issue in this case since it's an array.
>
>
>> -  return cast<ConstantSDNode>(OperandList[Num])->getValue();
>> +  return (cast<ConstantSDNode>(OperandList[Num]))->getValue();
>>
>> Why the extra parentheses?
>
> Fixed.
>
>> -  if (N->hasOneUse() && (*N->use_begin())->getOpcode() ==
>> ISD::FP_ROUND)
>> +  if (N->hasOneUse() && ((SDNode*)(*N->use_begin()))->getOpcode() ==
>> ISD::FP_ROUND)
>>
>> Is the casting to SDNode* needed? use_begin() returns a uses_iterator
>> and its operator* returns a SDNode? Seems like this can be cleaned  
>> up.
>>>
>
> Fixed.
>
> I attach the new patch. Running the dejagnu tests in llvm/test does
> not introduce any new failures.
> llvm-test test-suite is huge and takes hours to run on my machine. Due
> to this fact,  I'm still not sure if it doesn't break anything. I'd
> really appreciate if someone
> with a faster machine could run llvm-test through it and report any
> problems, if there are any.

Ok, I'll do some testing on my end. Thanks.

Evan

>
>
> -Roman
>
>> On Mar 11, 2008, at 11:00 AM, Roman Levenstein wrote:
>>
>>> Hi Chris,
>>>
>>> 2008/3/8, Chris Lattner <clattner at apple.com>:
>>>> On Mar 7, 2008, at 7:26 AM, Roman Levenstein wrote:
>>>>>> Please look at the implementation of User::setOperand() or
>>>>>> MachineOperand::setReg().  When these changes the value of an
>>>>>> operand,
>>>>>> it needs to remove the current 'instruction' from the use list of
>>>>>> the
>>>>>> old value and add it to the new value.  I assure you that it is
>>>>>> constant time, exactly because it doesn't have to scan the whole
>>>>>> list.  It just unlinks it form the middle of the list with no
>>>>>> search.
>>>>>> This makes it very fast and scalable.
>>>>>
>>>>> OK. My fault. Now I really understood how it works! ;-)
>>>>
>>>>
>>>> Ok!
>>>>
>>>>
>>>>> I tried to apply the same principle to SDNodes.
>>>>>
>>>>> This means:
>>>>> 1) Each SDNode maintains an std::list of uses, i.e. list of  
>>>>> pointers
>>>>> to SDNodes using it.
>>>>
>>>>
>>>> I don't recommend using std::list here.  The subtlety is that the  
>>>> use
>>>> list will end up winding its way through the operands that are
>>>> embedded in an SDNode.  This design allows ReplaceAllUsesWith (and
>>>> related operations) to be extremely efficient.
>>>
>>>>> Since one node can use multiple other
>>>>> nodes, each SDNode has once such link field per SDOperand. More
>>>>> precisely, I introduced the Used files, which is an array  of
>>>>> iterators pointing into Uses lists of nodes used by current  
>>>>> SDNode.
>>>>> This array is allocated dynamically and at the same places, where
>>>>> OperandList is allocated or set.
>>>>
>>>>
>>>> I think it might be best to introduce a new class, say SDUseOperand
>>>> or
>>>> something like that.  Right now the operands of an SDNode are  
>>>> stored
>>>> in the OperandList array, which is an array of SDOperands.  I'd
>>>> suggest changing this to be an array of SDUseOperand.
>>>>
>>>> SDOperand is currently a SDNode + result#.  SDUseOperand would be  
>>>> an
>>>> SDOperand (like Val in the Use class) plus a pointer to the
>>>> containing
>>>> node (like U in the Use class), and next/prev pointers (like Next/
>>>> Prev) in the use list.
>>>>
>>>> Before you worry about the size bloat of tracking all this,  
>>>> remember
>>>> that the Uses SmallVector in SDNode will turn into a single  
>>>> pointer,
>>>> so this will be only a small increase in memory if any at all.
>>>>
>>>> Selection dags have much more constrained situations for when the
>>>> operands of a node are changed, so keeping the use lists up to date
>>>> should be relatively easy (compared to llvm ir or machineoperands).
>>>
>>> OK. I implemented lists of uses  as you proposed. See the attached
>>> patch for details.
>>> I hope I understood you correctly and implemented what you meant.
>>>
>>>>> 3) The rest of the code is unaffected for now. But probably, with
>>>>> this
>>>>> approach, replaceAllUses.. can be done faster by rewriting it in a
>>>>> more intelligent way,e.g. bulk removals or moving the whole Uses
>>>>> list
>>>>> at once or something like that?
>>>>
>>>>
>>>> Yep, making that faster can be done as a second step.
>>>
>>> I haven't really looked at this yet.
>>>
>>>>> The most interesting thing, though, is that this implementation  
>>>>> has
>>>>> roughly the same performance on kimwitu (+1%), but is definitely
>>>>> worse
>>>>> on the huge  big4.bc. In this one, it is about 10% worse for some
>>>>> reason (and profiler shows suddenly that SDOperand::operator == is
>>>>> the
>>>>> hottest function, when called from SDNode::hasNUsesOfValue. I have
>>>>> no
>>>>> idea why... I repeated tests multiple times, but it always  
>>>>> produces
>>>>> the same result. With my set approach, the same function does not
>>>>> consume any time...). I guess it is somehow related to the memory
>>>>> access locality or something like that. Any good ideas  
>>>>> explaining my
>>>>> findings?
>>>>
>>>>
>>>> Sounds like something strange is happening.
>>>
>>> The SDNode::hasNUsesOfValue bottleneck mentioned above has  
>>> disappeared
>>> with this patch.
>>>
>>>> The approach looks like the right step, but you really need to
>>>> eliminate the separate std::list for it to be comparable.
>>>
>>> Done! Please review and give it some testing, if possible.
>>> On my machine, it shows 10%-15% speedup on big BBs.
>>>
>>> -Roman
>>>
>>> P.S. I still haven't got any reviews for my first patch related to  
>>> the
>>> instruction selection:
>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080303/059215.html
>>> Could someone have a look at it?
>>
>>> <UsesList.patch>_______________________________________________
>>
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
> <UsesList1.patch>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




More information about the llvm-commits mailing list