[llvm-commits] Speeding up instruction selection
Evan Cheng
evan.cheng at apple.com
Tue Mar 11 17:43:48 PDT 2008
Thanks. Some comments:
+ typedef uses_iterator use_iterator;
This seems silly, why not just change all references to use_iterator
to uses_iterator?
- SDNode(unsigned Opc, SDVTList VTs) : NodeType(Opc), NodeId(-1) {
+ SDNode(unsigned Opc, SDVTList VTs) : NodeType(Opc), NodeId(-1),
UsesSize(0), Uses(NULL) {
OperandsNeedDelete = false; // Operands set with InitOperands.
NumOperands = 0;
OperandList = 0;
-
Please be careful about 80 column violation.
And please remove code that's commented out.
+ //NodesUsed = NULL;
// no cycles in the graph.
for (SDNode::op_iterator I = N->op_begin(), E = N->op_end(); I !
= E; ++I) {
SDNode *Operand = I->Val;
- Operand->removeUser(N);
+ Operand->removeUser(std::distance(N->op_begin(), I), N);
Can you iterate from end to begin to avoid calling std::distance() for
every user?
+ if (N->OperandsNeedDelete) {
+ delete[] N->OperandList;
+ }
Please remove tab.
+ // Remember that this node is about to morph
+ if(!Users.count(U)) {
+ Users.insert(U);
+ // This node is about to morph, remove its old self from the CSE maps.
+ RemoveNodeFromCSEMaps(U);
+ }
Tab again.
- SmallSetVector<SDNode*, 16> Users(From.Val->use_begin(), From.Val-
>use_end());
+ SmallSetVector<SDNode*, 16> Users;
+ for (SDNode::use_iterator UI = From.Val->use_begin(), E = From.Val-
>use_end(); UI != E; ++UI) {
+ SDNode *User = *UI;
+ Users.insert(User);
+ }
80-col violation.
+ From.Val->removeUser(Op-User->op_begin(),User);
+ *Op = To;
+ Op->parent = User;
+ To.Val->addUser(Op-User->op_begin(),User);
Tab.
+ // TODO: Only iterate over uses of a given value of the node
+ for (SDNode::use_iterator UI = use_begin(), E = use_end(); UI != E;
++UI) {
+ if(*UI == TheValue) {
+ if(NUses == 0)
+ return false;
Tab. And please add a space between 'if' and '('.
- return cast<ConstantSDNode>(OperandList[Num])->getValue();
+ return (cast<ConstantSDNode>(OperandList[Num]))->getValue();
Why the extra parentheses?
- if (N->hasOneUse() && (*N->use_begin())->getOpcode() ==
ISD::FP_ROUND)
+ if (N->hasOneUse() && ((SDNode*)(*N->use_begin()))->getOpcode() ==
ISD::FP_ROUND)
Is the casting to SDNode* needed? use_begin() returns a uses_iterator
and its operator* returns a SDNode? Seems like this can be cleaned up.
>
Evan
On Mar 11, 2008, at 11:00 AM, Roman Levenstein wrote:
> Hi Chris,
>
> 2008/3/8, Chris Lattner <clattner at apple.com>:
>> On Mar 7, 2008, at 7:26 AM, Roman Levenstein wrote:
>>>> Please look at the implementation of User::setOperand() or
>>>> MachineOperand::setReg(). When these changes the value of an
>>>> operand,
>>>> it needs to remove the current 'instruction' from the use list of
>>>> the
>>>> old value and add it to the new value. I assure you that it is
>>>> constant time, exactly because it doesn't have to scan the whole
>>>> list. It just unlinks it form the middle of the list with no
>>>> search.
>>>> This makes it very fast and scalable.
>>>
>>> OK. My fault. Now I really understood how it works! ;-)
>>
>>
>> Ok!
>>
>>
>>> I tried to apply the same principle to SDNodes.
>>>
>>> This means:
>>> 1) Each SDNode maintains an std::list of uses, i.e. list of pointers
>>> to SDNodes using it.
>>
>>
>> I don't recommend using std::list here. The subtlety is that the use
>> list will end up winding its way through the operands that are
>> embedded in an SDNode. This design allows ReplaceAllUsesWith (and
>> related operations) to be extremely efficient.
>
>>> Since one node can use multiple other
>>> nodes, each SDNode has once such link field per SDOperand. More
>>> precisely, I introduced the Used files, which is an array of
>>> iterators pointing into Uses lists of nodes used by current SDNode.
>>> This array is allocated dynamically and at the same places, where
>>> OperandList is allocated or set.
>>
>>
>> I think it might be best to introduce a new class, say SDUseOperand
>> or
>> something like that. Right now the operands of an SDNode are stored
>> in the OperandList array, which is an array of SDOperands. I'd
>> suggest changing this to be an array of SDUseOperand.
>>
>> SDOperand is currently a SDNode + result#. SDUseOperand would be an
>> SDOperand (like Val in the Use class) plus a pointer to the
>> containing
>> node (like U in the Use class), and next/prev pointers (like Next/
>> Prev) in the use list.
>>
>> Before you worry about the size bloat of tracking all this, remember
>> that the Uses SmallVector in SDNode will turn into a single pointer,
>> so this will be only a small increase in memory if any at all.
>>
>> Selection dags have much more constrained situations for when the
>> operands of a node are changed, so keeping the use lists up to date
>> should be relatively easy (compared to llvm ir or machineoperands).
>
> OK. I implemented lists of uses as you proposed. See the attached
> patch for details.
> I hope I understood you correctly and implemented what you meant.
>
>>> 3) The rest of the code is unaffected for now. But probably, with
>>> this
>>> approach, replaceAllUses.. can be done faster by rewriting it in a
>>> more intelligent way,e.g. bulk removals or moving the whole Uses
>>> list
>>> at once or something like that?
>>
>>
>> Yep, making that faster can be done as a second step.
>
> I haven't really looked at this yet.
>
>>> The most interesting thing, though, is that this implementation has
>>> roughly the same performance on kimwitu (+1%), but is definitely
>>> worse
>>> on the huge big4.bc. In this one, it is about 10% worse for some
>>> reason (and profiler shows suddenly that SDOperand::operator == is
>>> the
>>> hottest function, when called from SDNode::hasNUsesOfValue. I have
>>> no
>>> idea why... I repeated tests multiple times, but it always produces
>>> the same result. With my set approach, the same function does not
>>> consume any time...). I guess it is somehow related to the memory
>>> access locality or something like that. Any good ideas explaining my
>>> findings?
>>
>>
>> Sounds like something strange is happening.
>
> The SDNode::hasNUsesOfValue bottleneck mentioned above has disappeared
> with this patch.
>
>> The approach looks like the right step, but you really need to
>> eliminate the separate std::list for it to be comparable.
>
> Done! Please review and give it some testing, if possible.
> On my machine, it shows 10%-15% speedup on big BBs.
>
> -Roman
>
> P.S. I still haven't got any reviews for my first patch related to the
> instruction selection:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080303/059215.html
> Could someone have a look at it?
> <UsesList.patch>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list