[llvm-commits] Speeding up instruction selection

Tue Mar 11 17:43:48 PDT 2008

Thanks. Some comments:

+  typedef uses_iterator use_iterator;
This seems silly, why not just change all references to use_iterator  
to uses_iterator?

-  SDNode(unsigned Opc, SDVTList VTs) : NodeType(Opc), NodeId(-1) {
+  SDNode(unsigned Opc, SDVTList VTs) : NodeType(Opc), NodeId(-1),  
UsesSize(0), Uses(NULL) {
      OperandsNeedDelete = false;  // Operands set with InitOperands.
      NumOperands = 0;
      OperandList = 0;
-
Please be careful about 80 column violation.

And please remove code that's commented out.
+    //NodesUsed = NULL;

      // no cycles in the graph.
      for (SDNode::op_iterator I = N->op_begin(), E = N->op_end(); I ! 
= E; ++I) {
        SDNode *Operand = I->Val;
-      Operand->removeUser(N);
+      Operand->removeUser(std::distance(N->op_begin(), I), N);

Can you iterate from end to begin to avoid calling std::distance() for  
every user?

+    if (N->OperandsNeedDelete) {
+	delete[] N->OperandList;
+    }

Please remove tab.

+    // Remember that this node is about to morph
+    if(!Users.count(U)) {
+	Users.insert(U);
+	// This node is about to morph, remove its old self from the CSE maps.
+	RemoveNodeFromCSEMaps(U);
+    }

Tab again.

-  SmallSetVector<SDNode*, 16> Users(From.Val->use_begin(), From.Val- 
 >use_end());
+  SmallSetVector<SDNode*, 16> Users;
+  for (SDNode::use_iterator UI = From.Val->use_begin(), E = From.Val- 
 >use_end(); UI != E; ++UI) {
+      SDNode *User = *UI;
+      Users.insert(User);
+  }

80-col violation.

+        From.Val->removeUser(Op-User->op_begin(),User);
+	*Op = To;
+        Op->parent = User;
+        To.Val->addUser(Op-User->op_begin(),User);

Tab.

+  // TODO: Only iterate over uses of a given value of the node
+  for (SDNode::use_iterator UI = use_begin(), E = use_end(); UI != E;  
++UI) {
+      if(*UI == TheValue) {
+	  if(NUses == 0)
+	      return false;

Tab. And please add a space between 'if' and '('.

-  return cast<ConstantSDNode>(OperandList[Num])->getValue();
+  return (cast<ConstantSDNode>(OperandList[Num]))->getValue();

Why the extra parentheses?

-  if (N->hasOneUse() && (*N->use_begin())->getOpcode() ==  
ISD::FP_ROUND)
+  if (N->hasOneUse() && ((SDNode*)(*N->use_begin()))->getOpcode() ==  
ISD::FP_ROUND)

Is the casting to SDNode* needed? use_begin() returns a uses_iterator  
and its operator* returns a SDNode? Seems like this can be cleaned up.
>

Evan

On Mar 11, 2008, at 11:00 AM, Roman Levenstein wrote:

> Hi Chris,
>
> 2008/3/8, Chris Lattner <clattner at apple.com>:
>> On Mar 7, 2008, at 7:26 AM, Roman Levenstein wrote:
>>>> Please look at the implementation of User::setOperand() or
>>>> MachineOperand::setReg().  When these changes the value of an
>>>> operand,
>>>> it needs to remove the current 'instruction' from the use list of  
>>>> the
>>>> old value and add it to the new value.  I assure you that it is
>>>> constant time, exactly because it doesn't have to scan the whole
>>>> list.  It just unlinks it form the middle of the list with no  
>>>> search.
>>>> This makes it very fast and scalable.
>>>
>>> OK. My fault. Now I really understood how it works! ;-)
>>
>>
>> Ok!
>>
>>
>>> I tried to apply the same principle to SDNodes.
>>>
>>> This means:
>>> 1) Each SDNode maintains an std::list of uses, i.e. list of pointers
>>> to SDNodes using it.
>>
>>
>> I don't recommend using std::list here.  The subtlety is that the use
>> list will end up winding its way through the operands that are
>> embedded in an SDNode.  This design allows ReplaceAllUsesWith (and
>> related operations) to be extremely efficient.
>
>>> Since one node can use multiple other
>>> nodes, each SDNode has once such link field per SDOperand. More
>>> precisely, I introduced the Used files, which is an array  of
>>> iterators pointing into Uses lists of nodes used by current SDNode.
>>> This array is allocated dynamically and at the same places, where
>>> OperandList is allocated or set.
>>
>>
>> I think it might be best to introduce a new class, say SDUseOperand  
>> or
>> something like that.  Right now the operands of an SDNode are stored
>> in the OperandList array, which is an array of SDOperands.  I'd
>> suggest changing this to be an array of SDUseOperand.
>>
>> SDOperand is currently a SDNode + result#.  SDUseOperand would be an
>> SDOperand (like Val in the Use class) plus a pointer to the  
>> containing
>> node (like U in the Use class), and next/prev pointers (like Next/
>> Prev) in the use list.
>>
>> Before you worry about the size bloat of tracking all this, remember
>> that the Uses SmallVector in SDNode will turn into a single pointer,
>> so this will be only a small increase in memory if any at all.
>>
>> Selection dags have much more constrained situations for when the
>> operands of a node are changed, so keeping the use lists up to date
>> should be relatively easy (compared to llvm ir or machineoperands).
>
> OK. I implemented lists of uses  as you proposed. See the attached
> patch for details.
> I hope I understood you correctly and implemented what you meant.
>
>>> 3) The rest of the code is unaffected for now. But probably, with  
>>> this
>>> approach, replaceAllUses.. can be done faster by rewriting it in a
>>> more intelligent way,e.g. bulk removals or moving the whole Uses  
>>> list
>>> at once or something like that?
>>
>>
>> Yep, making that faster can be done as a second step.
>
> I haven't really looked at this yet.
>
>>> The most interesting thing, though, is that this implementation has
>>> roughly the same performance on kimwitu (+1%), but is definitely  
>>> worse
>>> on the huge  big4.bc. In this one, it is about 10% worse for some
>>> reason (and profiler shows suddenly that SDOperand::operator == is  
>>> the
>>> hottest function, when called from SDNode::hasNUsesOfValue. I have  
>>> no
>>> idea why... I repeated tests multiple times, but it always produces
>>> the same result. With my set approach, the same function does not
>>> consume any time...). I guess it is somehow related to the memory
>>> access locality or something like that. Any good ideas explaining my
>>> findings?
>>
>>
>> Sounds like something strange is happening.
>
> The SDNode::hasNUsesOfValue bottleneck mentioned above has disappeared
> with this patch.
>
>> The approach looks like the right step, but you really need to
>> eliminate the separate std::list for it to be comparable.
>
> Done! Please review and give it some testing, if possible.
> On my machine, it shows 10%-15% speedup on big BBs.
>
> -Roman
>
> P.S. I still haven't got any reviews for my first patch related to the
> instruction selection:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080303/059215.html
> Could someone have a look at it?
> <UsesList.patch>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits