[llvm-commits] Speeding up instruction selection

Wed Mar 19 04:12:56 PDT 2008

Hi Evan,

2008/3/12, Evan Cheng <evan.cheng at apple.com>:
> Thanks. Some comments:

Thanks for your comments. All tabs violations are fixed. All "if("
without spaces are fixed.

>  +  typedef uses_iterator use_iterator;
>  This seems silly, why not just change all references to use_iterator
>  to uses_iterator?

Done.

>  +    //NodesUsed = NULL;
>
>       // no cycles in the graph.
>       for (SDNode::op_iterator I = N->op_begin(), E = N->op_end(); I !
>  = E; ++I) {
>         SDNode *Operand = I->Val;
>  -      Operand->removeUser(N);
>  +      Operand->removeUser(std::distance(N->op_begin(), I), N);
>
>  Can you iterate from end to begin to avoid calling std::distance() for
>  every user?

Well, I'm afraid it is the only reliable way to compute the index,
because it does not depend on the knowledge about how operands of
SDNode are stored.  The old code was assuming the SDOperand[], which
is not exactly the case now and will probably change in the future.
BTW, I thought that std::distance() is actually very cheap and is
probably expanded into a subtraction. Am I wrong?

>  -  return cast<ConstantSDNode>(OperandList[Num])->getValue();
>  +  return (cast<ConstantSDNode>(OperandList[Num]))->getValue();
>
>  Why the extra parentheses?

Fixed.

>  -  if (N->hasOneUse() && (*N->use_begin())->getOpcode() ==
>  ISD::FP_ROUND)
>  +  if (N->hasOneUse() && ((SDNode*)(*N->use_begin()))->getOpcode() ==
>  ISD::FP_ROUND)
>
>  Is the casting to SDNode* needed? use_begin() returns a uses_iterator
>  and its operator* returns a SDNode? Seems like this can be cleaned up.
>  >

Fixed.

I attach the new patch. Running the dejagnu tests in llvm/test does
not introduce any new failures.
llvm-test test-suite is huge and takes hours to run on my machine. Due
to this fact,  I'm still not sure if it doesn't break anything. I'd
really appreciate if someone
with a faster machine could run llvm-test through it and report any
problems, if there are any.

-Roman

>  On Mar 11, 2008, at 11:00 AM, Roman Levenstein wrote:
>
>  > Hi Chris,
>  >
>  > 2008/3/8, Chris Lattner <clattner at apple.com>:
>  >> On Mar 7, 2008, at 7:26 AM, Roman Levenstein wrote:
>  >>>> Please look at the implementation of User::setOperand() or
>  >>>> MachineOperand::setReg().  When these changes the value of an
>  >>>> operand,
>  >>>> it needs to remove the current 'instruction' from the use list of
>  >>>> the
>  >>>> old value and add it to the new value.  I assure you that it is
>  >>>> constant time, exactly because it doesn't have to scan the whole
>  >>>> list.  It just unlinks it form the middle of the list with no
>  >>>> search.
>  >>>> This makes it very fast and scalable.
>  >>>
>  >>> OK. My fault. Now I really understood how it works! ;-)
>  >>
>  >>
>  >> Ok!
>  >>
>  >>
>  >>> I tried to apply the same principle to SDNodes.
>  >>>
>  >>> This means:
>  >>> 1) Each SDNode maintains an std::list of uses, i.e. list of pointers
>  >>> to SDNodes using it.
>  >>
>  >>
>  >> I don't recommend using std::list here.  The subtlety is that the use
>  >> list will end up winding its way through the operands that are
>  >> embedded in an SDNode.  This design allows ReplaceAllUsesWith (and
>  >> related operations) to be extremely efficient.
>  >
>  >>> Since one node can use multiple other
>  >>> nodes, each SDNode has once such link field per SDOperand. More
>  >>> precisely, I introduced the Used files, which is an array  of
>  >>> iterators pointing into Uses lists of nodes used by current SDNode.
>  >>> This array is allocated dynamically and at the same places, where
>  >>> OperandList is allocated or set.
>  >>
>  >>
>  >> I think it might be best to introduce a new class, say SDUseOperand
>  >> or
>  >> something like that.  Right now the operands of an SDNode are stored
>  >> in the OperandList array, which is an array of SDOperands.  I'd
>  >> suggest changing this to be an array of SDUseOperand.
>  >>
>  >> SDOperand is currently a SDNode + result#.  SDUseOperand would be an
>  >> SDOperand (like Val in the Use class) plus a pointer to the
>  >> containing
>  >> node (like U in the Use class), and next/prev pointers (like Next/
>  >> Prev) in the use list.
>  >>
>  >> Before you worry about the size bloat of tracking all this, remember
>  >> that the Uses SmallVector in SDNode will turn into a single pointer,
>  >> so this will be only a small increase in memory if any at all.
>  >>
>  >> Selection dags have much more constrained situations for when the
>  >> operands of a node are changed, so keeping the use lists up to date
>  >> should be relatively easy (compared to llvm ir or machineoperands).
>  >
>  > OK. I implemented lists of uses  as you proposed. See the attached
>  > patch for details.
>  > I hope I understood you correctly and implemented what you meant.
>  >
>  >>> 3) The rest of the code is unaffected for now. But probably, with
>  >>> this
>  >>> approach, replaceAllUses.. can be done faster by rewriting it in a
>  >>> more intelligent way,e.g. bulk removals or moving the whole Uses
>  >>> list
>  >>> at once or something like that?
>  >>
>  >>
>  >> Yep, making that faster can be done as a second step.
>  >
>  > I haven't really looked at this yet.
>  >
>  >>> The most interesting thing, though, is that this implementation has
>  >>> roughly the same performance on kimwitu (+1%), but is definitely
>  >>> worse
>  >>> on the huge  big4.bc. In this one, it is about 10% worse for some
>  >>> reason (and profiler shows suddenly that SDOperand::operator == is
>  >>> the
>  >>> hottest function, when called from SDNode::hasNUsesOfValue. I have
>  >>> no
>  >>> idea why... I repeated tests multiple times, but it always produces
>  >>> the same result. With my set approach, the same function does not
>  >>> consume any time...). I guess it is somehow related to the memory
>  >>> access locality or something like that. Any good ideas explaining my
>  >>> findings?
>  >>
>  >>
>  >> Sounds like something strange is happening.
>  >
>  > The SDNode::hasNUsesOfValue bottleneck mentioned above has disappeared
>  > with this patch.
>  >
>  >> The approach looks like the right step, but you really need to
>  >> eliminate the separate std::list for it to be comparable.
>  >
>  > Done! Please review and give it some testing, if possible.
>  > On my machine, it shows 10%-15% speedup on big BBs.
>  >
>  > -Roman
>  >
>  > P.S. I still haven't got any reviews for my first patch related to the
>  > instruction selection:
>  > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080303/059215.html
>  > Could someone have a look at it?
>
> > <UsesList.patch>_______________________________________________
>
> > llvm-commits mailing list
>  > llvm-commits at cs.uiuc.edu
>
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>  _______________________________________________
>  llvm-commits mailing list
>  llvm-commits at cs.uiuc.edu
>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UsesList1.patch
Type: text/x-patch
Size: 27817 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20080319/a95144e3/attachment.bin>