[llvm-commits] Speeding up instruction selection

Roman Levenstein romix.llvm at googlemail.com
Fri Mar 7 07:26:50 PST 2008


Hi Chris,

2008/3/7, Chris Lattner <clattner at apple.com>:
> On Mar 6, 2008, at 11:10 PM, Roman Levenstein wrote:
>  >>> OK. But how does it help with removal operations, which is done by
>  >>> removeUser() function called from ReplaceAllUsesOfValueWith ? In
>  >>> case
>  >>> of big MBBs, these lists would contain thousands of uses. The
>  >>> big4.bc
>  >>> contains for example 8000 elements in such a list. So, linear search
>  >>> would be a disaster here. A sorted list is probably a better
>  >>> alternative, but may be still slow... So, sets or hash-tables are
>  >>> better candidates. Or do I miss something?
>  >>
>  >>
>  >> The problem with removeUser is that it is O(n) in the number of users
>  >> of a value.  Changing this operation to be O(1) [and with a very low
>  >> constant factor] makes it size invariant and much faster in practice.
>  >
>  > I think, we still misunderstand each other. If you speak about
>  > removing the element from the middle, then it is true that lists do
>  > not do any moving of elements around and just update pointers, so they
>  > are very fast in that. But before you can start removing, you should
>  > first find the element to remove inside this list. removeUser is
>  > called in my case for 8000 different keys and each time it tries to
>  > find the element and then to remove it. So, as far as I can see, it is
>  > an O(n^2) time. Therefore, search time becomes a dominant factor, and
>  > this is why a data structure with a possibility of a very fast search
>  > is required.
>
>
> Please look at the implementation of User::setOperand() or
>  MachineOperand::setReg().  When these change the value of an operand,
>  it needs to remove the current 'instruction' from the use list of the
>  old value and add it to the new value.  I assure you that it is
>  constant time, exactly because it doesn't have to scan the whole
>  list.  It just unlinks it from the middle of the list with no search.
>  This makes it very fast and scalable.

OK. My fault. Now I really understood how it works! ;-)

I tried to apply the same principle to SDNodes.

This means:
1) Each SDNode maintains an std::list of uses, i.e. a list of pointers
to the SDNodes using it.
2) Each SDNode has a field that links it into the Uses list of each
SDNode it uses. This link is an iterator pointing to its position
inside that node's Uses list. Since one node can use multiple other
nodes, each SDNode has one such link field per SDOperand. More
precisely, I introduced a Used field, which is an array of iterators
pointing into the Uses lists of the nodes used by the current SDNode.
This array is allocated dynamically, at the same places where
OperandList is allocated or set.
3) The rest of the code is unaffected for now. But with this approach,
ReplaceAllUses... could probably be made faster by rewriting it in a
more intelligent way, e.g. with bulk removals or by moving the whole
Uses list at once, or something like that.

With these changes, insertion becomes fast and removal is also very
fast; both are constant time, as you said.

Chris, would you say it is the approach that you had in mind?

I tried to test on kimwitu, as you suggested, by compiling the big
kw.llc.bc file.
First of all, according to the profiler, there are no real bottlenecks
consuming most of the time.

The most interesting thing, though, is that this implementation has
roughly the same performance on kimwitu (+1%), but is definitely worse
on the huge big4.bc: about 10% slower for some reason (and the
profiler suddenly shows that SDOperand::operator== is the hottest
function, when called from SDNode::hasNUsesOfValue. I have no idea
why... I repeated the tests multiple times, and it always produces
the same result. With my set approach, the same function does not
consume any time...). I guess it is somehow related to memory access
locality or something like that. Any good ideas explaining these
findings?

I attach the patch, just for the sake of completeness. It is really
ugly and has a lot of conditional compilation, but gives you the idea.

Also, having implemented both approaches, I can say that in my opinion
the approach you proposed is more intrusive in terms of source-code
modifications and changes the structure of the SDNode objects too
much. I mean, to implement such an approach you put "quick links" into
your objects, affecting both the uses and the values. If your value
object is used in many different contexts (e.g. in many different
lists), you would introduce more and more links into the object for
all those contexts. This increases complexity and introduces a rather
tight coupling between the data structures, which could make later
code modifications more difficult.


Any comments are welcome.

-Roman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SelectionDAG.patch
Type: text/x-patch
Size: 16213 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20080307/e4846f20/attachment.bin>
