[llvm-commits] Speeding up instruction selection

Fri Mar 7 23:53:47 PST 2008

On Mar 7, 2008, at 7:26 AM, Roman Levenstein wrote:
>> Please look at the implementation of User::setOperand() or
>> MachineOperand::setReg().  When these changes the value of an  
>> operand,
>> it needs to remove the current 'instruction' from the use list of the
>> old value and add it to the new value.  I assure you that it is
>> constant time, exactly because it doesn't have to scan the whole
>> list.  It just unlinks it form the middle of the list with no search.
>> This makes it very fast and scalable.
>
> OK. My fault. Now I really understood how it works! ;-)

Ok!

> I tried to apply the same principle to SDNodes.
>
> This means:
> 1) Each SDNode maintains an std::list of uses, i.e. list of pointers
> to SDNodes using it.

I don't recommend using std::list here.  The subtlety is that the use  
list will end up winding its way through the operands that are  
embedded in an SDNode.  This design allows ReplaceAllUsesWith (and  
related operations) to be extremely efficient.

> 2) Each SDNode has a field that links it into the Uses list of SDNode
> used by this node. This link is an iterator pointing to the position
> inside the SDNode::Uses list.

Right, this should be the same list as above.

> Since one node can use multiple other
> nodes, each SDNode has once such link field per SDOperand. More
> precisely, I introduced the Used files, which is an array  of
> iterators pointing into Uses lists of nodes used by current SDNode.
> This array is allocated dynamically and at the same places, where
> OperandList is allocated or set.

I think it might be best to introduce a new class, say SDUseOperand or  
something like that.  Right now the operands of an SDNode are stored  
in the OperandList array, which is an array of SDOperands.  I'd  
suggest changing this to be an array of SDUseOperand.

SDOperand is currently a SDNode + result#.  SDUseOperand would be an  
SDOperand (like Val in the Use class) plus a pointer to the containing  
node (like U in the Use class), and next/prev pointers (like Next/ 
Prev) in the use list.

Before you worry about the size bloat of tracking all this, remember  
that the Uses SmallVector in SDNode will turn into a single pointer,  
so this will be only a small increase in memory if any at all.

Selection dags have much more constrained situations for when the  
operands of a node are changed, so keeping the use lists up to date  
should be relatively easy (compared to llvm ir or machineoperands).

> 3) The rest of the code is unaffected for now. But probably, with this
> approach, replaceAllUses.. can be done faster by rewriting it in a
> more intelligent way,e.g. bulk removals or moving the whole Uses list
> at once or something like that?

Yep, making that faster can be done as a second step.

> With these changes, Insertion become fast, removal is also very fast,
> both are constant time, as you said.
> Chris, would you say it is the approach that you had in mind?

Yep!

> I tried to test on kimwitu, as you suggested, by compiling the big
> kw.llc.bc file.
> First of all, according to profiler, there are no real bottlenecks,
> consuming most of the time.

Right.

>
> The most interesting thing, though, is that this implementation has
> roughly the same performance on kimwitu (+1%), but is definitely worse
> on the huge  big4.bc. In this one, it is about 10% worse for some
> reason (and profiler shows suddenly that SDOperand::operator == is the
> hottest function, when called from SDNode::hasNUsesOfValue. I have no
> idea why... I repeated tests multiple times, but it always produces
> the same result. With my set approach, the same function does not
> consume any time...). I guess it is somehow related to the memory
> access locality or something like that. Any good ideas explaining my
> findings?

Sounds like something strange is happening.

> I attach the patch, just for the sake of completeness. It is really
> ugly and has a lot of conditional compilation, but gives you the idea.

The approach looks like the right step, but you really need to  
eliminate the separate std::list for it to be comparable.

-Chris