[LLVMdev] FoldingSetNodeID operations inefficiency

Roman Levenstein romixlev at yahoo.com
Wed Apr 23 23:33:25 PDT 2008


Hi Chris,

This is a good idea, and I had already started thinking in that direction.
But what I don't quite understand is TFs: how they are formed and which rules they must obey.

For example, you wrote:
> PendingLoads are created by SelectionDAGLowering::getLoadFrom and then copied into the
> TokenFactor node by SelectionDAGLowering::getRoot, called from
> SelectionDAGLowering::visitStore

So, if I now detect in getRoot that there are more than, let's say, 64 nodes in the PendingLoads queue, what should I do?
 - Can I take only 64 of them and simply ignore the rest?
 - Or should I produce a series of TFs connected to each other, each with at most 64 operands? If so, how do I do it?
Basically, I need a better understanding of TFs and how they relate to each other.
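For what it's worth, the second option (a chain of TFs) can at least be sketched in plain, self-contained C++. This is not actual SelectionDAG code; MaxOperands and chainTokenFactors are hypothetical names used only to illustrate the grouping, where every group after the first spends one operand slot on the link to the previous TF in the chain:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not real SelectionDAG code: split N pending
// loads into a chain of "TokenFactor" groups of at most MaxOperands
// operands each. Every group after the first reserves one operand
// slot for the link to the previous group in the chain.
static const std::size_t MaxOperands = 64;

// Returns the operand count of each group in the chain.
std::vector<std::size_t> chainTokenFactors(std::size_t numPendingLoads) {
    std::vector<std::size_t> groups;
    std::size_t remaining = numPendingLoads;
    bool first = true;
    while (remaining > 0) {
        // A non-first group spends one slot on the previous TF.
        std::size_t capacity = first ? MaxOperands : MaxOperands - 1;
        std::size_t take = std::min(remaining, capacity);
        groups.push_back(first ? take : take + 1);
        remaining -= take;
        first = false;
    }
    return groups;
}
```

With 3200 loads this scheme yields 51 chained nodes of at most 64 operands each instead of one node with 3200 operands, which would keep every FoldingSetNodeID profile small.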

Another question: my small experiments give me the impression that the TF nodes in my use cases are never found in the SelectionDAG's CSE map by FindOrInsert, even though they are inserted into it; that is, every lookup ends up inserting a new node into the map. If this is true, maybe their insertion into the CSE maps can be avoided altogether? But maybe I'm wrong...
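Separately, the first idea from the quoted mail below (avoiding hash recomputation for nodes that haven't changed) can at least be sketched generically. CachedHashNode is a toy type invented purely for illustration, not an LLVM class: it caches its profile hash and invalidates the cache only when an operand actually changes, so repeated CSE-map lookups pay for one hash computation.

```cpp
#include <cstddef>
#include <vector>

// Toy sketch of the "don't recompute hashes of unchanged nodes" idea.
// CachedHashNode is a made-up type for illustration, not an LLVM class.
struct CachedHashNode {
    std::vector<int> Operands;
    mutable std::size_t CachedHash = 0;
    mutable bool HashValid = false;

    std::size_t getHash() const {
        if (!HashValid) {
            std::size_t H = 0;
            for (int Op : Operands)                 // full scan only when stale
                H = H * 37 + static_cast<std::size_t>(Op);
            CachedHash = H;
            HashValid = true;
        }
        return CachedHash;
    }

    void replaceOperand(std::size_t I, int NewOp) {
        if (Operands[I] == NewOp)
            return;                                 // no change: keep the cache
        Operands[I] = NewOp;
        HashValid = false;                          // invalidate exactly once
    }
};
```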

- Roman

----- Original Message -----
From: Chris Lattner <sabre at nondot.org>
To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Sent: Thursday, April 24, 2008, 02:07:55
Subject: Re: [LLVMdev] FoldingSetNodeID operations inefficiency

On Wed, 23 Apr 2008, Roman Levenstein wrote:
> Hi,
>
> While profiling LLVM using my test-cases with huge MBBs, I noticed that
> FoldingSetNodeID operations (ComputeHash, insertion, etc.) may become
> really inefficient for nodes that have very many operands.

Here is a stupid idea, which is simple and probably really effective:

This only affects you because you have 3200 operands to your TF node. 
This only affects TF because TF can take an arbitrary number of operands, 
and we fold TFs together arbitrarily.

What if you changed the code generator to only merge TF nodes if they have 
fewer than some threshold (64?) of operands?  If you avoided making totally 
crazy nodes, the problem would be defined away.  Could this work?

-Chris


> test-case (you can fetch it from here
> http://llvm.org/bugs/attachment.cgi?id=1275), which is just one HUGE MBB
> with tens of thousands of instructions, there is a TokenFactor node N
> that has ... 3200 operands! (These operands are PendingLoads created by
> the SelectionDAGLowering::getLoadFrom and then copied into the
> TokenFactor node by SelectionDAGLowering::getRoot called from the
> SelectionDAGLowering::visitStore).
>
> During the code selection phase, the SelectionDAG::ReplaceAllUsesWith is
> called many times and in many cases it tried to update this node, by
> replacing some of its 3200 operands by a different operand. After each
> such operand replacement, this snippet of code is typically executed
> (see SelectionDAG.cpp):
>
>  // Now that we have modified U, add it back to the CSE maps.
>  // If it already exists there, recursively merge the results together.
>  if (SDNode *Existing = AddNonLeafNodeToCSEMaps(U)) {
>    ReplaceAllUsesWith(U, Existing, UpdateListener);
>    // U is now dead.  Inform the listener if it exists and delete it.
>    if (UpdateListener)
>      UpdateListener->NodeDeleted(U);
>    DeleteNodeNotInCSEMaps(U);
>  } else {
>    // If the node doesn't already exist, we updated it.
>    // Inform a listener if it exists.
>    if (UpdateListener)
>      UpdateListener->NodeUpdated(U);
>  }
>
> So, basically CSE map should be updated. To achieve that, a hash of the
> updated node N is computed (and its internal FoldingSetNodeID::Bits
> array contains about 6400 elements!) and the node is replaced.
>
> My observation shows that the number of FoldingSetNodeID operations on
> this node N is almost quadratic. It manifests itself by consuming about
> 20% of the compilation time on these operations alone.
>
> Question:
> Does anyone see an opportunity to solve this issue?
> My current ideas are:
>  - avoid recomputing the hash values of nodes that have not
> changed since the hash was last computed
>  - do some sort of lazy CSE map update: accumulate operand changes for
> a given node and then do only one update. But I don't yet see how it
> can be done practically
>  - optimize SelectionDAG::ReplaceAllUsesWith to do bulk updates,
> i.e., updating many user nodes of the given node at once, or updating
> all operands of the given node at once. Again, I don't see how to do it
> practically.
>  - maybe the issue is related to the strange TokenFactor node with the
> 3200 operands; maybe creation of a node with so many operands can be
> avoided?
>
> Any ideas are appreciated!
>
> -Roman
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu        http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/