[llvm-commits] [cfe-commits] [PATCH][Review Request] EarlyCSE stack overflow - bugzilla 11794

Mon Jan 23 14:30:55 PST 2012

On Jan 20, 2012, at 3:08 PM, Jakob Stoklund Olesen wrote:

> On Jan 20, 2012, at 1:41 PM, Lenny Maiorani wrote:
>> There is actually a real difference here. The difference is that when the std::stack or std::deque is being used, it is necessary to have 2 stacks or deques because there is no reverse iterator on DomTreeNode. I am trying to preserve the order of processing nodes so that there is not some case where the behavior changes. If the order does not need to be preserved, then clearly it can be further optimized.
> 
> In this case, the child iteration order probably isn't important, but even if it were, you could still avoid the overhead with a vector like this:
> 
>  vec.resize(vec.size() + num_children)
>  std::copy(child_begin, child_end, vec.rbegin())
> 
> Anyway, the runtime differences you were seeing in your stack-test-1.cpp benchmark were probably dominated by the number of times you implicitly call free() in the inner loop.
> 
> Note that DepthFirstIterator.h achieves the same thing using much less memory, and it has the stack that you need anyway in order to:
> 
>> I still need to take a look at Jakob's other review comment.
> 
> Thanks for working on this!
> 
> /jakob

Jakob,

Thank you for the information about MachineCSE and noticing the hash tables which are being generated at the top of EarlyCSE::processNode(). My latest implementation contains roughly the same implementation as MachineCSE. MachineCSE generates stack of nodes to process using SmallVector. Since this data structure is so large, I think using SmallVector is not correct.

Also, I implemented the solution using vec.resize() and std::copy(), but this doesn't significantly improve the performance of the vector solution. My numbers are not at all surprising to me. Vectors do not do as well as deques with many pushes and pops because deques manage their memory in slabs to prevent needing to do large reallocs and hence large copies on occasion.

Using the DepthFirstIterator (df_iterator) would be nice, but I do not see an easy way to maintain the CurrentGeneration variable which gets modified on a per-tree-depth. This means that there would still need to be some sort of stack kept separately from the iterator. This would be confusing and clutter the code.

I looked a bit at the hash tables at the top of EarlyCSE::processNode() and I do not think they are nested. It looks like they are copy-constructed from some hash tables which are class members. These are then modified and provided the order of node traversal does not change, then the data in those member hash tables should be the same.

The std::deque implementation is still the one I think is best. It is quite possible I am missing a subtlety of the copy-constructor or something else. How say you?

-Lenny