[LLVMdev] mem2reg optimization

Chris Lattner clattner at apple.com
Sun Oct 26 23:09:29 PDT 2008

On Oct 21, 2008, at 10:16 AM, David Greene wrote:

> On Monday 20 October 2008 16:14, Chris Lattner wrote:
>> Ok, if you need this direction, just use an
>> std::vector<pair<unsigned,Instruction*>> as a forward map?  You can  
>> do
>> fast binary searches in it.
> I don't see how that helps.  I need a map from BasicBlock to a list  
> of loads
> and stores.

This isn't needed after all.

>>> I suppose we could resctrict the map to containing only load and  
>>> store
>>> instructions but if you've got a block with a lot of memory
>>> instructions I'm
>>> not sure it will be much faster, especially with the second loop.
>> Yes, you only want load and stores that directly target allocas.   
>> This
>> won't impact the complexity of the routine but will be a nice speedup
>> in practice I suspect.
> But complexity is exactly what we have to attack.  We're talking  
> LARGE basic
> blocks here.

The point is to have low complexity and low constant factor.

>>> The artificial testcase I've come up with does exactly this.  It
>>> basically
>>> does load-load-add-store-call over and over again.
>> It would be nice if you could share the artificial testcase, e.g. by
>> attaching it to a bugzilla.
> I've got it and will file the bug.

Thank you!  I committed a patch that provides a ~230x speedup on that  
testcase.  If you run into other similar problems, please file another  

>>> I've got the original patch ready to go.  Should I commit that?  I'm
>>> thinking
>>> if we want to make things simpler later on we can do that and we'll
>>> have the
>>> testcase in the repository to catch whether we've simplified too
>>> much and lost
>>> the optimization.
>> I'd really rather stick with the simple approach, instead of checking
>> in the complex one and hoping someone fixes it later.  Thanks Dave,
> Fundamentally, we need a map from BasicBlock to a list of ordered  
> loads and
> stores in that block.

I don't see why.  Please take a look at my patch, does it seem  
reasonable?  This is a very simple approach and seems like it works  
just fine:

Are you seeing cases where lots of time is spent in  
RewriteSingleStoreAlloca[s]?  If so, please provide a testcase and I  
can apply the obvious fix.


More information about the llvm-dev mailing list