[LLVMdev] mem2reg optimization
clattner at apple.com
Sun Oct 26 23:09:29 PDT 2008
On Oct 21, 2008, at 10:16 AM, David Greene wrote:
> On Monday 20 October 2008 16:14, Chris Lattner wrote:
>> Ok, if you need this direction, just use an
>> std::vector<pair<unsigned,Instruction*>> as a forward map? You can
>> fast binary searches in it.
> I don't see how that helps. I need a map from BasicBlock to a list
> of loads
> and stores.
This isn't needed after all.
>>> I suppose we could resctrict the map to containing only load and
>>> instructions but if you've got a block with a lot of memory
>>> instructions I'm
>>> not sure it will be much faster, especially with the second loop.
>> Yes, you only want load and stores that directly target allocas.
>> won't impact the complexity of the routine but will be a nice speedup
>> in practice I suspect.
> But complexity is exactly what we have to attack. We're talking
> LARGE basic
> blocks here.
The point is to have low complexity and low constant factor.
>>> The artificial testcase I've come up with does exactly this. It
>>> does load-load-add-store-call over and over again.
>> It would be nice if you could share the artificial testcase, e.g. by
>> attaching it to a bugzilla.
> I've got it and will file the bug.
Thank you! I committed a patch that provides a ~230x speedup on that
testcase. If you run into other similar problems, please file another
>>> I've got the original patch ready to go. Should I commit that? I'm
>>> if we want to make things simpler later on we can do that and we'll
>>> have the
>>> testcase in the repository to catch whether we've simplified too
>>> much and lost
>>> the optimization.
>> I'd really rather stick with the simple approach, instead of checking
>> in the complex one and hoping someone fixes it later. Thanks Dave,
> Fundamentally, we need a map from BasicBlock to a list of ordered
> loads and
> stores in that block.
I don't see why. Please take a look at my patch, does it seem
reasonable? This is a very simple approach and seems like it works
Are you seeing cases where lots of time is spent in
RewriteSingleStoreAlloca[s]? If so, please provide a testcase and I
can apply the obvious fix.
More information about the llvm-dev