[LLVMdev] mem2reg optimization

Mon Oct 20 14:14:48 PDT 2008

On Oct 20, 2008, at 10:04 AM, David Greene wrote:

> On Monday 06 October 2008 11:47, David Greene wrote:
>
>>> As far as the approach goes, here's a counterproposal that is a bit
>>> less complex.  I suggest adding a single DenseMap<Instruction*,
>>> unsigned> to Mem2reg.  If there is an entry in this map for a load/
>>> store instruction, it would store the instruction's index in its  
>>> basic
>>> block.  This means that to check dominance of two instrs in a block,
>>> you just need to compare their two indices (which is fast).  Really
>>> small blocks (< 10 instructions?) would never be entered into map,  
>>> to
>>> keep the size of the map low.  The first time a query is made  
>>> within a
>>> large block, it would compute the indices of all loads and stores
>>> within the block.
>>
>> This is a simpler approach and should have the same effect.  I'll  
>> code it
>> up.
>
> Urk.  After looking at doing this today, I realized it won't work.   
> The
> problem is that RewriteSingleStoreAlloca has this loop:
>
>      for (; &*I != OnlyStore; ++I) { // scan block for store.
>        if (isa<LoadInst>(I) && I->getOperand(0) == AI)
>          break;
>
> It's looking for a load  that matches the address of the single  
> store.  This
> is the loop that's taking all the time.  What I did with the  
> original patch
> was to make lookups of load and store instructions fast.  That won't  
> be
> possible with a map indexing all instructions.

Ok, if you need this direction, just use an  
std::vector<pair<unsigned,Instruction*>> as a forward map?  You can do  
fast binary searches in it.

> I suppose we could resctrict the map to containing only load and store
> instructions but if you've got a block with a lot of memory  
> instructions I'm
> not sure it will be much faster, especially with the second loop.

Yes, you only want load and stores that directly target allocas.  This  
won't impact the complexity of the routine but will be a nice speedup  
in practice I suspect.

> The artificial testcase I've come up with does exactly this.  It  
> basically
> does load-load-add-store-call over and over again.

It would be nice if you could share the artificial testcase, e.g. by  
attaching it to a bugzilla.

> I've got the original patch ready to go.  Shoudl I commit that?  I'm  
> thinking
> if we want to make things simpler later on we can do that and we'll  
> have the
> testcase in the repository to catch whether we've simplified too  
> much and lost
> the optimization.

I'd really rather stick with the simple approach, instead of checking  
in the complex one and hoping someone fixes it later.  Thanks Dave,

-Chris