[LLVMdev] mem2reg optimization

Mon Oct 20 10:04:32 PDT 2008

On Monday 06 October 2008 11:47, David Greene wrote:

> > As far as the approach goes, here's a counterproposal that is a bit
> > less complex.  I suggest adding a single DenseMap<Instruction*,
> > unsigned> to Mem2reg.  If there is an entry in this map for a load/
> > store instruction, it would store the instruction's index in its basic
> > block.  This means that to check dominance of two instrs in a block,
> > you just need to compare their two indices (which is fast).  Really
> > small blocks (< 10 instructions?) would never be entered into map, to
> > keep the size of the map low.  The first time a query is made within a
> > large block, it would compute the indices of all loads and stores
> > within the block.
>
> This is a simpler approach and should have the same effect.  I'll code it
> up.

Urk.  After looking at doing this today, I realized it won't work.  The 
problem is that RewriteSingleStoreAlloca has this loop:

      for (; &*I != OnlyStore; ++I) { // scan block for store.
        if (isa<LoadInst>(I) && I->getOperand(0) == AI)
          break;

It's looking for a load  that matches the address of the single store.  This 
is the loop that's taking all the time.  What I did with the original patch 
was to make lookups of load and store instructions fast.  That won't be 
possible with a map indexing all instructions.

This loop is also a problem:

    // Otherwise, if this is a different block or if all uses happen
    // after the store, do a simple linear scan to replace loads with
    // the stored value.
    for (BasicBlock::iterator I = UseBlock->begin(), E = UseBlock->end();
         I != E; ) {
      if (LoadInst *LI = dyn_cast<LoadInst>(I++)) {
        if (LI->getOperand(0) == AI) {
          LI->replaceAllUsesWith(OnlyStore->getOperand(0));
          if (AST && isa<PointerType>(LI->getType()))
            AST->deleteValue(LI);
          LI->eraseFromParent();
        }
      }
    }

I suppose we could resctrict the map to containing only load and store 
instructions but if you've got a block with a lot of memory instructions I'm 
not sure it will be much faster, especially with the second loop.

The artificial testcase I've come up with does exactly this.  It basically 
does load-load-add-store-call over and over again.

I've got the original patch ready to go.  Shoudl I commit that?  I'm thinking 
if we want to make things simpler later on we can do that and we'll have the 
testcase in the repository to catch whether we've simplified too much and lost 
the optimization.

                                                   -Dave