[PATCH] Update MergedLoadStoreMotion to use MemorySSA

Daniel Berlin dberlin at dberlin.org
Thu Apr 9 10:01:22 PDT 2015


On Thu, Apr 9, 2015 at 9:10 AM, Philip Reames <listmail at philipreames.com> wrote:
> FYI, I'm basically reviewing this as a new algorithm.  Trying to match it up with what's there before doesn't seem particularly worth doing given the algorithmic problems you commented on.
>
>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:106
> @@ +105,3 @@
> +
> +struct StoreInstHash {
> +  size_t operator()(const StoreInst *A) const {
> ----------------
> Can you use lambdas for these?  I'm not entirely sure that works, but if it does, the code would be cleaner.
>
Not in a way that's cleaner, as far as I'm aware.  If someone on the
list is a C++ expert and has a way, let's do it, because I agree.


(otherwise, http://stackoverflow.com/questions/15719084/how-to-use-lambda-function-as-hash-function-in-unordered-map)
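
For reference, a rough sketch of what the lambda approach from that link
looks like (purely illustrative; the placeholder hash and the names are
mine, not the patch's):

  #include <cstddef>
  #include <unordered_map>
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  void example() {
    // Placeholder hash; a real one would combine the fields that the
    // equality functor compares.
    auto Hash = [](const StoreInst *A) {
      return static_cast<std::size_t>(A->getOpcode());
    };
    auto Eq = [](const StoreInst *A, const StoreInst *B) {
      return A == B || A->isSameOperationAs(B);
    };
    // The lambda types have to be named via decltype and the lambdas passed
    // to the constructor, which is arguably no cleaner than explicit structs.
    std::unordered_multimap<const StoreInst *, int, decltype(Hash),
                            decltype(Eq)>
        Map(8, Hash, Eq);
  }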

> Also, I'm surprised that a generic instruction hash function doesn't already exist.  What does GVN use for this?

GVN invents its own expressions (as does newgvn), and has a hash
function for that:

     friend hash_code hash_value(const Expression &Value) {
       return hash_combine(Value.opcode, Value.type,
                           hash_combine_range(Value.varargs.begin(),
                                              Value.varargs.end()));
     }


It's kind of needed for other reasons.

EarlyCSE does something similar; look for "unsigned
DenseMapInfo<SimpleValue>::getHashValue(SimpleValue Val)" in
EarlyCSE.cpp.

You'll see it gets complicated depending on what you want (for example,
newgvn wants loads and stores to hash to the same thing when they
produce the same value, but that means you have to ignore the opcode,
etc.), which is why I imagine no generic instruction hashing exists :)
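
For this patch, the hash mainly needs to be consistent with what
isSameOperationAs compares, so something along these lines is enough (a
sketch in the spirit of the EarlyCSE/GVN hashes above, not the patch's
exact code):

  #include "llvm/ADT/Hashing.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  struct StoreInstHash {
    size_t operator()(const StoreInst *SI) const {
      // Hash the things isSameOperationAs looks at (opcode and operand
      // types), deliberately *not* the operand values themselves.
      return hash_combine(SI->getOpcode(),
                          SI->getValueOperand()->getType(),
                          SI->getPointerOperand()->getType());
    }
  };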



>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:117
> @@ +116,3 @@
> +  bool operator()(const StoreInst *A, const StoreInst *B) const {
> +    return (A == B || A->isSameOperationAs(B));
> +  }
> ----------------
> Is the extra A == B check needed?  I would expect that to be inside isSameOperationAs.

So, I made this identical to what was there before, but looking at
isSameOperationAs, it does not, in fact, shortcut the A == B check.
Happy to add it there instead.


>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:308
> @@ -311,1 +307,3 @@
>
> +  // We also hoist operands, so see if this is even a memory access we want to
> +  // fix up
> ----------------
> Not quite clear what you're trying to say with this comment?  The code appears to just be updating MemorySSA for the transform performed.

This function is called on non-memory-access instructions as well, since
it will hoist the operands if it has to.  So we need to check whether what
we are being asked to update is a memory access before trying to use the
API on it :)
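
Roughly, the guard looks like this (getMemoryAccess is the real MemorySSA
query; the function name and the elided update are illustrative):

  // Called for the hoisted instruction and, recursively, for any operands
  // that had to be hoisted along with it.
  void updateMemorySSAForHoistedInst(MemorySSA *MSSA, Instruction *I) {
    MemoryAccess *MA = MSSA->getMemoryAccess(I);
    if (!MA)
      return; // Not a load/store/etc.; nothing for MemorySSA to fix up.
    // ... otherwise, update MA's defining access and position here ...
  }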


>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:410
> @@ +409,3 @@
> +    // We know that if the load has the same clobbering access as this one,
> +    // they must not be killed until the same point
> +
> ----------------
> I'm a bit confused here.  If we found the set of load instructions which share the generation number at the start of a block and we're merging two blocks with a single shared successor, why do we need any further legality checks?
So, this is because I did a direct conversion of the algorithm.

The problem is that isSameOperationAs only tells you whether things
"look alike" structurally.  In particular, it *does not check whether
the operands are actually the same*.  So it's possible that these two
loads do not actually share the same pointer.

Because I did not want to change which loads this would hoist, I did
not try to fix this by doing something like including the pointer operands
in the hash.  If you do that, you lose whatever magic value
numbering mustAlias has inside of it.
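
Concretely, the legality filter ends up looking something like this (a
sketch in line with what the existing pass checks, not the exact patch
code; loadsAreMergeable is an illustrative name):

  #include "llvm/Analysis/AliasAnalysis.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Two "looks alike" loads are only merge candidates if their pointers
  // actually refer to the same location; since isSameOperationAs does not
  // compare operands, ask alias analysis for a must-alias answer.
  static bool loadsAreMergeable(LoadInst *L0, LoadInst *L1,
                                AliasAnalysis *AA) {
    if (!L0->isSameOperationAs(L1))
      return false;
    return AA->isMustAlias(L0->getPointerOperand(),
                           L1->getPointerOperand());
  }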

> Aren't all the loads guaranteed to have the generation number at the end of the common block?
>
Yes

> Are you possibly trying to do two generations at once here?  Or something else more complex?
Nope. Just a pretty straightforward conversion.  But note: there is
another problem with the algorithm as they have it, which is that it
does not eliminate duplicate loads (this is why we use a multimap).
So in theory, there could be a million of the same load in one block,
and one load in the other and it will just hoist one of them :)
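
For what it's worth, eliminating the duplicates afterwards would be
straightforward; a sketch (illustrative names, not part of this patch) of
what that could look like once one copy has been hoisted:

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // The remaining identical, must-aliasing loads with the same clobbering
  // access produce the same value as the hoisted copy, so they can simply
  // be forwarded to it and removed.
  static void eliminateDuplicateLoads(LoadInst *Hoisted,
                                      SmallVectorImpl<LoadInst *> &Dups) {
    for (LoadInst *Dup : Dups) {
      Dup->replaceAllUsesWith(Hoisted);
      Dup->eraseFromParent();
    }
  }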


>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:516
> @@ +515,3 @@
> +    MemoryAccess *Phi = MSSA->getMemoryAccess(BB);
> +    MemorySSA::AccessListType::iterator Place(Phi);
> +    ++Place;
> ----------------
> Are you guaranteed there's not another phi here?
Yes.  MemorySSA only ever has one phi per block, since it only ever has one variable.

We use getFirstNonPhi for regular instructions, and I can add such an
API to MemorySSA.

But in this case, I have since updated MemorySSA to have an API that
handles this for you, so I replaced this code with:

  MemoryAccess *Phi = MSSA->getMemoryAccess(BB);
  MSSA->replaceMemoryAccessWithNewAccess(Phi, SNew,
                                         MemorySSA::InsertionPlace::Beginning);


>  Not entirely sure what this is doing.  It may be completely correct, but it looks suspicious when compared to instruction insertion into a basic block.

It is just moving the insertion point to after the phi node.

>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:558
> @@ -540,1 +557,3 @@
>
> +  StoreInfo.reserve(Pred0->size());
> +
> ----------------
> Given this is a linear traversal of Pred0, does the reserve really gain you anything here?

No, I was mainly trying to avoid resizing/rehashing.


>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:568
> @@ +567,3 @@
> +  auto *TAccesses = MSSA->getBlockAccesses(T);
> +  if (!TAccesses || !isa<MemoryPhi>(&TAccesses->front()))
> +    return false;
> ----------------
> Neat!
>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:579
> @@ -550,3 +578,3 @@
>
> -    // Sink move non-simple (atomic, volatile) stores
> -    if (!isa<StoreInst>(I))
> +  for (auto AI = Pred0Accesses->rbegin(), AE = Pred0Accesses->rend(); AI != AE;
> +       ++AI) {
> ----------------
> Ok, I'm again a bit confused by the algorithm.  :)
>
> What I expected to see was something along the following:
> - Find store in each block which is a candidate for merge (using memory-def edges from MemoryPhi)
> - Merge any loads in those basic blocks which use the generation if possible.

So this will miss things that the current algorithm gets.

Consider:

  clobber b
  if (a) {
    store a
    load b
  } else {
    store a
    load b
  }


During building, MemorySSA currently optimizes uses to the nearest
dominating clobbering access for you, done at the suggestion of two
people.  We could make it stop doing this (it's just a flag), but the
tradeoff is that most things end up walking more in practice.
Additionally, as things optimize and you update MemorySSA, you will
often end up with the *same result* anyway.  The only way to stop that
is to require using the "nearest dominating memorydef" instead of just
a "dominating memorydef" when updating.  We do this in the APIs where
we do insertion (addNewMemoryUse), but we don't check/verify it in
replaceMemoryAccess-style APIs; we only check standard domination.

Again, code-wise trivial, it depends on what we want out of life :)

In any case, either through MemorySSA building or optimization, you will get:

  1 = MemoryDef(liveonentry)
  if (a) {
    2 = MemoryDef(1)
    store a
    MemoryUse(1)
    load b
  } else {
    3 = MemoryDef(1)
    store a
    MemoryUse(1)
    load b
  }
  4 = MemoryPhi(2, 3)

As you can see, the only way to get to the loads in each block is
through per-block accesses.  You won't get to them from the stores
because the stores don't conflict with them.
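
In code, that per-block walk is roughly the following (getBlockAccesses is
the API the patch already uses; recordHoistCandidate is a stand-in for
whatever bookkeeping the pass does):

  // Scan the block's access list directly; following def-use edges from the
  // stores would never reach these loads, since the stores don't clobber
  // the memory they read.
  if (auto *Accesses = MSSA->getBlockAccesses(Pred0)) {
    for (const MemoryAccess &MA : *Accesses)
      if (const auto *MU = dyn_cast<MemoryUse>(&MA))
        if (auto *LI = dyn_cast<LoadInst>(MU->getMemoryInst()))
          recordHoistCandidate(LI, MU->getDefiningAccess());
  }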

> - Search downward from each store to find a conflicting memory access or end of block.  (This could either by by walking instructions or mem-use edges.)
> - If both reach end of block, merge.
> - Repeat previous steps with newly created MemoryPhi for earlier stores if any.
>

Yes, this would be the ideal algorithm (and was, uh, one of my
complaints about the original pass), with one variant: you may also
want to sink loads downwards to merge stores (maybe, maybe not).
As I said, I tried to make this as close to the existing algorithm as
possible to avoid any issues, and had plans to rewrite it
completely later once we had a better handle on the APIs we really
need.

Really, this algorithm is trying to calculate upwards-exposed loads
and downwards-exposed stores.  We have an upwards-exposed API in the
walkers, but there is no downwards-exposed API.

In this specific algorithm, what they really want is "is load upwards
exposed to block X" and "is store downwards exposed to block X".
The current clobbering calculation is overkill for this.

So if that turns out to be a common use case, we should just provide *that* API.
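
A rough sketch of what such a query could look like with what's already
there (purely illustrative, not an existing API; it assumes the successor
has a MemoryPhi and that MemoryPhi exposes its incoming access per
predecessor block, the way PHINode does):

  // A store is downwards exposed to Succ if nothing re-clobbers it between
  // the store and the end of its block, i.e. its MemoryDef is still the
  // incoming access for that block in Succ's MemoryPhi.
  static bool isStoreDownwardsExposedTo(MemorySSA *MSSA, StoreInst *SI,
                                        BasicBlock *Succ) {
    MemoryAccess *StoreAccess = MSSA->getMemoryAccess(SI);
    auto *Phi = dyn_cast_or_null<MemoryPhi>(MSSA->getMemoryAccess(Succ));
    if (!Phi)
      return false;
    return Phi->getIncomingValueForBlock(SI->getParent()) == StoreAccess;
  }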

Essentially: without more downwards users, I didn't feel comfortable
trying to design an API, so I didn't rewrite this entirely.

I am more than happy to commit to doing that work, and rewriting this,
as I convert the other sinking passes.

> http://reviews.llvm.org/D8688


