[PATCH] This patch introduces MemorySSA, a virtual SSA form for memory. Details on what it looks like are in MemorySSA.h

Daniel Berlin dberlin at dberlin.org
Mon Apr 13 14:44:33 PDT 2015


On Mon, Apr 13, 2015 at 12:08 PM, Philip Reames
<listmail at philipreames.com> wrote:
> Ok, here's the MemorySSA.cpp parts of the review.  Lots of small style comments, a couple of larger concerns.  At least some of the code duplication would be nice to address before submission, though I wouldn't say it's mandatory as long as it was done quickly thereafter.
>
> I'm uncomfortable with some of the bits inside the CachingMemorySSAWalker routines.  Would you mind submitting the first version of the memory
> SSA code without them and then reviewing those separately?

So, I'm fine with this, but note it will break the guarantees in the
API as it currently stands. In particular, the guarantee that MemoryUses
are disambiguated will be broken (because the builder relies on the
CachingWalker for it).
Nothing in-tree relies on these guarantees, but
1. It makes the documentation lie :)
2. The new MergedLoadStoreMotion relies on this guarantee (as does NewGVN).

However, I'm happy to either
1. Remove any code that is extra-tricky (like the volatileness code below),
or
2. Replace the CachingWalker with a DoNothingWalker in the MemorySSA
builder, submit it knowing the comments will currently be wrong, and
then shepherd the walkers through review next.

Whatever folks think is best

I can also just add tests to cover 90%+ of the code in the caching
walker if that helps?  Printing tests can be used to guarantee its
functionality for MemoryUses, because the builder uses it for each one.

So, for example, even things like the incomplete phi node case or
cyclic phi node cases can be tested to ensure it gives the right
answers, because if it didn't, the printed form would be wrong.


> There's some very complex logic there that duplicates parts of MDA.

Yes, agreed.
>   Given how tricky MDA has been, I'm really not comfortable with the code in the walkers as it stands today without a much more careful review
>
>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:49
> @@ +48,3 @@
> +
> +INITIALIZE_PASS(MemorySSALazy, "memoryssalazy", "Memory SSA", true, true)
> +
> ----------------
> Isn't this line redundant with the above?

Which one?
The above initializes the printer passes.

>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:79
> @@ +78,3 @@
> +/// placement algorithm. It is a linear time phi placement algorithm.
> +void MemorySSA::determineInsertionPoint(
> +    AccessMap &BlockAccesses, const SmallPtrSetImpl<BasicBlock *> &DefBlocks) {
> ----------------
> I'm skipping all the code copied from PMtR.  I'm going to assume it's correct.
>
> p.s. Any chance this could be expressed in a common template used by both places?

So, I looked into this pretty heavily and had talked with Chandler about it.

I think it should now be possible to share determineInsertionPoint,
with a little work.
You can't really share anything else; the two are too different in what
they want to do. It would basically be a lot of abstraction and callbacks.

But determineInsertionPoint is the tricky algorithm in this, so I
think if we share that, we should be in good shape.
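To make the idea concrete, here is a self-contained sketch (not LLVM's
linear-time algorithm, and the names are purely illustrative) of how a
shared phi-placement helper could be parameterized by callbacks: the
caller supplies the dominance frontier and a callback that creates the
appropriate kind of phi (a PHINode for PromoteMemoryToRegister, a
MemoryPhi here).

#include <functional>
#include <set>
#include <vector>

// Simplified, worklist-based iterated-dominance-frontier placement.
// "Block" stands in for BasicBlock *; the real code uses the linear-time
// algorithm, this just shows the shared shape.
using Block = int;

void placePhis(const std::set<Block> &DefBlocks,
               const std::function<std::vector<Block>(Block)> &DomFrontier,
               const std::function<void(Block)> &InsertPhi) {
  std::set<Block> HasPhi;
  std::vector<Block> Worklist(DefBlocks.begin(), DefBlocks.end());
  while (!Worklist.empty()) {
    Block B = Worklist.back();
    Worklist.pop_back();
    for (Block F : DomFrontier(B)) {
      // Place a phi the first time a frontier block is reached.
      if (HasPhi.insert(F).second) {
        InsertPhi(F); // callback decides whether it's a PHINode or MemoryPhi
        // The new phi is itself a definition, so its frontier needs phis too.
        if (!DefBlocks.count(F))
          Worklist.push_back(F);
      }
    }
  }
}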



>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:281
> @@ +280,3 @@
> +  PerBlockAccesses.clear();
> +  delete LiveOnEntryDef;
> +}
> ----------------
> Can we use smart pointers by chance?  The manual memory management seems unnecessary.
>

I'm not aware that we have any smart pointer types that understand the
pass management interface and are destroyed on releaseMemory.

I think it would be really nice to have a pass_ptr

> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:303
> @@ +302,3 @@
> +  // for every possible instruction in the stream.  Instead, we build
> +  // lists, and then throw it out once the use-def form is built.
> +  SmallPtrSet<BasicBlock *, 32> DefiningBlocks;
> ----------------
> We no longer throw this out do we?

Correct, fixed.

>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:321
> @@ +320,3 @@
> +        InstructionToMemoryAccess.insert(std::make_pair(&I, MU));
> +        if (!Accesses) {
> +          Accesses = new AccessListType();
> ----------------
> This should be pulled out in a helper lambda or otherwise commoned.

It's now in a helper.

>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:327
> @@ +326,3 @@
> +      }
> +      if (def) {
> +        MemoryDef *MD = new MemoryDef(nullptr, &I, &B, nextID++);
> ----------------
> else if - once you do that, can't this just become:
> if (def) {
> } else if (use) {
> } else {
>   llvm_unreachable()
> }
>

It can be simplified to if (def) ... else if (use), but you can't add
an unreachable, because the instruction could be NoModRef.

In any case, I removed all of it in favor of createMemoryAccess with a
new argument.
:)
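For reference, a rough sketch of the shape being discussed (simplified
from the quoted context; isDef/isUse stand in for the real mod/ref
checks, and the surrounding names come from the patch):

// Sketch only -- note there is deliberately no llvm_unreachable() in a
// trailing else: a NoModRef instruction simply gets no MemoryAccess.
if (isDef) {
  MemoryDef *MD = new MemoryDef(nullptr, &I, &B, nextID++);
  InstructionToMemoryAccess.insert(std::make_pair(&I, MD));
  Accesses->push_back(MD);
} else if (isUse) {
  MemoryUse *MU = new MemoryUse(nullptr, &I, &B);
  InstructionToMemoryAccess.insert(std::make_pair(&I, MU));
  Accesses->push_back(MU);
}
// else: NoModRef -- the instruction doesn't touch memory, nothing to do.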


> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:415
> @@ +414,3 @@
> +    MemoryDef *MD = new MemoryDef(nullptr, I, I->getParent(), nextID++);
> +    InstructionToMemoryAccess.insert(std::make_pair(I, MD));
> +
> ----------------
> Don't you need to add these to the per-BB list?

Yes, but that is done elsewhere because the exact placement varies,
while the lookups don't need place-specific info.
(That is why these accesses have nullptr as the first argument).
It turns out to be quite hard to common that part, because what
happens differs depending on the update.

> Also, can this code be commoned with the memory SSA construction path?

Yes. In order to let the assert stay there, I added an option that
controls whether this call ignores non-memory-affecting instructions.


>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:427
> @@ +426,3 @@
> +
> +MemoryAccess *MemorySSA::findDominatingDef(Instruction *Use,
> +                                           enum InsertionPlace Where) {
> ----------------
> It looks like you're not really using the instruction at all, maybe just pass the basicblock to begin with?

Fair point, done.


>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:462
> @@ +461,3 @@
> +  MemoryAccess *DefiningDef = findDominatingDef(Use, Where);
> +  if (!Result.second) {
> +    Accesses = Result.first->second;
> ----------------
> Again, can we common this somewhere?  You've got this lazy creation scatter all over the place.
>
> Accesses = ensureBBList(BB)?

Done, it's now getOrCreateAccessList
>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:504
> @@ +503,3 @@
> +      ++AI;
> +    return replaceMemoryAccessWithNewAccess(Replacee, Replacer, AI);
> +  }
> ----------------
> This seems odd.  Shouldn't we be putting Replacer in the spot Replacee was?

No, this API handles things like store sinking/merging, where, in
general, you have no guarantee the Replacer was inserted into the CFG
in the same block as the Replacee.

As long as the code is a straight line, it could have been put farther down.

i.e.

if (a) {
  1 = MemoryDef(liveOnEntry)
  store %b
} else {
  2 = MemoryDef(liveOnEntry)
  store %b
}
3 = MemoryPhi(1, 2)
<50 more straight-line BBs>
MemoryUse(3)
load %b

If we want to sink as far as possible, we may want to place the store
at the beginning of the load's BB, not right after the MemoryPhi.

In this case, the Replacer will not be in the same block as the Replacee.

>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:547
> @@ +546,3 @@
> +  for (const auto &Arg : MP->args())
> +    if (Arg.second == Replacee)
> +      if (!DT->dominates(Replacer->getBlock(), Arg.first))
> ----------------
> Did you possible mean a != here?
>

No, I added a comment to try to make what is happening clear.
Above this loop, it now says:
  // For a phi node, the use occurs in the predecessor block of the phi node,
  // not in the phi node block.
  // Since Replacee may occur multiple times in the phi node, we have to check
  // each operand to ensure Replacer dominates the operand where Replacee
  // occurs.

We would not need to do this if we had a use list instead of a user list :)




> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:560
> @@ +559,3 @@
> +  // may now be defined in terms of it.
> +  bool usedByReplacee = getDefiningAccess(Replacer) == Replacee;
> +
> ----------------
> This condition really doesn't feel like a "replacement".  Is it time to reconsider the interface and turn this into an assert?

Yeah, this happens in store merging, where the phi becomes the new
defining access for the use.

I will add an "addNewAccessWithDefiningAccess" to handle this case,
instead of replace.

It is actually a lot easier to express it as a series of low-level
operations, but I was trying to make sure MemorySSA is valid at each
step of updating.

Honestly, the updater APIs are much more in flux than any of the
other ones, because they essentially have two users right now, and
while I tried to generalize them as much as I thought would be
necessary, I could be completely wrong :)

They are also harder to test.

Also, today I realized we will need a cache invalidation API pretty
quickly for the walkers, which is going to require a bit of
thinking/iteration.
These APIs should not be required for basic use, only for preserving MemorySSA.

As such, I think I will remove them from the first committed version,
and then submit them again for review with unit tests before the new
MergedLoadStoreMotion goes in.

>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:571
> @@ +570,3 @@
> +    if (MemoryPhi *MP = dyn_cast<MemoryPhi>(U)) {
> +      for (unsigned i = 0, e = MP->getNumIncomingValues(); i != e; ++i) {
> +        if (MP->getIncomingValue(i) == Replacee)
> ----------------
> It might be cleaner to unconditionally insert the new value, then fold single use memory phis. Having a fold single use phi utility might also be useful on the public interface.

I will add something for this

>
> Actually, are you expecting an invariant where such phis are always folded?  If so, please update comments to say so.
>
No, we allow whatever phis we want; folding them just makes walking a bit faster.

> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:611
> @@ +610,3 @@
> +                              "repoint to!");
> +    assert(onlySingleValue(MP) && "This phi still points to multiple values, "
> +                                  "which means it is still needed");
> ----------------
> If there are no uses, why is it required?  Did you mean:
> use_empty || onlySingleValue?

It is user_empty || onlySingleValue.

>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:620
> @@ +619,3 @@
> +      if (MemoryPhi *MP = dyn_cast<MemoryPhi>(U)) {
> +        for (unsigned i = 0, e = MP->getNumIncomingValues(); i != e; ++i) {
> +          if (MP->getIncomingValue(i) == MA)
> ----------------
> You've got this code repeated a bit.  Utility function on memoryPhi?

Done

>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:821
> @@ +820,3 @@
> +
> +void MemorySSAPrinterPass::registerOptions() {
> +  OptionRegistry::registerOption<bool, MemorySSAPrinterPass,
> ----------------
> Can't you just use cl opt for this?

Yes; my understanding was that this is the new way to do it, because it
makes the command-line options scoped to the pass instead of global.


>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:852
> @@ +851,3 @@
> +
> +void MemorySSALazy::releaseMemory() { delete MSSA; }
> +
> ----------------
> smart pointer?

You can't, because the structure isn't destroyed; the pass API calls

runOnFunction
releaseMemory
runOnFunction
releaseMemory
....

It never destroys the pass object, so the smart pointer would never get
deleted until the very end.
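Sketched out, the lifecycle being described looks roughly like this
(a simplified illustration, not the real PassManager code):

#include "llvm/IR/Module.h"
#include "llvm/Pass.h"
using namespace llvm;

// One pass object, reused for every function in the module.
static void runLikeTheLegacyPM(FunctionPass &P, Module &M) {
  for (Function &F : M) {
    if (!F.isDeclaration()) {
      P.runOnFunction(F);  // builds a fresh MemorySSA each time
      P.releaseMemory();   // the only per-function chance to free it
    }
  }
  // The pass object itself is destroyed only when the whole pipeline is
  // torn down, so anything a member pointer owns lives until then unless
  // releaseMemory() frees it.
}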

>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:942
> @@ +941,3 @@
> +  // Don't try to walk past an incomplete phi node during construction
> +  if (P->getNumIncomingValues() != P->getNumPreds())
> +    return std::make_pair(P, false);
> ----------------
> Er, huh?  How can this happen?  And should it?

So, we use the caching walker during construction to optimize the uses
on MemoryUses.
Depending on where the backedges are, you have no guarantee you won't
hit a PHI node whose operands are not completely filled in yet,
because the standard SSA rename algorithm is top-down and depth-first,
not breadth-first (so along one branch, you check whether your
successors have phi nodes and, if so, fill in your operand. Then you
go to your successors, who may want to make queries involving that phi
node, but not all of its operands are filled in because the renamer
hasn't gotten to the *other* branch yet).

Even if we made it breadth-first, my gut tells me you can construct a
CFG with backedges that will end up with not all operands filled in at
some query point.
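To illustrate, here is a self-contained toy (not the LLVM
implementation; the types and fields are made up) of the depth-first
rename described above: each block fills in only its own operand of
each successor phi before recursing into its dominator-tree children,
so a query issued from one subtree can see a phi whose operands from
the other, not-yet-visited predecessors are still missing.

#include <map>
#include <vector>

struct Block {
  std::vector<Block *> Succs;         // CFG successors
  std::vector<Block *> DomChildren;   // dominator-tree children
  bool HasMemoryPhi = false;
  std::map<Block *, int> PhiOperands; // predecessor -> incoming access id
};

// Top-down, depth-first rename: fill in our operand of each successor
// phi, then recurse down the dominator tree.
void renameBlock(Block *BB, int IncomingVal) {
  // (defs inside BB would update IncomingVal here)
  for (Block *Succ : BB->Succs)
    if (Succ->HasMemoryPhi)
      Succ->PhiOperands[BB] = IncomingVal; // only *our* incoming operand
  for (Block *Child : BB->DomChildren)
    renameBlock(Child, IncomingVal);       // queries here may see partial phis
}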


>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:1005
> @@ +1004,3 @@
> +    const LoadInst *QueryLI = cast<LoadInst>(QueryInst);
> +    // An unordered load can't be clobbered by an ordered one in any
> +    // circumstances.
> ----------------
> Er, what?  This just seems flat out wrong.  Two loads on the same thread to the same location are always ordered regardless of flags.  Ordering doesn't bypass a mustalias...

Yes, this function is completely wrong; it was supposed to be about
volatility, not about orderedness.
It is supposed to say "if the dependence is a volatile load, and the
query is not a volatile load or something with other ordering
constraints, they cannot conflict", which is what
MemoryDependenceAnalysis says/does/we test in various places.

It is now rewritten to reflect this (see MemoryDependenceAnalysis.cpp:462).
However, I'm going to remove it anyway; it was really a work in
progress that slipped in.
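Roughly what the intended check looks like (a sketch with a
hypothetical helper name, not the committed code):

#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch only: a volatile load cannot clobber a query that is itself a
// simple (non-volatile, unordered) load.
static bool volatileLoadCannotClobber(const Instruction *DefInst,
                                      const Instruction *QueryInst) {
  if (const auto *DefLI = dyn_cast<LoadInst>(DefInst))
    if (const auto *QueryLI = dyn_cast<LoadInst>(QueryInst))
      return DefLI->isVolatile() && QueryLI->isSimple();
  return false;
}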


>
> Unless this is absolutely required, I would request you separate this and review it separately.

I'm happy to leave this out for now; it is just used to get NewGVN to
pass tests/Transforms/GVN/volatile.ll.


>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:1025
> @@ +1024,3 @@
> +    if (MemoryPhi *P = dyn_cast<MemoryPhi>(CurrAccess))
> +      return getClobberingMemoryAccess(P, Q);
> +    else {
> ----------------
> I might be missing something here, but how is this not an infinite recursion on itself?

It calls the routine above it, not this one :)

It's also guaranteed to be the last thing we need to call. Either we
end up stopping at that phi node, or we get past it, at which point
*that* routine is going to give us the answer to this query, because
it's mutually recursive.



>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:1042
> @@ +1041,3 @@
> +      if (!Q.isCall) {
> +        // Okay, well, see if it's a volatile load vs non-volatile load
> +        // situation or smoething similar)
> ----------------
> This seems suspicious to me as said above.
>
> More generally, the smarts of this routine deserve to be reviewed independently.
Agreed, particularly since David Majnemer already discovered that the
similar code for ordering in MemoryDependenceAnalysis is, in fact,
buggy :)



>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:1083
> @@ +1082,3 @@
> +    return StartingAccess;
> +  } else if (!isa<CallInst>(I) && !isa<InvokeInst>(I)) {
> +    Q.isCall = false;
> ----------------
> Mild preference, change this to:
> else if (CallSite CS = ...) {
> } else {
> }

I actually just did that this morning :)

>
> ================
> Comment at: lib/Transforms/Utils/MemorySSA.cpp:1110
> @@ +1109,3 @@
> +  if (Q.isCall) {
> +    for (const auto &C : Q.VisitedCalls)
> +      doCacheInsert(C, FinalAccess, Q);
> ----------------
> Does this optimization actually help?  Just curious.

Yes. Without it, in NewGVN, we actually spend a bunch of time querying
about calls up the line from us.

In particular, if we query in bottom-up order, without this we
query the same set of calls (minus one call) each time.

For one testcase I have, this translates into about 100k queries (vs. 1
million queries overall). With the cache, it takes ~1000 queries
instead.
:)



>
> http://reviews.llvm.org/D7864
>



