[PATCH] This patch introduces MemorySSA, a virtual SSA form for memory. Details on what it looks like are in MemorySSA.h

Philip Reames listmail at philipreames.com
Mon Apr 13 16:06:08 PDT 2015


On 04/13/2015 02:44 PM, Daniel Berlin wrote:
> On Mon, Apr 13, 2015 at 12:08 PM, Philip Reames
> <listmail at philipreames.com> wrote:
>> Ok, here are the MemorySSA.cpp parts of the review.  Lots of small style comments, a couple of larger concerns.  At least some of the code duplication would be nice to address before submission, though I wouldn't say it's mandatory as long as it's done quickly thereafter.
>>
>> I'm uncomfortable with some of the bits inside the CachingMemorySSAWalker routines.  Would you mind submitting the first version of the memory
>> SSA code without them and then reviewing those separately?
> So, I'm fine with this, but note it will break the guarantees in the
> API as it currently stands.  In particular, the guarantee that
> MemoryUses are disambiguated will be broken (because the builder
> relies on the caching walker for it).
>   Nothing in-tree relies on these guarantees, but
> 1. It makes the documentation lie :)
> 2. The new MergedLoadStoreMotion relies on this guarantee (as does NewGVN).
>
> However, I'm happy to either
> 1. Remove any code that is extra-tricky (like the volatileness code below)
> or
> 2. Replace the CachingWalker with a DoNothingWalker in the MemorySSA
> builder, and submit  it knowing the comments are currently going to be
> wrong and then shepherd the walkers through review next.
>
> Whatever folks think is best
>
> I can also just add tests to cover 90%+ of the code in the caching
> walker if that helps?  Printing tests can be used to guarantee its
> functionality for MemoryUses, because the builder uses it for each
> one.
>
> So for example, even things like the incomplete phi node case, or
> cyclic phi node cases, can be tested to ensure it gives the right
> answers, because if it didn't, the printed form would be wrong.
I'm fine with any of the options you laid out.  If you do choose to 
temporarily break the comments, please put a big bold comment at the top 
of the file which says that.  :)
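For what it's worth, if the printed form looks roughly like the notation 
you use in your example further down (my guess at the exact annotations, 
not necessarily what the printer emits):

  1 = MemoryDef(liveOnEntry)
  store i32 0, i32* %p
  MemoryUse(1)
  %x = load i32, i32* %p

then FileCheck-ing those annotations would indeed pin down which 
defining access the walker picked for each load, so the printing-test 
plan sounds workable to me.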
>
>>    Given how tricky MDA has been, I'm really not comfortable with the code in the walkers as it stands today without a much more careful review.
>>
>>
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:49
>> @@ +48,3 @@
>> +
>> +INITIALIZE_PASS(MemorySSALazy, "memoryssalazy", "Memory SSA", true, true)
>> +
>> ----------------
>> Isn't this line redundant with the above?
> Which one?
> The above inits the printer passes.
You're initializing two passes back to back.  I got confused.
>
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:79
>> @@ +78,3 @@
>> +/// placement algorithm. It is a linear time phi placement algorithm.
>> +void MemorySSA::determineInsertionPoint(
>> +    AccessMap &BlockAccesses, const SmallPtrSetImpl<BasicBlock *> &DefBlocks) {
>> ----------------
>> I'm skipping all the code copied from PMtR.  I'm going to assume it's correct.
>>
>> p.s. Any chance this could be expressed in a common template used by both places?
> So, I looked into this pretty heavily and had talked with Chandler about it.
>
> I think it should now be possible to share determineInsertionPoint,
> with a little work.
> You can't really share anything else; they are too different in what
> they want to do.
> It would basically be a lot of abstraction and callbacks.
>
> But determineInsertionPoint is the tricky algorithm in this, so I
> think if we share that, we should be in good shape.
A future change please.  :)
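Even if it's for later: the shape I'd imagine for the shared piece is 
something like this (purely hypothetical helper and names, just to 
capture the idea):

  // Hypothetical sketch: both PromoteMemoryToRegister and MemorySSA hand a
  // shared helper their def blocks and get back the blocks needing phis;
  // what "inserting a phi" actually means stays in the caller.
  SmallVector<BasicBlock *, 32> PHIBlocks;
  computeIteratedDominanceFrontier(DT, DefBlocks, LiveInBlocks, PHIBlocks);
  for (BasicBlock *BB : PHIBlocks)
    InsertPhi(BB); // callback supplied by each client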
>
>
>
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:281
>> @@ +280,3 @@
>> +  PerBlockAccesses.clear();
>> +  delete LiveOnEntryDef;
>> +}
>> ----------------
>> Can we use smart pointers by chance?  The manual memory management seems unnecessary.
>>
> I'm not aware that we have any smart pointer types that understand the
> pass management interface, and are destroyed on releaseMemory.
>
> I think it would be really nice to have a pass_ptr.
We do have unique_ptr::reset.  We'd only need to call it in 
releaseMemory to support the iteration you mentioned later on.  The pass 
destruction would be handled for free.  :)
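Roughly what I had in mind (just a sketch, assuming LiveOnEntryDef 
stays heap-allocated):

  // Sketch only: hold the live-on-entry def in a std::unique_ptr so that
  // releaseMemory() just resets it and no manual delete is needed.
  std::unique_ptr<MemoryAccess> LiveOnEntryDef;

  void MemorySSA::releaseMemory() {
    PerBlockAccesses.clear();
    LiveOnEntryDef.reset();
  }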

Having said that, I think your argument is reasonable, and the 
unique_ptr wouldn't save that much confusion anyway.  I'll defer to your 
judgement here.
>
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:327
>> @@ +326,3 @@
>> +      }
>> +      if (def) {
>> +        MemoryDef *MD = new MemoryDef(nullptr, &I, &B, nextID++);
>> ----------------
>> else if - once you do that, can't this just become:
>> if (def) {
>> } else if (use) {
>> } else {
>>    llvm_unreachable()
>> }
>>
> It can be simplified to if (def) ... else if (use), but you can't add
> an unreachable, because it could be NoModRef.
It's also worth commenting that NoModRef is possible wherever this code 
ended up.  I'd completely missed that when reading through.
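Roughly the shape I'd expect after the simplification, just so we're 
talking about the same thing (not the patch's exact code; the ModRef 
query, the MemoryUse constructor, and the Accesses container are my 
guesses):

  // Sketch: classify the instruction's memory effect; note that NoModRef is
  // a legal outcome, so it is possible that neither branch fires.
  AliasAnalysis::ModRefResult MRI = AA->getModRefInfo(&I);
  if (MRI & AliasAnalysis::Mod) {
    // Writes memory: record a MemoryDef for this instruction.
    MemoryDef *MD = new MemoryDef(nullptr, &I, &B, nextID++);
    Accesses->push_back(MD);
  } else if (MRI & AliasAnalysis::Ref) {
    // Only reads memory: record a MemoryUse instead.
    Accesses->push_back(new MemoryUse(nullptr, &I, &B));
  }
  // else: NoModRef -- the instruction touches no memory and gets no access,
  // which is why a trailing llvm_unreachable() would fire.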
>
> In any case, i removed all of it in favor of createMemoryAccess with a
> new argument.
> :)
>
>
>
>
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:504
>> @@ +503,3 @@
>> +      ++AI;
>> +    return replaceMemoryAccessWithNewAccess(Replacee, Replacer, AI);
>> +  }
>> ----------------
>> This seems odd.  Shouldn't we be putting Replacer in the spot Replacee was?
> No, this API handles things like store sinking/merging, where in
> general, you have no guarantee the replacer got inserted into the CFG
> in the same block as Replacee.
>
> As long as the code is a straight line, it could have been put farther down.
>
> ie
>
> if (a)
> {
>    1 = MemoryDef(liveOnEntry)
>    store %b;
> }  else {
>    2 = MemoryDef(liveOnEntry)
>    store %b;
> }
> 3 = MemoryPhi(1, 2)
> <50 more straight-line BBs>
> MemoryUse(3)
> load %b
>
> If we want to sink as far as possible, we may want to place the store
> at the beginning of the load's BB, not right after the MemoryPhi.
>
> In this case, replacer will not be in the same block as replacee.
This makes sense.  So there's a dominance/post-dominance relation in 
one direction or the other between the two, but the positioning in the 
use lists can be quite different.  Odd, but okay.  A clarifying comment, 
or a test case plus a comment pointing at it, would help here.
>
>
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:560
>> @@ +559,3 @@
>> +  // may now be defined in terms of it.
>> +  bool usedByReplacee = getDefiningAccess(Replacer) == Replacee;
>> +
>> ----------------
>> This condition really doesn't feel like a "replacement".  Is it time to reconsider the interface and turn this into an assert?
> Yeah, this happens in store merging, where the phi becomes the new
> defining access for the use.
>
> I will add an "addNewAccessWithDefiningAccess" to handle this case,
> instead of a replace.
>
> It is actually a lot easier to express it as a series of low level
> operations, but I was trying to make sure MemorySSA is valid at each
> step of updating.
>
> Honestly, the updater APIs are much more in flux than any of the
> other ones, because they essentially have two users right now, and
> while I tried to generalize them as much as I thought would be
> necessary, I could be completely wrong :)
>
> They are also harder to test.
>
> Also, today I realized we will need a cache invalidation API pretty
> quickly for the walkers, which is going to require a bit of
> thinking/iteration.
> These APIs should not be required for basic use, only for preserving
> MemorySSA.
>
> As such, I think I will remove them from the first committed version,
> and then submit them again for review with unit tests before the new
> MergedLoadStoreMotion goes in.
SGTM
>
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:821
>> @@ +820,3 @@
>> +
>> +void MemorySSAPrinterPass::registerOptions() {
>> +  OptionRegistry::registerOption<bool, MemorySSAPrinterPass,
>> ----------------
>> Can't you just use cl opt for this?
> Yes, my understanding was that this is the new way to do it, because
> it makes the command line options scoped to the pass instead of global.
Ah, it might be.  I have no clue.  I haven't seen any announcements or 
doc changes.
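For reference, the cl::opt spelling would just have been a file-scope 
global, something like (flag name invented for the example; needs 
llvm/Support/CommandLine.h):

  static cl::opt<bool> VerifyMemorySSAOpt(
      "verify-memoryssa", cl::init(false), cl::Hidden,
      cl::desc("Verify MemorySSA when it is built"));

but if the OptionRegistry form really is the new preferred way, that's 
fine by me.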
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:942
>> @@ +941,3 @@
>> +  // Don't try to walk past an incomplete phi node during construction
>> +  if (P->getNumIncomingValues() != P->getNumPreds())
>> +    return std::make_pair(P, false);
>> ----------------
>> Er, huh?  How can this happen?  And should it?
> So, we use the caching walker during construction to optimize the
> defining accesses of MemoryUses.
> Depending on where the backedges are, you have no guarantee you won't
> hit a PHI node whose operands are not completely filled in yet,
> because the standard SSA rename algorithm is top-down and depth
> first, not breadth first (so along one branch, you check whether your
> successors have phi nodes, and if so, fill in your open operand.  Then
> you go to your successors, who may want to make queries involving that
> phi node, but not all operands are filled in because the renamer
> hasn't gotten to the *other* branch yet).
>
> Even if we made it breadth first, my gut tells me you can construct a
> CFG with backedges that will end up with not all operands filled in at
> some query point.
A clear comment in the function description saying it has to handle 
partially constructed MemorySSA would be very helpful.  I understand why 
you want this, but it really makes me wince.

Any chance the optimization could be done via a lazy fixup after 
construction?  Just a thought.  (Not as part of this change though!)
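For my own notes, the situation I believe you're describing is roughly 
this (same notation as your earlier example; the renamer goes A -> B -> D 
before it ever sees C):

  A:
    br i1 %c, label %B, label %C
  B:
    1 = MemoryDef(liveOnEntry)
    store %p
    br label %D
  C:
    store %p          ; not renamed yet
    br label %D
  D:
    2 = MemoryPhi(B: 1, C: <not filled in yet>)
    MemoryUse(?)
    load %p           ; optimizing this use queries the incomplete phi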

>
>
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:1005
>> @@ +1004,3 @@
>> +    const LoadInst *QueryLI = cast<LoadInst>(QueryInst);
>> +    // An unordered load can't be clobbered by an ordered one in any
>> +    // circumstances.
>> ----------------
>> Er, what?  This just seems flat out wrong.  Two loads on the same thread to the same location are always ordered regardless of flags.  Ordering doesn't bypass a mustalias...
> Yes, this function is completely wrong; it was supposed to be about
> volatile, not about orderedness.
> It is supposed to say "if the dependence is a volatile load, and the
> query is not a volatile load or something with other ordering
> constraints, they cannot conflict", which is what
> MemoryDependenceAnalysis says/does/we test in various places.
>
> It is now rewritten to reflect this (see MemoryDependenceAnalysis.cpp:462)
> However, i'm going to remove it anyway, it was really a work in
> progress that slipped in.
>
>
>> Unless this is absolutely required, I would request you split this out and review it separately.
> I'm happy to leave this out for now; it is just used to get NewGVN to
> pass tests/Transforms/GVN/volatile.ll
Since I think I wrote that test case, let's talk in more detail once 
you're ready to revisit this.  :)
>
>> ================
>> Comment at: lib/Transforms/Utils/MemorySSA.cpp:1110
>> @@ +1109,3 @@
>> +  if (Q.isCall) {
>> +    for (const auto &C : Q.VisitedCalls)
>> +      doCacheInsert(C, FinalAccess, Q);
>> ----------------
>> Does this optimization actually help?  Just curious.
> Yes.  Without it, in NewGVN, we actually spend a bunch of time
> querying about calls up the line from us.
>
> In particular, if we query in bottom-up order, without this, we query
> the same set of calls (minus one call) each time.
>
> For one testcase I have, this translates into about 100k queries (vs
> 1 million queries overall).  With the cache, it takes ~1000 queries
> instead.
> :)
Cool.  Thanks!
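To restate the win for myself, for a run of calls queried bottom-up:

  call void @f()
  call void @g()
  call void @h()   ; queried first: the walker visits @g and @f on the way up

if I'm reading the loop right, the first query caches an answer for 
every call it visited, so the later queries for @g and @f become cache 
hits instead of re-walking the same chain each time.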

Philip


