[PATCH] D53203: Allow MemoryLocation to carry pre-existing knowledge to AA to elide expensive repeated checks

Daniel Sanders via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 1 17:25:42 PDT 2018


dsanders added a comment.

In https://reviews.llvm.org/D53203#1283422, @hfinkel wrote:

> In https://reviews.llvm.org/D53203#1282454, @dsanders wrote:
>
> > Hi Hal,
> >
> > I didn't spot you in CODE_OWNERS.txt at first because I was searching for 'Alias' instead of 'alias' :-). Would you be able to take a look at this patch too?
>
>
> Thanks. I suppose that I should echo Chris's question about the proposed local-dominance cache with respect to this change too: If we switch this pass to using MemorySSA does this problem go away?


Do you mean the question at http://lists.llvm.org/pipermail/llvm-dev/2018-September/126375.html?

I'm not very familiar with the MemorySSA pass but based on a fairly quick skim of it ...
It looks like the MemoryDef/MemoryUse objects are attached to the BB rather than the instructions. That wouldn't give us the accuracy we need for DSE to eliminate the store in something like:

  %p = alloca ...
  %1 = load %p
  store %1, %p
  unreachable

but it would give a reasonable early test as to whether it's worth asking further questions about the instructions. If the block containing the call site lacks a `MemoryUse` for a given alloca's MemoryDef then we could avoid asking whether the call site accesses the given alloca, because we know that nothing in the BB accesses it. However, this is only helpful if MemorySSA tries to eliminate `MemoryUse`s for allocas that don't escape, or where the called function is inspected and confirmed not to use that particular alloca. If it's conservative (e.g. adds a MemoryUse for every alloca at every call site) then we can't cull the expensive 'is it a non-escaping local?' check, because MemorySSA only tells us that something in the BB used it, not that a particular instruction used it.

Looking at MemorySSA's code, I'm wondering if it may be hitting the same performance issue that this patch is targeting. I see that getModRefInfo() is called quite a bit, most notably for every instruction in buildMemorySSA() (via createNewAccess()). Every one of those calls potentially invokes the expensive PointerMayBeCaptured() (subject to early exits), and none of the results are cached even though the answer is a property of the MemoryLocation rather than the Instruction. getModRefInfo() is also potentially called again inside instructionClobbersQuery(), which itself looks like it's called fairly often (in walkToPhiOrClobber() and optimizeUsesInBlock()).

In https://reviews.llvm.org/D53203#1284533, @asbirlea wrote:

> Hal,
>
> I have mixed feelings here because I understand Chris's point, but there are differences in this case which make this patch worth pushing forward IMO.
>
> MemorySSA in the current format will not solve this. It can be (and should be) extended to handle cases with *some* similarity (because we have other use cases for it). Trying to explain below.
>
> I'm not that familiar with DSE but if I understand correctly, the sequence is:
>
> - Keep a list of instructions, known to be allocas (or alike?)
> - Call for all callsites *after* those allocas getModRefInfo
> - If the CS can read from any of those allocas, all stores above the CS are live (the pass iterates in reverse, so it marks all preceding stores live as soon as one call is found to Ref)


That's right. The particular case that was causing performance problems for me was:

  call void @__assert_rtn(i8* %0, i8* %1, i8* %2)
  unreachable

I had 5,000 exit blocks like that and 13,000 allocas to consider. Every alloca was potentially dead at the unreachable, and DSE called getModRefInfo() for each (call site, alloca) pair to find out whether __assert_rtn could access the alloca. The answer was always no, because the allocas were all non-escaping locals, but finding that out was expensive and the check was repeated for every local alloca at every call site.

> So the patch adds info about the list of allocas inside the MemoryLocation they modify and saves some of the work getModRefInfo does.
> 
> MemorySSA cannot currently provide info about isClobberedBy, it provides uses which are not yet optimized (this is a useful and expensive extension, and we should have a walker for it)
>  But even if we had this in MemorySSA now, the info cached in this case does not necessarily overlap with storing "isLocal" and "isNonEscaping" for a MemoryLocation, plus it is more expensive.
>  We're looking here for ModRef between two particular instances that may have many other memory-accessing instructions in between, with which they MayAlias. A getModRefInfo call should be cheaper than a MemorySSA query here.
>  Perhaps I could argue that rewriting checks entirely (to avoid this particular ModRef call) would make MemorySSA more viable, but that's a whole other discussion and it would involve re-writing the pass.
> 
> The cost is clearly the extra memory added to MemoryLocation (Chris's objection for BBs) which is a core data structure.

We can potentially eliminate that cost in MemoryLocation using the separate non-const MemoryLocationKnowledge object we were talking about as both an input and an output. Callers that don't provide one would still need to allocate memory for it, but it would be a default-constructed object in the callee's stack frame.

> Unlike BBs we create MemoryLocations often on the spot and drop them right away.
>  This may make things better (as far as memory cost throughout the compilation) or worse (as far as allocations).
>  IMO, as far as caching invalidation, it is clearly better than BBs, since MemoryLocations are not passed between Analyses or Transforms. Hence the KnownFlags set by a pass will be used only in that pass, and whatever caching the pass does is dropped for others. 
>  It could be argued that this could be another extension to add to MemorySSA so we do pass cached info forward, but I think that's beyond the scope of MemorySSA right now.
> 
> Hope this makes sense, and please feel free to pull in Chris and the others to give feedback.



================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:774
+        // lots of allocas and lots of noreturn functions (e.g. assert)
+        KnownFlags |= PointerMayBeCaptured(&I, false, true)
+                          ? MLK_KnownMayEscape
----------------
asbirlea wrote:
> dsanders wrote:
> > asbirlea wrote:
> > > Can we avoid this call and leave the Escaping bits both not set?
> > > Precisely because PointerMayBeCaptured is expensive. Ideally this should be computed only on demand (in getModRefInfo) and the info passed back when caching.
> > We can, although for DSE at least we won't end up saving any further calls to it as a result. DSE calls getModRefInfo() for every local (or local equivalent) in DeadStackObjects, so we'll always call PointerMayBeCaptured() once for each item either way.
> Got it. 
> I'm wondering what's the cause of the regressions you're seeing? Just from adding to the cache and clearing it?
I think so, but I haven't dug into it yet to confirm.


Repository:
  rL LLVM

https://reviews.llvm.org/D53203




