[PATCH] D117926: [SLP] Optionally preserve MemorySSA

Philip Reames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 21 13:50:37 PST 2022


reames added a comment.

In D117926#3262469 <https://reviews.llvm.org/D117926#3262469>, @nikic wrote:

> Can you please explain what the larger context here is? What cases are you trying to solve with MemorySSA?

Sure, though I'm a bit limited in what I can say.  The original example is not public.

Essentially, I have a case where we are spending a large fraction of total O2 time inside SLP - specifically, inside the code which is figuring out which memory dependencies exist while trying to schedule.  (To prevent confusion, note that SLP scheduling subsumes several legality tests.)

Specifically, the case which is hurting this example - which is machine-generated code - is a very long basic block with a vectorizable pair of loads at the beginning, and a vectorizable pair of stores (consuming the loaded values) at the end.  There are multiple pairs, but the core detail is that the required scheduling window is basically the entire size of the huge basic block.

The time is spent figuring out dependencies for *scalar* instructions - not even the ones we're trying to vectorize.  Since this is such a huge block, the current MSSA-like memory chain ends up being very expensive.
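To illustrate the scaling problem (this is a hedged back-of-the-envelope sketch, not SLP's actual scheduling code): when a vectorizable load pair at the top and a store pair at the bottom force the scheduling window to span the whole block, the worst-case number of pairwise dependence queries grows quadratically with block size.

```python
# Hypothetical cost model, NOT SLP's real implementation: assume every
# memory instruction in the scheduling window may need to be checked
# against every later memory instruction in the window.

def dependence_checks(window_size):
    """Worst-case pairwise may-alias queries for a scheduling window
    containing `window_size` memory instructions."""
    n = window_size
    return n * (n - 1) // 2

# A small block vs. a machine-generated huge block:
print(dependence_checks(100))     # 4950
print(dependence_checks(10_000))  # 49995000
```

The point is only that a window spanning a huge block makes the dependence calculation itself the dominant cost, even though most of the instructions scanned are scalars we never vectorize.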

I'd explored options for limiting the scheduling window, but MSSA felt like a more general answer, so I started there.

> I'm not sure it will be the right tool for the job, so I think we should discuss this before making any changes. We don't have MSSA available at SLP's pipeline position, and computing it just for SLP will make this pass much more expensive.

I'm really surprised to hear you say that.  My understanding was that MemorySSA was rather cheap to construct if you don't need an optimized form, and that optimization was done lazily.

However, I see my memory of prior discussion on this topic is clearly wrong.  The constructor for MemorySSA does appear to eagerly optimize.

Despite this, I don't see MemorySSAAnalysis showing up as expensive in the -time-passes-per-run output even with this change.  I see SLP itself slow down a lot, but I had put that down to the generic renaming instead of using specialized knowledge from the call site.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117926/new/

https://reviews.llvm.org/D117926
