[PATCH] D113520: [SROA] Spill alloca's around non-capturing escapes via calls to allow alloca partitioning/promotion

Tue Nov 30 12:18:53 PST 2021

reames requested changes to this revision.
reames added a comment.
This revision now requires changes to proceed.

Spent a decent amount of time looking at this, but haven't yet fully formed a recommendation.  I'm going to summarize my findings to date in the hopes this is useful.

I think the basic idea here is valuable, but as implemented, I don't think this is a good idea.  The current implementation can result in a serious pessimization in cases like the following:

  a = alloca <4 x i8> ... // assume a is otherwise splittable
  <init a to zero>
  loop {
     if (very_rare) a[3] = 5;
     foo(a);
  }

This transformation will have the effect of rewriting this as:

  a = alloca <4 x i8> // all other uses of a split and ssa constructed
  loop {
     *a =  very_rate ? <0,0,0,5> : <0,0,0,0> 
     foo(a);
  }

The result is that we've made the store to a dramatically more expensive by moving it into the hotpath.

I think this is a fatal flaw with the approach.  We could maybe patch around it with some profiling data, but in general, we don't want to be moving init logic.

However, this is where I get struck.  I don't really have a suggestion on how to implement this better.  Conceptually, what we want is something along the lines of D113289 <https://reviews.llvm.org/D113289>, but applied to SROA.  That is, doing SSA formation to eliminate loads while leaving the stores in place for use by the calls which reads the allocas.  However, there are two problems with that:

1. If we want to support non-readonly calls, we'd need to materialize a load after the call.  This has the same "move to hot" problem as above.  Maybe we can just restrict it to readonly arguments?
2. This really doesn't cleanly fit into the code structure of SROA.  While SROA indirectly does SSA construction, it does so by forming another alloca, and then mem2regging that.  We'd have to figure out something analogous to mem2reg which didn't remove the stores or the allocas.

Purely in terms of the code structure of the current patch, the integration of this is weird.  The remat transform should be done over the AllocaSlices array after initially scanned, and should directly rebuild that data structure rather than relying on the code to handle duplicates in the array and double scanning.  That doesn't address the algorithmic concern above though, so this ends up being mostly an aside.

This might be helpful to brainstorm offline.  Let me know if you'd like to chat.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113520/new/

https://reviews.llvm.org/D113520