[PATCH] D113520: [SROA] Spill alloca's around non-capturing escapes via calls to allow alloca partitioning/promotion

Tue Nov 30 13:19:25 PST 2021

lebedev.ri added a comment.

In D113520#3162304 <https://reviews.llvm.org/D113520#3162304>, @reames wrote:

> Spent a decent amount of time looking at this, but haven't yet fully formed a recommendation.  I'm going to summarize my findings to date in the hopes this is useful.

Thank you for taking a look!

> I think the basic idea here is valuable, but as implemented, I don't think this is a good idea.  The current implementation can result in a serious pessimization in cases like the following:
>
>   a = alloca <4 x i8> ... // assume a is otherwise splittable
>   <init a to zero>
>   loop {
>      if (very_rare) a[3] = 5;
>      foo(a);
>   }
>
> This transformation will have the effect of rewriting this as:
>
>   a = alloca <4 x i8> // all other uses of a split and ssa constructed
>   loop {
>      *a =  very_rate ? <0,0,0,5> : <0,0,0,0> 
>      foo(a);
>   }
>
> The result is that we've made the store to a dramatically more expensive by moving it into the hotpath.
>
> I think this is a fatal flaw with the approach.  We could maybe patch around it with some profiling data, but in general, we don't want to be moving init logic.

Indeed: https://godbolt.org/z/n64ozqnsT

> However, this is where I get struck.  I don't really have a suggestion on how to implement this better.

> Conceptually, what we want is something along the lines of D113289 <https://reviews.llvm.org/D113289>, but applied to SROA.

Right. Not sure how much relevant potential is left after that change, to be noted.

> That is, doing SSA formation to eliminate loads while leaving the stores in place for use by the calls which reads the allocas.  However, there are two problems with that:

> 1. If we want to support non-readonly calls, we'd need to materialize a load after the call.  This has the same "move to hot" problem as above.

True.

> Maybe we can just restrict it to readonly arguments?

> 2. This really doesn't cleanly fit into the code structure of SROA.  While SROA indirectly does SSA construction, it does so by forming another alloca, and then mem2regging that.  We'd have to figure out something analogous to mem2reg which didn't remove the stores or the allocas.

Well, this one seems straight-forward - instead of spilling before the call, just go through all the slices,
and duplicate each store to store into our new alloca, this will essentially preserve all the original stores.

> Purely in terms of the code structure of the current patch, the integration of this is weird.  The remat transform should be done over the AllocaSlices array after initially scanned, and should directly rebuild that data structure rather than relying on the code to handle duplicates in the array and double scanning.  That doesn't address the algorithmic concern above though, so this ends up being mostly an aside.

Note that i only scan the newly-added uses, i do not rescan everything, so i don't believe this particular concern is significant.

> This might be helpful to brainstorm offline.  Let me know if you'd like to chat.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113520/new/

https://reviews.llvm.org/D113520