[llvm] [LICM] Promote conditional, loop-invariant memory accesses to scalars with intrinsic (PR #93999)

Tue Jun 4 09:03:09 PDT 2024

preames wrote:

High level comment...

I generally think that allowing the hoisting of predicated scalar loads and stores outside the loop is a good overall idea, but there's a ton of details for which I don't have a clearly formed opinion.  (For anyone reading older reviews, note that I feel I have less clarity on this than I used to...)

One of the major concerns here is that while there are cases where hoisting a predicated store or load out of a loop is hugely profitable, there are also a bunch of cases where it isn't, and where it may even be strongly negative.  My original thought was that we'd treat the predicated form as a canonical form, and reverse the transform if needed, but well, I'm generally less sure that's a good idea today.

Representation-wise, I would be tempted to extend our existing masked.load and masked.store intrinsics to allow non-vector types.  That seems "cleanest" to me.  When I'd glanced at this a while back, I'd tentatively wanted to represent scalars as single-element <1 x Type> vectors, but I think I convinced myself that just doesn't work out well.

I think there's a bit of a decision point here.  If we want to restrict to only performing profitable hoist/sink, then we probably don't want an intrinsic at all, and should just have LICM rewrite the CFG.  If we think predicated scalar loads and stores are a decent canonical form (in at least some cases), then we need to plumb those through the optimizer and codegen.  Lowering them early really feels like a halfway point between the two options above, and honestly, I suspect it is the worst of both worlds.
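To make the "just have LICM rewrite the CFG" option concrete, here is a rough source-level sketch in C++ of what sinking a conditional store to a loop-invariant address could look like.  It is an illustration only, not taken from the PR: the function names and the "stored" flag are hypothetical, and the real transform would operate on IR.

```cpp
// Original loop: a conditional store to the loop-invariant address p.
void store_in_loop(int *p, const int *a, int n) {
  for (int i = 0; i < n; ++i)
    if (a[i] > 0)
      *p = a[i];
}

// Sketch of the CFG-rewriting form: the value lives in a scalar inside
// the loop, and one guarded store is emitted after the loop.
// 'stored' is a hypothetical flag recording whether any iteration
// would have written to *p.
void store_sunk(int *p, const int *a, int n) {
  int tmp = 0;
  bool stored = false;
  for (int i = 0; i < n; ++i) {
    if (a[i] > 0) {
      tmp = a[i];
      stored = true;
    }
  }
  // No store happens unless the original loop would have stored, so
  // no new write to *p is introduced on any path.
  if (stored)
    *p = tmp;
}
```

In the intrinsic-based alternative, the final "if (stored) *p = tmp" would instead stay a single predicated store that later passes and codegen have to understand.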

Another thought - unrelated to the above - is that predicated loads and stores don't *have* to be conditional in their lowering.  If you have a safe memory region you can do a select of two addresses instead.  Depending on the target, this can be better or worse overall, but might better integrate with e.g. our existing select lowering.  One interesting point is that LICM is in a good spot to identify dereferenceable alternate memory locations for loads.  Stores are harder since you have to know that no one reads from the alternate location - maybe a dummy alloca?  (This idea is very much unexplored, and might turn out to be a terrible suggestion - take this as brainstorming, nothing more.)
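A minimal sketch of the select-of-two-addresses idea, written as plain C++ rather than IR.  The names load_select and store_select are hypothetical, and the local variables stand in for the "safe memory region" and the "dummy alloca"; whether a given target actually lowers the address select without a branch is not something this sketch shows.

```cpp
// Predicated load expressed as a select between two addresses.
// 'fallback' plays the role of a known-dereferenceable alternate
// location; it holds the value to produce when cond is false.
int load_select(const int *p, bool cond, int passthru) {
  const int fallback = passthru;
  const int *addr = cond ? p : &fallback;
  return *addr;  // the load is unconditional; p is never dereferenced when cond is false
}

// Predicated store expressed the same way.  'dummy' stands in for a
// dummy alloca: a location nobody else reads, so the extra write is
// unobservable.
void store_select(int *p, bool cond, int value) {
  int dummy = 0;
  int *addr = cond ? p : &dummy;
  *addr = value;  // the store is unconditional; only the address is selected
}
```

For example, store_select(nullptr, false, 5) writes only to the local dummy and never touches the null pointer, which is the property a predicated store needs.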

https://github.com/llvm/llvm-project/pull/93999