[llvm-dev] Generalizing load/store promotion in LICM
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Thu Sep 13 13:42:53 PDT 2018
(minor inline additions)
On 09/13/2018 01:51 AM, Chandler Carruth wrote:
> Haven't had time to dig into this, but wanted to add +Alina Sbirlea
> <mailto:asbirlea at google.com> to the thread as she has been working on
> promotion and other aspects of LICM for a long time here.
Thanks!
> On Wed, Sep 12, 2018 at 11:41 PM Philip Reames
> <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote:
>
> I'm thinking about making some semi-radical changes to how load/store
> promotion works in LICM, and figured it would be a good idea to
> get buy-in before I actually start writing code. :)
>
> TLDR: legality of sinking stores to exits is hard. Can we separate
> load handling into a super-aggressive form of PRE, and use
> predicated stores to avoid having to solve the legality question?
>
>
> Background
>
> We've been seeing an interesting class of problems recently that
> looks roughly like this:
>
> for (int i = 0; i < N; i++)
>   if (a[i] == 0)   // some data-dependent check
>     g_count++;     // conditional load and store to shared location
>
> The critical detail in this example is that g_count is a global
> location which may be accessed concurrently* by multiple threads.
> The challenge here is that we don't know whether the store ever
> executes, and we're not allowed to insert a store along any path
> that didn't originally contain one. Said differently, if all
> elements in "a" are non-zero, we're not allowed to store to
> g_count. We do know that the g_count location is dereferenceable
> though.
>
> (*Please, let's avoid the memory model derailment here. I'm
> simplifying, and C++ language rules aren't really useful for my Java
> language frontend anyway. In practice, all the accesses are
> atomic but unordered, but we'll leave that out of the discussion
> otherwise.)
>
> I have two approaches I'm considering here. These are orthogonal,
> but I suspect we'll want to implement both.
>
>
> Proposal A - Use predicated stores in loop exits
>
> The basic idea is that we don't worry about solving the legality
> question above, and just insert a store which is predicated on a
> condition which is true exactly when the original store ran. In
> pseudo code, this looks something like:
>
> bool StoreLegal = false;
> int LocalCount = g_count;
> for (int i = 0; i < N; i++)
>   if (a[i] == 0) {
>     LocalCount++;
>     StoreLegal = true;
>   }
> if (StoreLegal) g_count = LocalCount;
>
> There are two obvious concerns here:
>
> 1. The predicated store might be expensive in practice - true for
> most current architectures.
> 2. We're introducing an extra boolean phi cycle around the loop.
>
> Talking to a couple of folks offline at the socials over the last
> few months, the former seems to be the main objection. I think we
> can control this by restricting this transform to when the loop is
> believed to have a high trip count and the conditional path is
> taken with some frequency. Getting feedback on this particular
> point is one of my main reasons for writing this mail.
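To make that last point concrete, the sort of profile-based gate I have in
mind would look roughly like the sketch below. The function name and the 4x
threshold are purely hypothetical placeholders for illustration, not existing
LICM code or tuned values.

  #include <cstdint>
  #include "llvm/Analysis/BlockFrequencyInfo.h"
  #include "llvm/Analysis/LoopInfo.h"
  using namespace llvm;

  // Hypothetical profitability gate for the predicated-store rewrite: only
  // do it when the conditional store block is hot relative to the loop
  // preheader, so the extra exit store and flag phi are amortized.
  static bool worthPredicatingStore(const Loop *L, const BasicBlock *StoreBB,
                                    const BlockFrequencyInfo &BFI) {
    const BasicBlock *Preheader = L->getLoopPreheader();
    if (!Preheader)
      return false;
    uint64_t EntryFreq = BFI.getBlockFreq(Preheader).getFrequency();
    uint64_t StoreFreq = BFI.getBlockFreq(StoreBB).getFrequency();
    // Placeholder threshold: require the store to run several times per
    // loop entry, i.e. a high trip count and a condition that fires with
    // some frequency.
    return EntryFreq > 0 && StoreFreq >= 4 * EntryFreq;
  }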
>
> The second objection can frequently be resolved by finding other
> expressions which evaluate to the same boolean. (In this case,
> LocalCount != LocalCountOrig, assuming i doesn't overflow.) We
> already have a framework with SCEV to do these replacements.
> Though, from some quick testing, it would definitely need
> strengthening. However, SCEV can't remove the extra phi in all
> cases, so we have to be okay with the extra phi cycle in the
> general case. This seems worthwhile to me, but thoughts?
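For concreteness, the flag-free form I'd hope the SCEV-based replacement
could get us to for the running example is sketched below (hand-written, and
only valid under the no-overflow assumption above):

  int g_count;                      // shared counter from the example

  void count_zeros(const int *a, int N) {
    int LocalCountOrig = g_count;   // single load hoisted to the preheader
    int LocalCount = LocalCountOrig;
    for (int i = 0; i < N; i++)
      if (a[i] == 0)
        LocalCount++;
    // True exactly when the conditional increment ran at least once, so it
    // can stand in for the StoreLegal flag.
    if (LocalCount != LocalCountOrig)
      g_count = LocalCount;
  }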
>
>
> Proposal B - Separate load and store handling into distinct transforms
>
> (For folks I've discussed this with before, this part is all new.)
>
> Thinking about the problem, it finally occurred to me that we can
> decompose the original example into two steps: getting the loads
> out of the loop, and sinking the stores out of the loop. If we
> can accomplish the former, but not the latter, we've still made
> meaningful progress.
>
> So, what we'd essentially have is a load-only transformation
> which produces this:
> int LocalCount = g_count;
> for (int i = 0; i < N; i++)
>   if (a[i] == 0) {
>     LocalCount++;
>     g_count = LocalCount;
>   }
>
> At this point, we've reduced memory traffic by half, and opened up
> the possibility that other parts of the optimizer can exploit the
> simplified form. The last point is particular interesting since
> we generally try to canonicalize loads out of loops, and much of
> the optimizer is tuned for a form with as much as possible being
> loop invariant. As one example, completely by accident, there's
> some work going on in the LoopVectorizer right now to handle such
> stores to loop invariant addresses during vectorization. Putting
> the two pieces together would let us fully vectorize this loop
> without needing to sink the stores at all.
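To spell out why that combination vectorizes so nicely: after the load-only
transform plus the predicated exit store, the loop body contains no memory
writes at all, just an integer add reduction plus a "did we ever increment"
condition. A conceptual, hand-written sketch with VF = 4 (illustration only,
not LoopVectorizer output):

  int g_count;

  void count_zeros_vec(const int *a, int N) {
    int LocalCount = g_count;        // hoisted load (proposal B)
    int Lane[4] = {0, 0, 0, 0};      // per-lane partial counts
    int i = 0;
    for (; i + 4 <= N; i += 4)       // vector body
      for (int l = 0; l < 4; l++)    // stands in for one widened compare+add
        Lane[l] += (a[i + l] == 0);
    int Added = Lane[0] + Lane[1] + Lane[2] + Lane[3];  // horizontal reduce
    for (; i < N; i++)               // scalar epilogue
      Added += (a[i] == 0);
    if (Added != 0)                  // plays the role of StoreLegal
      g_count = LocalCount + Added;  // single predicated store in the exit
  }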
>
> In practice, the existing implementation in LICM would cleanly
> split along these lines with little problem.
>
> One way of looking at this load-specific transform is as an
> extreme form of PRE (partial redundancy elimination). Our current
> PRE implementation doesn't handle cases this complicated.
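For contrast, the kind of case our existing load PRE (in GVN) does handle is
one where the load is already available along some path and just needs a copy
inserted on the other edge, roughly like this (a simplified illustration, not
a claim about exactly what GVN emits):

  int f(const int *p, int c) {
    int x = 0;
    if (c)
      x = *p;         // load available along this edge
    return x + *p;    // load PRE inserts a load on the other edge plus a phi
  }

The loop case above is harder: there is no existing load outside the loop to
make the in-loop load redundant against, and the in-loop stores have to feed
their values back into the SSA web.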
>
It occurred to me later that simply framing the new transform as a separate
pass (LoopPRE) and using the same AST (AliasSetTracker) + SSA construction
approach would be straightforward. So, if folks think that having an
aggressive form of load PRE in LICM is going a bit too far, it'd be easy
to represent as an optional separate pass. I'd still prefer having LICM
contain the logic though.
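As a very rough sketch of what the core of that load-only rewrite could look
like (a hypothetical helper, simplified by assuming at most one access to the
promoted location per basic block so SSAUpdater's block-level availability is
enough):

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/Support/Casting.h"
  #include "llvm/Transforms/Utils/SSAUpdater.h"
  using namespace llvm;

  // Stores stay in place (that's the point of proposal B), but their values
  // feed an SSA web so every in-loop load of the promoted location can be
  // replaced. 'Accesses' are the in-loop loads/stores of that location;
  // 'PreheaderLoad' is the load we inserted in the preheader.
  static void rewriteLoadsOnly(ArrayRef<Instruction *> Accesses,
                               LoadInst *PreheaderLoad) {
    SSAUpdater SSA;
    SSA.Initialize(PreheaderLoad->getType(), PreheaderLoad->getName());
    // The preheader load and every in-loop store define the value that is
    // live out of their blocks.
    SSA.AddAvailableValue(PreheaderLoad->getParent(), PreheaderLoad);
    for (Instruction *I : Accesses)
      if (auto *SI = dyn_cast<StoreInst>(I))
        SSA.AddAvailableValue(SI->getParent(), SI->getValueOperand());
    // Replace each in-loop load with the value reaching its block;
    // SSAUpdater materializes any phis that are needed.
    for (Instruction *I : Accesses)
      if (auto *LI = dyn_cast<LoadInst>(I)) {
        LI->replaceAllUsesWith(SSA.GetValueInMiddleOfBlock(LI->getParent()));
        LI->eraseFromParent();
      }
  }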
>
> Thoughts?
>
> Philip
>