[llvm-dev] LICM as canonical form

Mon Dec 6 09:20:16 PST 2021

On 12/6/21 6:11 AM, Sjoerd Meijer wrote:
> Hi Philip,
>
> Many thanks for writing this up! I have been wanting to kick off this 
> discussion for some time, but didn't get round to it. I am in the 
> "profit driven LICM" camp because I see LICM as a canonical form 
> leading to some poor results in benchmarks.

If you could share specific public examples, that would really help.  We 
need to corpus of concrete examples where the current LICM strategy 
produces clearly sub-optimal code, and there's not another easy strategy 
available.  To date, we really have not had this.

I'll be frank and say there's also a serious credibility problem.  I've 
seen variants of "but we have benchmarks!" multiple times over the 
years, and each time something gets publicly shared it turns out to be 
nearly trivial to handle in the existing scheme.  Anyone who wants to 
drive a conversation about changing our model has a hard road to walk 
just convincing me that such examples actually exist in enough volume to 
be even worth discussing.

And to be really explicit, benchmark numbers don't even begin to make 
this argument.  They might be a good corpus to mine for ideas, but it' 
the examples that matter, not the benchmark scores.

> The problem as I see is that LICM is a canonical form, but we don't 
> have the mechanisms to undo this (in the backend). I.e., LoopSink 
> keeps being mentioned but this only works when profile data is available:
>
>     // Enable LoopSink only when runtime profile is available.
>     // With static profile, the sinking decision may be sub-optimal.

I agree the current code has this restriction.  It's silly.  If the 
static profile is that wrong, we should fix it.  End of story.  The 
decision to restrict this code to debug-info based profile should have 
never passed review initially.

The LoopSink code also has a number of other silly restrictions (not 
allowing out of loop uses, only visiting preheader, etc..), all of which 
should be fixed.

None of these restrictions provide a strong justification for a profit 
driven LICM in my view.  Poor implementation should be fixed, not worked 
around.

I'll also mention that IMHO fixing these issues is several orders of 
magnitude less investment that figuring out the modeling I described in 
my previous email.

>
> The other candidate that could do this is MachineSink, but it can't 
> sink back into loops. The result is that we have a canonical transform 
> that is as aggressive as it can be, doesn't make a profitability call, 
> and we can't undo this later and this obviously leads to suboptimal 
> results for some cases.
>
> Here be dragons, I think. Adding profitability analysis to LICM is 
> going to be tricky on IR (e.g. register pressure), but what I want to 
> be explicit about is that reversing LICM in the back-end and on the 
> MIR (MachineSink) is also very tricky. For example, alias analysis and 
> just in general moving instructions around is more tricky.

> I can't back this up with numbers, but letting LICM serve a purpose 
> such as enabling SCEV or loop idiom recognition and let it hoist 
> profit driven seems to make intuitively more sense than it being a 
> canonical form that (at the moment) we can't undo. And please note 
> that we also have MachineCSE, which performs hoisting on MIR.
I think we need to untangle the "in the backend" from "on MIR" here.  We 
have precedent for doing target specific shaping/lowering in IR before 
conversion to MIR.  LoopSink does this today, and building on that seems 
pretty reasonable.
>
> I completely agree with your "How could we change this?" section and I 
> am happy with your mail/proposal, that we can discuss different 
> approaches and not just get the "it's a canonical transform" answer. I 
> will accept that the burden is on the person proposing a change and 
> this being a massive project IMHO has prevented me so far from kicking 
> off this discussion earlier.
I want to be pedantic here and point out I am not making a proposal.  
While I personally think we could plausibly make a profit driven LICM 
work, I have zero interest in trying to make that happen.  I am utterly 
unconvinced of the need to do so (see lack of public examples above.)  
My email was about describing the path someone else might take, not 
signing up for it myself.  :)
>
> That's why I would like to discuss here how we could best facilitate 
> this discussion, and what I mean by that is developing this "proof" 
> upstream. If we allow LICM to be profile driven under options that are 
> off-by-default, then this proof could developed incrementally and 
> upstream, in the open, which is by far a far more attractive 
> development model than doing this first all downstream and then trying 
> to convince the community. The obvious benefits are that the design 
> can be discussed, others can contribute and test it, etc. The 
> disadvantage obviously is adding code that is not enabled by default, 
> but given that this serves the goal of an experiment and redesign that 
> seems reasonable to me.

I think this is a case where the public development should be done in a 
fork.  I'm generally open to the idea of off-by-default experimental 
code in tree, but in this case, the history is such that I'm not willing 
to assume the work to motivate a change (or decide we're not changing) 
would ever happen.  I do not want such a flag in tree long term, and the 
lack of follow through from a year ago makes me very skeptical.

I will note that my objection to an off-by-default flag which is 
carefully documented to be purely for experimentation (i.e. utterly 
unstable, utterly not precedent setting) is much less strong than my 
objection to having said code enabled by default. I still don't want it, 
but its a much easier to overcome objection.  Make sense?

>
> Kind regards,
> Sjoerd.
>
>
> ------------------------------------------------------------------------
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of 
> Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org>
> *Sent:* 04 December 2021 21:54
> *To:* Philip Reames <listmail at philipreames.com>
> *Cc:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] LICM as canonical form
> I am in support of your proposal. Checking whether an llvm::Value is
> not defined by an instruction in the loop is the most straightforward
> way to determine whether a value is loop-invariant. Cf D87551 the
> profile information should probably be used in LoopSink or similar
> pass determining whether the only use of an instruction is so rare
> that it is worth moving into the conditional execution in the loop,
> with the added benefit that it would also sink instructions that were
> not in the loop in the the first place.
>
> Michael
>
> Am Fr., 3. Dez. 2021 um 17:28 Uhr schrieb Philip Reames via llvm-dev
> <llvm-dev at lists.llvm.org>:
> >
> > Later today, I'm going to be reverting D87551.  I first raised 
> serious concern on said review back in Oct, but this is a bit of an 
> unusual case because the change landed roughly a year before that.
> >
> > This patch introduced a profile driven heuristic to selectively 
> disable hoisting of instructions out of loops.  By doing so, it 
> changes a long standing design element without broad consensus 
> following discussion on llvm-dev.  However, this email isn't really 
> about the revert per se.
> >
> > In the course of the discussion leading to this point, I realized we 
> didn't really have a cite-able resource describing the historical 
> design.  This email is an attempt to provide that, and to highlight 
> some of the issues which need addressed if we do decide we want to 
> change it.
> >
> > LICM as canonical form
> >
> > We have for many years treated hoisting instructions out of loops as 
> a canonical form.  That is, hoisting is not done because it is 
> profitable (though it often is), but is instead done so that other 
> parts of the optimizer can rely on it.
> >
> > We assume that an unprofitable hoist will be undone. Historically, 
> we have generally assumed this to be done in the backend, but more 
> recently, LoopSink has also been added towards the end of the IR 
> pipeline with the same goal.
> >
> > Why does this matter?
> >
> > Other transforms depend on us having hoisted instructions out of 
> loops for effectiveness.  The largest source of such assumptions is 
> that SCEV is unable to compute trip counts for any exit condition 
> involving a loop varying load.  Almost all of our loop transformations 
> depend on SCEVs trip count logic, so failing to hoist an otherwise 
> hoistable load is a severe pessimization.
> >
> > For illustration purposes, consider this toy example:
> >
> >   for (int i = 0; ; i++) {
> >     sum += a[i] + *b;
> >     length = a.length;
> >     if (i >= length) break;
> >   }
> >
> > This example involves a typical for-loop for which the exit test 
> depends on a loop varying load.
> >
> > Here's a couple examples:
> >
> > In unrolling, the form above is not unrollable.  The trip count is 
> unknowable.  We might be able to use profile information to do a 
> bounded full unroll if this loop is short running, but all other forms 
> of unrolling (exact full, partial, and runtime) will be impossible.
> > In the vectorizer, we will be unable to establish a trip count, and 
> thus will not vectorize.  Additionally, even if we can compute a trip 
> count, the cost model handling for uniformity depends on hoisting.
> >
> > Other impacts worth noting
> >
> > In loop idiom recognize, we will fail to recognize most counted 
> idioms (e.g. popcount, cttz).  Additionally, things like memset 
> recognition will not happen if the value being stored was hoistable, 
> but not hoisted.
> > Our ability to analyze dominating conditions (e.g. cvp, 
> valuetracking, SCEV's isKnownPredicateAt) will all be crippled by the 
> inability to recognize values are loop invariant.  When the RHS of a 
> comparison is a potentially different value every time it runs, it 
> really limits our ability to derive useful knowledge from that 
> comparison or cross correlate comparisons.
> >
> > But what about an unprofitable hoist?
> >
> > There are examples where hoisting is not profitable. Here's one such 
> example:
> >
> >   for (int i = 0; i < N; i++) {
> >     if (dynamically_always_taken) continue;
> >     sum += a[i];
> >     length = a.length;
> >     if (i >= length) break;
> >   }
> >
> > Our general posture has been that we will perform hoisting in the 
> middle end, and then undo that hoisting if needed later in the pass 
> pipeline.  The basic reason for that is that it is nearly impossible 
> to distinguish profitable from unprofitable cases because the 
> profitability of the transform depends too heavily on which following 
> transforms might run.
> >
> > Here's a small example which might at first seem unprofitable - 
> inspired by the patch being reverted - but where hoisting is in fact 
> the far more profitable outcome.
> >
> >   for (int i = 0; i < N; i++) {
> >     i8* addr = a;
> >     if (invariant_cond_usually_false) {
> >        // very, very rare block e.g. 1 in 100 million
> >        addr = a + 1;
> >     }
> >     *addr = 0
> >   }
> >
> > Subtly, this example should be profitable to hoist even if the rare 
> condition is not invariant.  We still know this loop writes to at most 
> two memory locations. While we might not exploit that fact today, an 
> extended form of load-store promotion could do so.  If we don't hoist 
> the addressing expression under the rare branch, we can not (in 
> general) determine that at most two locations are written.
> >
> > I will note that LoopSink appears to be a bit restricted in 
> practice.  Someone with unprofitable examples could reasonably push 
> this much further.
> >
> > How would we change this?
> >
> > I want to be very explicit about saying this design is only one 
> reasonable design.  It would also be entirely reasonable to build a 
> design around a profit driven LICM. That's simply not what we have 
> today.  The remainder of this section is about expanding on the work 
> which would need to be undertaken to make such a change.
> >
> > First, we would need a clear set of examples where LICM was truly 
> unprofitable.  These examples would need to be publicly accessible.  
> They would also need to be fairly minimal.  In particular, there must 
> not be other obvious optimizations which if implemented makes the 
> hoisting profitable after all.
> >
> > Second, we would need a proposal to llvm-dev which directly engages 
> with the fact that SCEV (and thus most of our loop passes) depends on 
> having loads hoisted for analysis quality.  We could build a mechanism 
> in SCEV to model possible load hoisting.  There are some tricky bits 
> in doing so, but it should be possible in theory.
> >
> > The main problem with modeling possible hoists in SCEV is the need 
> to query both memory analysis and fault legality efficiently.  
> Figuring out how to make that available for all users of SCEV without 
> introducing nasty invalidation bugs or degenerate compile times is a 
> hard design problem, but might be feasible.  (I think that MemorySSA 
> gives an interesting building block here, but have not deeply 
> considered this.)
> >
> > There's also an API design problem in making sure that analysis 
> results can't be consumed without committing to the hoisting in the 
> IR.  A transform which e.g. assumes a trip count without hoisting the 
> relevant load would be subtly incorrect.
> >
> > Third, a consensus must be built that the resulting additional 
> complexity for the mechanism build to address the previous point is 
> worthwhile for the project as a whole.  This will be a judgement call 
> and would depend heavily on the solution chosen for the previous point.
> >
> > Finally, to be explicit, I am using the SCEV use case by way of an 
> example only.  There may be other ways that we depend on hoisting as a 
> canonical form that I did not happen to think of when writing this 
> email.  The burden is on the person or person proposing a change to 
> identify any other dependencies, and to convince the community they 
> have done so.  Discussion of a testing strategy to find those 
> dependencies should be a first class concern in any proposal.
> >
> > Philip
> >
> >
> >
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev 
> <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev 
> <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211206/7fa475bb/attachment.html>