<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html;

      charset=windows-1252">

  </head>

  <body>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 12/6/21 6:11 AM, Sjoerd Meijer

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:AM8PR08MB6595B937406EF236CF81F09EFC6D9@AM8PR08MB6595.eurprd08.prod.outlook.com">

      <meta http-equiv="Content-Type" content="text/html;

        charset=windows-1252">

      <style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        Hi Philip,</div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        <br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        Many thanks for writing this up! I have been wanting to kick off

        this discussion for some time, but didn't get round to it. I am

        in the "profit driven LICM" camp because I see LICM as a

        canonical form leading to some poor results in benchmarks. </div>

    </blockquote>

    <p>If you could share specific public examples, that would really

      help.  We need to corpus of concrete examples where the current

      LICM strategy produces clearly sub-optimal code, and there's not

      another easy strategy available.  To date, we really have not had

      this.</p>

    <p><br>

    </p>

    <p>I'll be frank and say there's also a serious credibility

      problem.  I've seen variants of "but we have benchmarks!" multiple

      times over the years, and each time something gets publicly shared

      it turns out to be nearly trivial to handle in the existing

      scheme.  Anyone who wants to drive a conversation about changing

      our model has a hard road to walk just convincing me that such

      examples actually exist in enough volume to be even worth

      discussing.  </p>

    <p><br>

    </p>

    <p>And to be really explicit, benchmark numbers don't even begin to

      make this argument.  They might be a good corpus to mine for

      ideas, but it' the examples that matter, not the benchmark

      scores.  <br>

    </p>

    <blockquote type="cite"

cite="mid:AM8PR08MB6595B937406EF236CF81F09EFC6D9@AM8PR08MB6595.eurprd08.prod.outlook.com">

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">The problem as I see is

        that LICM is a canonical form, but we don't have the mechanisms

        to undo this (in the backend). I.e., LoopSink keeps being

        mentioned but this only works when profile data is available:</div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        <br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

            // Enable LoopSink only when runtime profile is available.<br>

            // With static profile, the sinking decision may be

        sub-optimal.</div>

    </blockquote>

    <p>I agree the current code has this restriction.  It's silly.  If

      the static profile is that wrong, we should fix it.  End of

      story.  The decision to restrict this code to debug-info based

      profile should have never passed review initially.  <br>

    </p>

    <p><br>

    </p>

    <p>The LoopSink code also has a number of other silly restrictions

      (not allowing out of loop uses, only visiting preheader, etc..),

      all of which should be fixed.</p>

    <p><br>

    </p>

    <p>None of these restrictions provide a strong justification for a

      profit driven LICM in my view.  Poor implementation should be

      fixed, not worked around.  <br>

    </p>

    <p><br>

    </p>

    <p>I'll also mention that IMHO fixing these issues is several orders

      of magnitude less investment that figuring out the modeling I

      described in my previous email.  <br>

    </p>

    <blockquote type="cite"

cite="mid:AM8PR08MB6595B937406EF236CF81F09EFC6D9@AM8PR08MB6595.eurprd08.prod.outlook.com">

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        <br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        The other candidate that could do this is MachineSink, but it

        can't sink back into loops. The result is that we have a

        canonical transform that is as aggressive as it can be, doesn't

        make a profitability call, and we can't undo this later and this

        obviously leads to suboptimal results for some cases.<br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        <br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        Here be dragons, I think. Adding profitability analysis to LICM

        is going to be tricky on IR (e.g. register pressure), but what I

        want to be explicit about is that reversing LICM in the back-end

        and on the MIR (MachineSink) is also very tricky. For example,

        alias analysis and just in general moving instructions around is

        more tricky. </div>

    </blockquote>

    <br>

    <blockquote type="cite"

cite="mid:AM8PR08MB6595B937406EF236CF81F09EFC6D9@AM8PR08MB6595.eurprd08.prod.outlook.com">

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">I can't back this up with

        numbers, but letting LICM serve a purpose such as enabling SCEV

        or loop idiom recognition and let it hoist profit driven seems

        to make intuitively more sense than it being a canonical form

        that (at the moment) we can't undo. And please note that we also

        have MachineCSE, which performs hoisting on MIR.

        <br>

      </div>

    </blockquote>

    I think we need to untangle the "in the backend" from "on MIR"

    here.  We have precedent for doing target specific shaping/lowering

    in IR before conversion to MIR.  LoopSink does this today, and

    building on that seems pretty reasonable.  <br>

    <blockquote type="cite"

cite="mid:AM8PR08MB6595B937406EF236CF81F09EFC6D9@AM8PR08MB6595.eurprd08.prod.outlook.com">

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        <br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        I completely agree with your "How could we change this?" section

        and I am happy with your mail/proposal, that we can discuss

        different approaches and not just get the "it's a canonical

        transform" answer. I will accept that the burden is on the

        person proposing a change and this being a massive project IMHO

        has prevented me so far from kicking off this discussion

        earlier.

        <br>

      </div>

    </blockquote>

    I want to be pedantic here and point out I am not making a

    proposal.  While I personally think we could plausibly make a profit

    driven LICM work, I have zero interest in trying to make that

    happen.  I am utterly unconvinced of the need to do so (see lack of

    public examples above.)  My email was about describing the path

    someone else might take, not signing up for it myself.  :)<br>

    <blockquote type="cite"

cite="mid:AM8PR08MB6595B937406EF236CF81F09EFC6D9@AM8PR08MB6595.eurprd08.prod.outlook.com">

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        <br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        That's why I would like to discuss here how we could best

        facilitate this discussion, and what I mean by that is

        developing this "proof" upstream. If we allow LICM to be profile

        driven under options that are off-by-default, then this proof

        could developed incrementally and upstream, in the open, which

        is by far a far more attractive development model than doing

        this first all downstream and then trying to convince the

        community. The obvious benefits are that the design can be

        discussed, others can contribute and test it, etc. The

        disadvantage obviously is adding code that is not enabled by

        default, but given that this serves the goal of an experiment

        and redesign that seems reasonable to me.<br>

      </div>

    </blockquote>

    <p>I think this is a case where the public development should be

      done in a fork.  I'm generally open to the idea of off-by-default

      experimental code in tree, but in this case, the history is such

      that I'm not willing to assume the work to motivate a change (or

      decide we're not changing) would ever happen.  I do not want such

      a flag in tree long term, and the lack of follow through from a

      year ago makes me very skeptical.  <br>

    </p>

    <p><br>

    </p>

    <p>I will note that my objection to an off-by-default flag which is

      carefully documented to be purely for experimentation (i.e.

      utterly unstable, utterly not precedent setting) is much less

      strong than my objection to having said code enabled by default. 

      I still don't want it, but its a much easier to overcome

      objection.  Make sense?<br>

    </p>

    <blockquote type="cite"

cite="mid:AM8PR08MB6595B937406EF236CF81F09EFC6D9@AM8PR08MB6595.eurprd08.prod.outlook.com">

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        <br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        Kind regards,<br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        Sjoerd.<br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        <br>

      </div>

      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

        <br>

      </div>

      <hr style="display:inline-block;width:98%" tabindex="-1">

      <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"

          face="Calibri, sans-serif" color="#000000"><b>From:</b>

          llvm-dev <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev-bounces@lists.llvm.org"><llvm-dev-bounces@lists.llvm.org></a> on behalf of

          Michael Kruse via llvm-dev <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org"><llvm-dev@lists.llvm.org></a><br>

          <b>Sent:</b> 04 December 2021 21:54<br>

          <b>To:</b> Philip Reames <a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com"><listmail@philipreames.com></a><br>

          <b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>

          <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org"><llvm-dev@lists.llvm.org></a><br>

          <b>Subject:</b> Re: [llvm-dev] LICM as canonical form</font>

        <div> </div>

      </div>

      <div class="BodyFragment"><font size="2"><span

            style="font-size:11pt;">

            <div class="PlainText">I am in support of your proposal.

              Checking whether an llvm::Value is<br>

              not defined by an instruction in the loop is the most

              straightforward<br>

              way to determine whether a value is loop-invariant. Cf

              D87551 the<br>

              profile information should probably be used in LoopSink or

              similar<br>

              pass determining whether the only use of an instruction is

              so rare<br>

              that it is worth moving into the conditional execution in

              the loop,<br>

              with the added benefit that it would also sink

              instructions that were<br>

              not in the loop in the the first place.<br>

              <br>

              Michael<br>

              <br>

              Am Fr., 3. Dez. 2021 um 17:28 Uhr schrieb Philip Reames

              via llvm-dev<br>

              <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org"><llvm-dev@lists.llvm.org></a>:<br>

              ><br>

              > Later today, I'm going to be reverting D87551.  I

              first raised serious concern on said review back in Oct,

              but this is a bit of an unusual case because the change

              landed roughly a year before that.<br>

              ><br>

              > This patch introduced a profile driven heuristic to

              selectively disable hoisting of instructions out of

              loops.  By doing so, it changes a long standing design

              element without broad consensus following discussion on

              llvm-dev.  However, this email isn't really about the

              revert per se.<br>

              ><br>

              > In the course of the discussion leading to this

              point, I realized we didn't really have a cite-able

              resource describing the historical design.  This email is

              an attempt to provide that, and to highlight some of the

              issues which need addressed if we do decide we want to

              change it.<br>

              ><br>

              > LICM as canonical form<br>

              ><br>

              > We have for many years treated hoisting instructions

              out of loops as a canonical form.  That is, hoisting is

              not done because it is profitable (though it often is),

              but is instead done so that other parts of the optimizer

              can rely on it.<br>

              ><br>

              > We assume that an unprofitable hoist will be undone. 

              Historically, we have generally assumed this to be done in

              the backend, but more recently, LoopSink has also been

              added towards the end of the IR pipeline with the same

              goal.<br>

              ><br>

              > Why does this matter?<br>

              ><br>

              > Other transforms depend on us having hoisted

              instructions out of loops for effectiveness.  The largest

              source of such assumptions is that SCEV is unable to

              compute trip counts for any exit condition involving a

              loop varying load.  Almost all of our loop transformations

              depend on SCEVs trip count logic, so failing to hoist an

              otherwise hoistable load is a severe pessimization.<br>

              ><br>

              > For illustration purposes, consider this toy example:<br>

              ><br>

              >   for (int i = 0; ; i++) {<br>

              >     sum += a[i] + *b;<br>

              >     length = a.length;<br>

              >     if (i >= length) break;<br>

              >   }<br>

              ><br>

              > This example involves a typical for-loop for which

              the exit test depends on a loop varying load.<br>

              ><br>

              > Here's a couple examples:<br>

              ><br>

              > In unrolling, the form above is not unrollable.  The

              trip count is unknowable.  We might be able to use profile

              information to do a bounded full unroll if this loop is

              short running, but all other forms of unrolling (exact

              full, partial, and runtime) will be impossible.<br>

              > In the vectorizer, we will be unable to establish a

              trip count, and thus will not vectorize.  Additionally,

              even if we can compute a trip count, the cost model

              handling for uniformity depends on hoisting.<br>

              ><br>

              > Other impacts worth noting<br>

              ><br>

              > In loop idiom recognize, we will fail to recognize

              most counted idioms (e.g. popcount, cttz).  Additionally,

              things like memset recognition will not happen if the

              value being stored was hoistable, but not hoisted.<br>

              > Our ability to analyze dominating conditions (e.g.

              cvp, valuetracking, SCEV's isKnownPredicateAt) will all be

              crippled by the inability to recognize values are loop

              invariant.  When the RHS of a comparison is a potentially

              different value every time it runs, it really limits our

              ability to derive useful knowledge from that comparison or

              cross correlate comparisons.<br>

              ><br>

              > But what about an unprofitable hoist?<br>

              ><br>

              > There are examples where hoisting is not profitable. 

              Here's one such example:<br>

              ><br>

              >   for (int i = 0; i < N; i++) {<br>

              >     if (dynamically_always_taken) continue;<br>

              >     sum += a[i];<br>

              >     length = a.length;<br>

              >     if (i >= length) break;<br>

              >   }<br>

              ><br>

              > Our general posture has been that we will perform

              hoisting in the middle end, and then undo that hoisting if

              needed later in the pass pipeline.  The basic reason for

              that is that it is nearly impossible to distinguish

              profitable from unprofitable cases because the

              profitability of the transform depends too heavily on

              which following transforms might run.<br>

              ><br>

              > Here's a small example which might at first seem

              unprofitable - inspired by the patch being reverted - but

              where hoisting is in fact the far more profitable outcome.<br>

              ><br>

              >   for (int i = 0; i < N; i++) {<br>

              >     i8* addr = a;<br>

              >     if (invariant_cond_usually_false) {<br>

              >        // very, very rare block e.g. 1 in 100 million<br>

              >        addr = a + 1;<br>

              >     }<br>

              >     *addr = 0<br>

              >   }<br>

              ><br>

              > Subtly, this example should be profitable to hoist

              even if the rare condition is not invariant.  We still

              know this loop writes to at most two memory locations. 

              While we might not exploit that fact today, an extended

              form of load-store promotion could do so.  If we don't

              hoist the addressing expression under the rare branch, we

              can not (in general) determine that at most two locations

              are written.<br>

              ><br>

              > I will note that LoopSink appears to be a bit

              restricted in practice.  Someone with unprofitable

              examples could reasonably push this much further.<br>

              ><br>

              > How would we change this?<br>

              ><br>

              > I want to be very explicit about saying this design

              is only one reasonable design.  It would also be entirely

              reasonable to build a design around a profit driven LICM. 

              That's simply not what we have today.  The remainder of

              this section is about expanding on the work which would

              need to be undertaken to make such a change.<br>

              ><br>

              > First, we would need a clear set of examples where

              LICM was truly unprofitable.  These examples would need to

              be publicly accessible.  They would also need to be fairly

              minimal.  In particular, there must not be other obvious

              optimizations which if implemented makes the hoisting

              profitable after all.<br>

              ><br>

              > Second, we would need a proposal to llvm-dev which

              directly engages with the fact that SCEV (and thus most of

              our loop passes) depends on having loads hoisted for

              analysis quality.  We could build a mechanism in SCEV to

              model possible load hoisting.  There are some tricky bits

              in doing so, but it should be possible in theory.<br>

              ><br>

              > The main problem with modeling possible hoists in

              SCEV is the need to query both memory analysis and fault

              legality efficiently.  Figuring out how to make that

              available for all users of SCEV without introducing nasty

              invalidation bugs or degenerate compile times is a hard

              design problem, but might be feasible.  (I think that

              MemorySSA gives an interesting building block here, but

              have not deeply considered this.)<br>

              ><br>

              > There's also an API design problem in making sure

              that analysis results can't be consumed without committing

              to the hoisting in the IR.  A transform which e.g. assumes

              a trip count without hoisting the relevant load would be

              subtly incorrect.<br>

              ><br>

              > Third, a consensus must be built that the resulting

              additional complexity for the mechanism build to address

              the previous point is worthwhile for the project as a

              whole.  This will be a judgement call and would depend

              heavily on the solution chosen for the previous point.<br>

              ><br>

              > Finally, to be explicit, I am using the SCEV use case

              by way of an example only.  There may be other ways that

              we depend on hoisting as a canonical form that I did not

              happen to think of when writing this email.  The burden is

              on the person or person proposing a change to identify any

              other dependencies, and to convince the community they

              have done so.  Discussion of a testing strategy to find

              those dependencies should be a first class concern in any

              proposal.<br>

              ><br>

              > Philip<br>

              ><br>

              ><br>

              ><br>

              ><br>

              ><br>

              > _______________________________________________<br>

              > LLVM Developers mailing list<br>

              > <a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

              > <a

                href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

              _______________________________________________<br>

              LLVM Developers mailing list<br>

              <a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

              <a

                href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

            </div>

          </span></font></div>

    </blockquote>

  </body>

</html>