<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Later today, I'm going to be reverting D87551.  I first raised

      serious concern on said review back in Oct, but this is a bit of

      an unusual case because the change landed roughly a year before

      that.   </p>

    <p>This patch introduced a profile driven heuristic to selectively

      disable hoisting of instructions out of loops.  By doing so, it

      changes a long standing design element without broad consensus

      following discussion on llvm-dev.  However, this email isn't

      really about the revert per se.  </p>

    <p>In the course of the discussion leading to this point, I realized

      we didn't really have a cite-able resource describing the

      historical design.  This email is an attempt to provide that, and

      to highlight some of the issues which need addressed if we do

      decide we want to change it.<br>

    </p>

    <p><b>LICM as canonical form</b></p>

    <p>We have for many years treated hoisting instructions out of loops

      as a canonical form.  That is, hoisting is not done because it is

      profitable (though it often is), but is instead done so that other

      parts of the optimizer can rely on it.  <br>

    </p>

    <p>We assume that an unprofitable hoist will be undone. 

      Historically, we have generally assumed this to be done in the

      backend, but more recently, LoopSink has also been added towards

      the end of the IR pipeline with the same goal.</p>

    <p><b>Why does this matter?</b></p>

    <p>Other transforms depend on us having hoisted instructions out of

      loops for effectiveness.  The largest source of such assumptions

      is that SCEV is unable to compute trip counts for any exit

      condition involving a loop varying load.  Almost all of our loop

      transformations depend on SCEVs trip count logic, so failing to

      hoist an otherwise hoistable load is a severe pessimization.  </p>

    <p>For illustration purposes, consider this toy example:</p>

    <p>  for (int i = 0; ; i++) {<br>

          sum += a[i] + *b;<br>

          length = a.length;<br>

          if (i >= length) break;<br>

        }</p>

    <p>This example involves a typical for-loop for which the exit test

      depends on a loop varying load.<br>

    </p>

    <p>Here's a couple examples:</p>

    <ol>

      <li>In unrolling, the form above is not unrollable.  The trip

        count is unknowable.  We might be able to use profile

        information to do a bounded full unroll if this loop is short

        running, but all other forms of unrolling (exact full, partial,

        and runtime) will be impossible.</li>

      <li>In the vectorizer, we will be unable to establish a trip

        count, and thus will not vectorize.  Additionally, even if we

        can compute a trip count, the cost model handling for uniformity

        depends on hoisting.  <br>

      </li>

    </ol>

    <p>Other impacts worth noting<br>

    </p>

    <ol>

      <li>In loop idiom recognize, we will fail to recognize most

        counted idioms (e.g. popcount, cttz).  Additionally, things like

        memset recognition will not happen if the value being stored was

        hoistable, but not hoisted.</li>

      <li>Our ability to analyze dominating conditions (e.g. cvp,

        valuetracking, SCEV's isKnownPredicateAt) will all be crippled

        by the inability to recognize values are loop invariant.  When

        the RHS of a comparison is a potentially different value every

        time it runs, it really limits our ability to derive useful

        knowledge from that comparison or cross correlate comparisons. 

        <br>

      </li>

    </ol>

    <p><b>But what about an unprofitable hoist?</b></p>

    <p>There are examples where hoisting is not profitable.  Here's one

      such example:</p>

    <p>  for (int i = 0; i < N; i++) {<br>

          if (dynamically_always_taken) continue;<br>

          sum += a[i];<br>

          length = a.length;<br>

          if (i >= length) break;<br>

        }<br>

    </p>

    <p>Our general posture has been that we will perform hoisting in the

      middle end, and then undo that hoisting if needed later in the

      pass pipeline.  The basic reason for that is that it is nearly

      impossible to distinguish profitable from unprofitable cases

      because the profitability of the transform depends too heavily on

      which following transforms might run.</p>

    <p>Here's a small example which might at first seem unprofitable -

      inspired by the patch being reverted - but where hoisting is in

      fact the far more profitable outcome.</p>

    <p>  for (int i = 0; i < N; i++) {<br>

          i8* addr = a;<br>

          if (invariant_cond_usually_false) {<br>

             // very, very rare block e.g. 1 in 100 million<br>

             addr = a + 1;<br>

          }<br>

          *addr = 0<br>

        }</p>

    <p>Subtly, this example should be profitable to hoist even if the

      rare condition is not invariant.  We still know this loop writes

      to at most two memory locations.  While we might not exploit that

      fact today, an extended form of load-store promotion could do so. 

      If we don't hoist the addressing expression under the rare branch,

      we can not (in general) determine that at most two locations are

      written.  <br>

    </p>

    <p>I will note that LoopSink appears to be a bit restricted in

      practice.  Someone with unprofitable examples could reasonably

      push this much further.  <br>

    </p>

    <p><b>How would we change this?</b></p>

    <p>I want to be very explicit about saying this design is only one

      reasonable design.  It would also be entirely reasonable to build

      a design around a profit driven LICM.  That's simply not what we

      have today.  The remainder of this section is about expanding on

      the work which would need to be undertaken to make such a change.</p>

    <p>First, we would need a clear set of examples where LICM was truly

      unprofitable.  These examples would need to be publicly

      accessible.  They would also need to be fairly minimal.  In

      particular, there must not be other obvious optimizations which if

      implemented makes the hoisting profitable after all.  <br>

    </p>

    <p>Second, we would need a proposal to llvm-dev which directly

      engages with the fact that SCEV (and thus most of our loop passes)

      depends on having loads hoisted for analysis quality.  We could

      build a mechanism in SCEV to model possible load hoisting.  There

      are some tricky bits in doing so, but it should be possible in

      theory.</p>

    <p>The main problem with modeling possible hoists in SCEV is the

      need to query both memory analysis and fault legality

      efficiently.  Figuring out how to make that available for all

      users of SCEV without introducing nasty invalidation bugs or

      degenerate compile times is a hard design problem, but might be

      feasible.  (I think that MemorySSA gives an interesting building

      block here, but have not deeply considered this.)  <br>

    </p>

    <p>There's also an API design problem in making sure that analysis

      results can't be consumed without committing to the hoisting in

      the IR.  A transform which e.g. assumes a trip count without

      hoisting the relevant load would be subtly incorrect.  <br>

    </p>

    <p>Third, a consensus must be built that the resulting additional

      complexity for the mechanism build to address the previous point

      is worthwhile for the project as a whole.  This will be a

      judgement call and would depend heavily on the solution chosen for

      the previous point.</p>

    <p>Finally, to be explicit, I am using the SCEV use case by way of

      an example only.  There may be other ways that we depend on

      hoisting as a canonical form that I did not happen to think of

      when writing this email.  The burden is on the person or person

      proposing a change to identify any other dependencies, and to

      convince the community they have done so.  Discussion of a testing

      strategy to find those dependencies should be a first class

      concern in any proposal.  <br>

    </p>

    <p>Philip<br>

    </p>

    <p><br>

    </p>

    <p><br>

    </p>

    <p><br>

    </p>

    <p><br>

    </p>

  </body>

</html>