<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 03/14/2017 12:11 PM, Adam Nemet
      wrote:<br>
    </div>
    <blockquote
      cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com"
      type="cite">
      <br class="">
      <div>
        <blockquote type="cite" class="">
          <div class="">On Mar 14, 2017, at 9:49 AM, Hal Finkel &lt;<a
              moz-do-not-send="true" href="mailto:hfinkel@anl.gov"
              class="">hfinkel@anl.gov</a>&gt; wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <div bgcolor="#FFFFFF" text="#000000" class="">
              <p class=""><br class="">
              </p>
              <div class="moz-cite-prefix">On 03/14/2017 11:21 AM, Adam
                Nemet wrote:<br class="">
              </div>
              <blockquote
                cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
                type="cite" class=""> <br class="">
                <div class="">
                  <blockquote type="cite" class="">
                    <div class="">On Mar 14, 2017, at 6:00 AM, Nema,
                      Ashutosh &lt;<a moz-do-not-send="true"
                        href="mailto:Ashutosh.Nema@amd.com" class="">Ashutosh.Nema@amd.com</a>&gt;
                      wrote:</div>
                    <br class="Apple-interchange-newline">
                    <div class="">
                      <div class="WordSection1" style="page:
                        WordSection1; font-family: Helvetica; font-size:
                        10px; font-style: normal; font-variant-caps:
                        normal; font-weight: normal; letter-spacing:
                        normal; orphans: auto; text-align: start;
                        text-indent: 0px; text-transform: none;
                        white-space: normal; widows: auto; word-spacing:
                        0px; -webkit-text-stroke-width: 0px;">
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class="">Summarizing the discussion on the
                          implementation approaches.<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><o:p class=""> </o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class="">Two approaches were discussed. The first
                          runs ‘InnerLoopVectorizer’ again on the
                          epilog loop immediately after vectorizing the
                          original loop, within the same vectorization
                          pass. The second re-runs the
                          vectorization pass and limits the vectorization
                          factor of the epilog loop through metadata.<o:p
                            class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><o:p class=""> </o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class="">&lt;Approach-2&gt;<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class="">Challenges with re-running the
                          vectorizer pass:<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt 0.5in;
                          font-size: 11pt; font-family: Calibri,
                          sans-serif; text-indent: -0.25in;" class=""><span
                            class="">1)<span style="font-style: normal;
                              font-variant-caps: normal; font-weight:
                              normal; font-size: 7pt; line-height:
                              normal; font-family: 'Times New Roman';"
                              class="">     <span
                                class="Apple-converted-space"> </span></span></span>Reusing
                          alias check result:<span
                            class="Apple-converted-space"> </span><o:p
                            class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt 0.5in;
                          font-size: 11pt; font-family: Calibri,
                          sans-serif;" class="">When the vectorizer pass
                          runs again, it sees the epilog loop as a new
                          loop and may generate a new alias check for it. The
                          cost of this new alias check may outweigh the gains of
                          epilog vectorization.<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt 0.5in;
                          font-size: 11pt; font-family: Calibri,
                          sans-serif;" class="">We should reuse the
                          already computed alias check result instead of
                          recomputing it.<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt 0.5in;
                          font-size: 11pt; font-family: Calibri,
                          sans-serif; text-indent: -0.25in;" class=""><span
                            class="">2)<span style="font-style: normal;
                              font-variant-caps: normal; font-weight:
                              normal; font-size: 7pt; line-height:
                              normal; font-family: 'Times New Roman';"
                              class="">     <span
                                class="Apple-converted-space"> </span></span></span>Rerun
                          the vectorizer and hoist the new alias check:<o:p
                            class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt 0.5in;
                          font-size: 11pt; font-family: Calibri,
                          sans-serif;" class="">It is not possible to
                          hoist the new alias check because it is not fully
                          redundant: it is not dominated by the other checks,
                          and it is not executed on all paths.<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><o:p class=""> </o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><span
                            id="cid:part2.8A601F10.7A15555C@anl.gov">&lt;Mail
                            Attachment.jpeg&gt;</span></div>
                      </div>
                    </div>
                  </blockquote>
                  <div class=""><br class="">
                  </div>
                  <div class=""><br class="">
                  </div>
                  <div class="">
                    <div class="">I don’t understand. Looks like you
                      have the same alias checks for the epilog loop
                      too.  How is this CFG different from the
                      re-vectorization of the scalar loop? </div>
                  </div>
                </div>
              </blockquote>
              <br class="">
              You're looking at the wrong thing. This *is* the image
              from re-vectorization. The other image (linked below in
              step (3)) shows the other option.<br class="">
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        Ah ok, the numbering confused me here.</div>
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div bgcolor="#FFFFFF" text="#000000" class=""> <br
                class="">
              <blockquote
                cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
                type="cite" class="">
                <div class="">
                  <div class="">
                    <div class=""> It would be good to have both CFGs here,
                      with the differences highlighted.</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">I thought that the whole point was
                      that *if* you reached the epilog vector loop via
                      the first vector loop, you want to bypass the
                      alias checks before the epilog vector.</div>
                  </div>
                </div>
              </blockquote>
              <br class="">
              Yes, but, that's not quite true now. You can also reach
              the epilogue loop if you fail the min-trip-count check,
              and so you don't know anything about the aliasing checks.<br
                class="">
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>OK, so we want this loop to be handled specially.  We
          effectively say that we only vectorize this loop if it does
          not require any alias checks or if the alias checks can be
          predicate-forwarded to this loop from existing checks.</div>
        <div><br class="">
        </div>
        <div>This still seems like an orthogonal issue that may be
          interesting to solve independently.  In other words this could
          be a nice feature in the vectorizer anyway: the loop is
          estimated to be low-trip count so feel free to predicate the
          new vector loop so that the alias check result could be reused
          from some other block.  We obviously don’t have this
          capability today but it’s something that could be nice aside
          from the vectorizer.</div>
      </div>
    </blockquote>
    <br>
    That sounds great. I'm not sure, however, exactly what this means.
    This "predicate forwarding" sounds like a non-local restructuring of
    the surrounding code (because the predicates aren't known to be true
    at that point, we need to search different predecessors to find one
    in which the conditions might be true and insert the vector loop
    across that predecessor edge instead). Maybe we could do this, for
    example, by calling SE->isKnownPredicate enhanced with some
    additional context sensitivity because we currently check dominating
    conditional branches for things that are AddRecs in some loop?
    Moreover, we then have the problem of restructuring the order of the
    trip-count checks (because we need to fast fail to the scalar loop
    for the smallest trip counts). Maybe we can do this the same way?
    This means finding a dominating check on the trip count that implies
    the check we're about to insert, changing that check to the check we
    want, and keeping the original only on the true side of the trip-count
    check (i.e. if the trip count is larger than the small threshold,
    then check the large threshold).<br>
    <br>
    If we're going to do all of that, then I'd lean toward saying that
    this does not belong in the vectorizer at all. Rather, this seems
    like something we'd want in some general transformation (this seems
    somewhat akin to what JumpThreading does). The general
    transformation seems something like this; the basic vectorized loop
    looks like this:<br>
    <br>
    <tt>int start = 0;</tt><br>
    <tt>if (n >= vf) {<br>
        if (check) {<br>
          for (...; start += vf)<br>
            ...<br>
        }<br>
      }<br>
    </tt><br>
    <tt>for (i = start; i < n; ++i) {<br>
        ...<br>
      }<br>
    </tt><br>
    and after we vectorize the epilogue loop we end up with this:<br>
    <br>
    <tt>int start = 0;</tt><br>
    <tt>if (n >= vf) {<br>
        if (check) {<br>
          for (...; start += vf)<br>
            ...<br>
        }<br>
      }<br>
    </tt><br>
    <tt>if (n >= vf2) {<br>
        if (check) {<br>
          for (...; start += vf2)<br>
            ...<br>
        }<br>
      }<br>
    </tt><br>
    <tt>for (i = start; i < n; ++i) {<br>
        ...<br>
      }</tt><br>
    <br>
    and we need to end up with this:<br>
    <br>
    <tt>int start = 0;<br>
      if (n >= vf2) {<br>
        if (check) {<br>
          if (n >= vf) {<br>
            for (...; start += vf)<br>
              ...<br>
          }<br>
      <br>
          for (...; start += vf2)<br>
            ...<br>
        }<br>
      }<br>
      <br>
      for (i = start; i < n; ++i) {<br>
        ...<br>
      }<br>
    </tt><br>
    where we've recognized here that 'check' is the same in both cases,
    and that, because vf2 < vf, the larger trip-count check (n >= vf)
    implies the smaller one (n >= vf2). This latter part seems like the
    part that our existing passes
    might not know what to do with currently. Thoughts?<br>
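    <br>
    As a rough, self-contained C sketch of the equivalence above (not
    vectorizer code): every name here (VF, VF2, alias_check, run_naive,
    run_restructured, same_work) is a hypothetical stand-in, the loop
    bodies are reduced to iteration counters, and it assumes that 'check'
    has no side effects and that vf2 < vf. It compares the layout we get
    after vectorizing the epilogue loop with the layout we would like to
    end up with:<br>
    <pre>
```c
/* Assumed vectorization factors; the only requirement is VF2 < VF. */
enum { VF = 8, VF2 = 4 };

struct stats {
    int checks;       /* evaluations of the runtime alias check */
    int wide_iters;   /* iterations of the VF-wide vector loop */
    int narrow_iters; /* iterations of the VF2-wide epilogue vector loop */
    int scalar_iters; /* iterations of the scalar remainder loop */
};

/* Stand-in for the alias check: returns no_alias and counts the call. */
static int alias_check(struct stats *s, int no_alias) {
    s->checks++;
    return no_alias;
}

/* Layout after vectorizing the epilogue loop on its own. */
static struct stats run_naive(int n, int no_alias) {
    struct stats s = {0, 0, 0, 0};
    int start = 0;
    if (n >= VF) {
        if (alias_check(&s, no_alias)) {
            for (; start + VF <= n; start += VF)
                s.wide_iters++;
        }
    }
    if (n >= VF2) {
        if (alias_check(&s, no_alias)) {
            for (; start + VF2 <= n; start += VF2)
                s.narrow_iters++;
        }
    }
    for (int i = start; i < n; ++i)
        s.scalar_iters++;
    return s;
}

/* Desired layout: one alias check, guarded by the smaller threshold. */
static struct stats run_restructured(int n, int no_alias) {
    struct stats s = {0, 0, 0, 0};
    int start = 0;
    if (n >= VF2) {
        if (alias_check(&s, no_alias)) {
            if (n >= VF) {
                for (; start + VF <= n; start += VF)
                    s.wide_iters++;
            }
            for (; start + VF2 <= n; start += VF2)
                s.narrow_iters++;
        }
    }
    for (int i = start; i < n; ++i)
        s.scalar_iters++;
    return s;
}

/* Returns 1 when both layouts performed the same loop work. */
static int same_work(struct stats a, struct stats b) {
    return a.wide_iters == b.wide_iters &&
           a.narrow_iters == b.narrow_iters &&
           a.scalar_iters == b.scalar_iters;
}
```
    </pre>
    With these assumed factors, for n = 9 and a passing check the first
    layout evaluates the check twice and the restructured one evaluates it
    once, while for n below VF2 neither layout evaluates it at all, so
    the fast path to the scalar loop is kept.<br>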
    <br>
     -Hal<br>
    <br>
    <blockquote
      cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com"
      type="cite">
      <div>
        <div><br class="">
        </div>
        <div>Adam</div>
        <br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div bgcolor="#FFFFFF" text="#000000" class=""> <br
                class="">
               -Hal<br class="">
              <br class="">
              <blockquote
                cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
                type="cite" class="">
                <div class="">
                  <div class="">
                    <div class=""><br class="">
                    </div>
                    <div class="">I still don’t understand why that’s
                      not possible with some sophisticated predicate
                      propagation independent from the vectorizer.  I am
                      not saying it’s already possible but it should be.</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">Adam</div>
                    <div class=""><br class="">
                    </div>
                  </div>
                  <br class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div class="WordSection1" style="page:
                        WordSection1; font-family: Helvetica; font-size:
                        10px; font-style: normal; font-variant-caps:
                        normal; font-weight: normal; letter-spacing:
                        normal; orphans: auto; text-align: start;
                        text-indent: 0px; text-transform: none;
                        white-space: normal; widows: auto; word-spacing:
                        0px; -webkit-text-stroke-width: 0px;">
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><o:p class=""> </o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class="">NOTE: We cannot move the alias check
                          earlier, as it is expensive compared to the other
                          checks.<o:p
                            class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><o:p class=""> </o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class="">&lt;Approach-1&gt;<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt 0.5in;
                          font-size: 11pt; font-family: Calibri,
                          sans-serif; text-indent: -0.25in;" class=""><span
                            class="">1)<span style="font-style: normal;
                              font-variant-caps: normal; font-weight:
                              normal; font-size: 7pt; line-height:
                              normal; font-family: 'Times New Roman';"
                              class="">     <span
                                class="Apple-converted-space"> </span></span></span>The
                          current patch builds on the existing functionality of
                          LoopVectorizer: it uses ‘InnerLoopVectorizer’
                          again to vectorize the epilog loop. Because this
                          happens within the same vectorization pass, we have
                          the flexibility to reuse the already computed alias
                          check result and to limit the vectorization factor
                          for the epilog loop.<span
                            class="Apple-converted-space"> </span><o:p
                            class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt 0.5in;
                          font-size: 11pt; font-family: Calibri,
                          sans-serif; text-indent: -0.25in;" class=""><span
                            class="">2)<span style="font-style: normal;
                              font-variant-caps: normal; font-weight:
                              normal; font-size: 7pt; line-height:
                              normal; font-family: 'Times New Roman';"
                              class="">     <span
                                class="Apple-converted-space"> </span></span></span>It
                          does not explicitly generate the blocks for the new
                          block layout; rather, it relies on
                          ‘InnerLoopVectorizer::createEmptyLoop’ to
                          generate it. The new block
                          layout is generated automatically by calling
                          ‘InnerLoopVectorizer::vectorize’ again.<o:p
                            class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt 0.5in;
                          font-size: 11pt; font-family: Calibri,
                          sans-serif; text-indent: -0.25in;" class=""><span
                            class="">3)<span style="font-style: normal;
                              font-variant-caps: normal; font-weight:
                              normal; font-size: 7pt; line-height:
                              normal; font-family: 'Times New Roman';"
                              class="">     <span
                                class="Apple-converted-space"> </span></span></span>Block
                          layout description with epilog loop
                          vectorization is available at<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt 0.5in;
                          font-size: 11pt; font-family: Calibri,
                          sans-serif;" class=""><a
                            moz-do-not-send="true"
href="https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png"
                            style="color: purple; text-decoration:
                            underline;" class="">https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png</a><o:p
                            class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><o:p class=""> </o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class="">Approach-1 looks feasible; please
                          comment if you have any objections.<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><o:p class=""> </o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class="">Regards,<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class="">Ashutosh<o:p class=""></o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><o:p class=""> </o:p></div>
                        <div style="margin: 0in 0in 0.0001pt; font-size:
                          12pt; font-family: 'Times New Roman', serif;"
                          class=""><span style="font-size: 11pt;
                            font-family: Calibri, sans-serif; color:
                            rgb(31, 73, 125);" class=""><o:p class=""> </o:p></span></div>
                        ...<br class="">
                      </div>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>