<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 03/15/2017 02:03 PM, Adam Nemet
      wrote:<br>
    </div>
    <blockquote
      cite="mid:FF0C48A9-BAD5-4430-B6B5-E4DA14E6B731@apple.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <br class="">
      <div>
        <blockquote type="cite" class="">
          <div class="">On Mar 15, 2017, at 9:46 AM, Hal Finkel <<a
              moz-do-not-send="true" href="mailto:hfinkel@anl.gov"
              class="">hfinkel@anl.gov</a>> wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <div bgcolor="#FFFFFF" text="#000000" class="">
              <p class=""><br class="">
              </p>
              <div class="moz-cite-prefix">On 03/14/2017 07:50 PM, Adam
                Nemet wrote:<br class="">
              </div>
              <blockquote
                cite="mid:F261C572-ECFA-4EA4-9A46-241DD475BB06@apple.com"
                type="cite" class="">
                <div dir="auto" style="word-wrap: break-word;
                  -webkit-nbsp-mode: space; -webkit-line-break:
                  after-white-space;" class=""><br class="">
                  <div class="">
                    <blockquote type="cite" class="">
                      <div class="">On Mar 14, 2017, at 11:33 AM, Hal
                        Finkel <<a moz-do-not-send="true"
                          href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>>
                        wrote:</div>
                      <br class="Apple-interchange-newline">
                      <div class="">
                        <div bgcolor="#FFFFFF" text="#000000" class="">
                          <p class=""><br class="">
                          </p>
                          <br class="">
                          <div class="moz-cite-prefix">On 03/14/2017
                            12:11 PM, Adam Nemet wrote:<br class="">
                          </div>
                          <blockquote
                            cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com"
                            type="cite" class=""> <br class="">
                            <div class="">
                              <blockquote type="cite" class="">
                                <div class="">On Mar 14, 2017, at 9:49
                                  AM, Hal Finkel <<a
                                    moz-do-not-send="true"
                                    href="mailto:hfinkel@anl.gov"
                                    class="">hfinkel@anl.gov</a>>
                                  wrote:</div>
                                <br class="Apple-interchange-newline">
                                <div class="">
                                  <div bgcolor="#FFFFFF" text="#000000"
                                    class="">
                                    <p class=""><br class="">
                                    </p>
                                    <div class="moz-cite-prefix">On
                                      03/14/2017 11:21 AM, Adam Nemet
                                      wrote:<br class="">
                                    </div>
                                    <blockquote
                                      cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
                                      type="cite" class=""> <br
                                        class="">
                                      <div class="">
                                        <blockquote type="cite" class="">
                                          <div class="">On Mar 14, 2017,
                                            at 6:00 AM, Nema, Ashutosh
                                            <<a
                                              moz-do-not-send="true"
                                              href="mailto:Ashutosh.Nema@amd.com"
                                              class="">Ashutosh.Nema@amd.com</a>>
                                            wrote:</div>
                                          <br
                                            class="Apple-interchange-newline">
                                          <div class="">
                                            <div class="WordSection1"
                                              style="page: WordSection1;
                                              font-family: Helvetica;
                                              font-size: 10px;
                                              font-style: normal;
                                              font-variant-caps: normal;
                                              font-weight: normal;
                                              letter-spacing: normal;
                                              orphans: auto; text-align:
                                              start; text-indent: 0px;
                                              text-transform: none;
                                              white-space: normal;
                                              widows: auto;
                                              word-spacing: 0px;
                                              -webkit-text-stroke-width:
                                              0px;">
                                              <div style="margin: 0in
                                                0in 0.0001pt; font-size:
                                                12pt; font-family:
                                                'Times New Roman',
                                                serif;" class="">Summarizing
                                                the discussion on the
                                                implementation
                                                approaches.<o:p class=""></o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt; font-size:
                                                12pt; font-family:
                                                'Times New Roman',
                                                serif;" class=""><o:p
                                                  class=""> </o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt; font-size:
                                                12pt; font-family:
                                                'Times New Roman',
                                                serif;" class="">Discussed
                                                about two approaches,
                                                first running
                                                ‘InnerLoopVectorizer’
                                                again on the epilog loop
                                                immediately after
                                                vectorizing the original
                                                loop within the same
                                                vectorization pass, the
                                                second approach where
                                                re-running vectorization
                                                pass and limiting
                                                vectorization factor of
                                                epilog loop by metadata.<o:p
                                                  class=""></o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt; font-size:
                                                12pt; font-family:
                                                'Times New Roman',
                                                serif;" class=""><o:p
                                                  class=""> </o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt; font-size:
                                                12pt; font-family:
                                                'Times New Roman',
                                                serif;" class=""><Approach-2><o:p
                                                  class=""></o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt; font-size:
                                                12pt; font-family:
                                                'Times New Roman',
                                                serif;" class="">Challenges
                                                with re-running the
                                                vectorizer pass:<o:p
                                                  class=""></o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt 0.5in;
                                                font-size: 11pt;
                                                font-family: Calibri,
                                                sans-serif; text-indent:
                                                -0.25in;" class=""><span
                                                  class="">1)<span
                                                    style="font-style:
                                                    normal;
                                                    font-variant-caps:
                                                    normal; font-weight:
                                                    normal; font-size:
                                                    7pt; line-height:
                                                    normal; font-family:
                                                    'Times New Roman';"
                                                    class="">     <span
class="Apple-converted-space"> </span></span></span>Reusing alias check
                                                result:<span
                                                  class="Apple-converted-space"> </span><o:p
                                                  class=""></o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt 0.5in;
                                                font-size: 11pt;
                                                font-family: Calibri,
                                                sans-serif;" class="">When
                                                vectorizer pass runs
                                                again it finds the
                                                epilog loop as a new
                                                loop and it may
                                                generates alias check,
                                                this new alias check may
                                                overkill the gains of
                                                epilog vectorization.<o:p
                                                  class=""></o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt 0.5in;
                                                font-size: 11pt;
                                                font-family: Calibri,
                                                sans-serif;" class="">We
                                                should use the already
                                                computed alias check
                                                result instead of re
                                                computing again.<o:p
                                                  class=""></o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt 0.5in;
                                                font-size: 11pt;
                                                font-family: Calibri,
                                                sans-serif; text-indent:
                                                -0.25in;" class=""><span
                                                  class="">2)<span
                                                    style="font-style:
                                                    normal;
                                                    font-variant-caps:
                                                    normal; font-weight:
                                                    normal; font-size:
                                                    7pt; line-height:
                                                    normal; font-family:
                                                    'Times New Roman';"
                                                    class="">     <span
class="Apple-converted-space"> </span></span></span>Rerun the vectorizer
                                                and hoist the new alias
                                                check:<o:p class=""></o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt 0.5in;
                                                font-size: 11pt;
                                                font-family: Calibri,
                                                sans-serif;" class="">It’s
                                                not possible to hoist
                                                alias checks as its not
                                                fully redundant (not
                                                dominated by other
                                                checks), it’s not
                                                getting execute in all
                                                paths.<o:p class=""></o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt; font-size:
                                                12pt; font-family:
                                                'Times New Roman',
                                                serif;" class=""><o:p
                                                  class=""> </o:p></div>
                                              <div style="margin: 0in
                                                0in 0.0001pt; font-size:
                                                12pt; font-family:
                                                'Times New Roman',
                                                serif;" class=""><span
                                                  id="cid:part2.8A601F10.7A15555C@anl.gov"
                                                  class=""><Mail
                                                  Attachment.jpeg></span></div>
                                            </div>
                                          </div>
                                        </blockquote>
                                        <div class=""><br class="">
                                        </div>
                                        <div class=""><br class="">
                                        </div>
                                        <div class="">
                                          <div class="">I don’t
                                            understand. Looks like you
                                            have the same alias checks
                                            for the epilog loop too.
                                             How is this CFG different
                                            from the re-vectorization of
                                            the scalar loop? </div>
                                        </div>
                                      </div>
                                    </blockquote>
                                    <br class="">
                                    You're looking at the wrong thing.
                                    This *is* the image from
                                    re-vectorization. The other image
                                    (linked below in step (3)) shows the
                                    other option.<br class="">
                                  </div>
                                </div>
                              </blockquote>
                              <div class=""><br class="">
                              </div>
                              Ah ok, the numbering confused me here.</div>
                            <div class=""><br class="">
                              <blockquote type="cite" class="">
                                <div class="">
                                  <div bgcolor="#FFFFFF" text="#000000"
                                    class=""> <br class="">
                                    <blockquote
                                      cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
                                      type="cite" class="">
                                      <div class="">
                                        <div class="">
                                          <div class=""> Would be good
                                            to have both CFGs here and
                                            highlighting the difference.</div>
                                          <div class=""><br class="">
                                          </div>
                                          <div class="">I thought that
                                            the whole point was that
                                            *if* you reached the epilog
                                            vector loop via the first
                                            vector loop, you want to
                                            bypass the alias checks
                                            before the epilog vector.</div>
                                        </div>
                                      </div>
                                    </blockquote>
                                    <br class="">
                                    Yes, but, that's not quite true now.
                                    You can also reach the epilogue loop
                                    if you fail the min-trip-count
                                    check, and so you don't know
                                    anything about the aliasing checks.<br
                                      class="">
                                  </div>
                                </div>
                              </blockquote>
                              <div class=""><br class="">
                              </div>
                              <div class="">OK, so we want this loops to
                                be handled specially.  We effectively
                                say that we only vectorize this loop if
                                it does not require any alias checks or
                                if the alias checks can be
                                predicate-forwarded to this loop from
                                existing checks.</div>
                              <div class=""><br class="">
                              </div>
                              <div class="">This still seems like an
                                orthogonal issue that may be interesting
                                to solve independently.  In other words
                                this could be a nice feature in the
                                vectorizer anyway: the loop is estimated
                                to be low-trip count so feel free to
                                predicate the new vector loop so that
                                the alias check result could be reused
                                from some other block.  We obviously
                                don’t have this capability today but
                                it’s something that could be nice aside
                                from the vectorizer.</div>
                            </div>
                          </blockquote>
                          <br class="">
                          That sounds great. I'm not sure, however,
                          exactly what this means. This "predicate
                          forwarding" sounds like a non-local
                          restructuring of the surrounding code (because
                          the predicates aren't known to be true at that
                          point, we need to search different
                          predecessors to find one in which the
                          conditions might be true and insert the vector
                          loop across that predecessor edge instead).
                          Maybe we could do this, for example, by
                          calling SE->isKnownPredicate enhanced with
                          some additional context sensitivity because we
                          currently check dominating conditional
                          branches for things that are AddRecs in some
                          loop? Moreover, we then have the problem of
                          restructuring the order of the trip-count
                          checks (because we need to fast fail to the
                          scalar loop for the smallest trip counts).
                          Maybe we can do this the same way? This means
                          finding a dominating check on the trip count
                          that implies the check we're about to insert,
                          change that check to the check we want,
                          keeping the original only on the true side of
                          the trip-count check (i.e. if the trip count
                          is larger than the small threshold, then check
                          the large threshold).<br class="">
                          <br class="">
                          If we're going to do all of that, then I'd
                          lean toward saying that this does not belong
                          in the vectorizer at all. Rather, this seems
                          like something we'd want in some general
                          transformation (this seems somewhat akin to
                          what JumpThreading does). The general
                          transformation seems something like this; the
                          basic vectorized loop looks like this:<br
                            class="">
                          <br class="">
                          <tt class="">int start = 0;</tt><br class="">
                          <tt class="">if (n >= vf) {<br class="">
                              if (check) {<br class="">
                                for (...; start += vf)<br class="">
                                  ...<br class="">
                              }<br class="">
                            }<br class="">
                          </tt><br class="">
                          <tt class="">for (i = start; i < n; ++i) {<br
                              class="">
                              ...<br class="">
                            }<br class="">
                          </tt><br class="">
                          and after we vectorize the epilogue loop we
                          end up with this:<br class="">
                          <br class="">
                          <tt class="">int start = 0;</tt><br class="">
                          <tt class="">if (n >= vf) {<br class="">
                              if (check) {<br class="">
                                for (...; start += vf)<br class="">
                                  ...<br class="">
                              }<br class="">
                            }<br class="">
                          </tt><br class="">
                          <tt class="">if (n >= vf2) {<br class="">
                          </tt></div>
                      </div>
                    </blockquote>
                    <div class=""><br class="">
                    </div>
                    <div class="">This is (n - start >= vf2) which I
                      think makes this a bit more complicated since now
                      “check” is not always redundant if n >= vf2 so
                      hoisting becomes a cost decision.</div>
                    <br class="">
                    <blockquote type="cite" class="">
                      <div class="">
                        <div bgcolor="#FFFFFF" text="#000000" class=""><tt
                            class="">   if (check) {<br class="">
                          </tt></div>
                      </div>
                    </blockquote>
                    <div class=""><br class="">
                    </div>
                    <div class="">But we can turn this into a predicate
                      we already have: if (n >= vf && check).
                       BTW, this is I think how Ashutosh’ approach 1
                      works too; if the first vector loop does not
                      iterate once the alias result will be false in the
                      epilog vector loop.</div>
                    <br class="">
                    <blockquote type="cite" class="">
                      <div class="">
                        <div bgcolor="#FFFFFF" text="#000000" class=""><tt
                            class="">     for (...; start += vf2)<br
                              class="">
                                  ...<br class="">
                              }<br class="">
                            }<br class="">
                          </tt><br class="">
                          <tt class="">for (i = start; i < n; ++i) {<br
                              class="">
                              ...<br class="">
                            }</tt><br class="">
                          <br class="">
                          and we need to end up with this:<br class="">
                          <br class="">
                          <tt class="">int start = 0;<br class="">
                            if (n >= vf2) {<br class="">
                              if (check) {<br class="">
                                if (n >= vf) {<br class="">
                                  for (...; start += vf)<br class="">
                                    ...<br class="">
                                }<br class="">
                            <br class="">
                                for (...; start += vf2)<br class="">
                                  ...<br class="">
                              }<br class="">
                            }<br class="">
                            <br class="">
                            for (i = start; i < n; ++i) {<br class="">
                              ...<br class="">
                            }<br class="">
                          </tt><br class="">
                          where we've recognized here that 'check' is
                          the same in both cases, and that because vf2
                          < vf, the one trip-count check implies the
                          other. This latter part seems like the part
                          that our existing passes might not know what
                          to do with currently. Thoughts?<br class="">
                        </div>
                      </div>
                    </blockquote>
                    <div class=""><br class="">
                    </div>
                    <div class=""><br class="">
                    </div>
                    <div class="">And then we get:</div>
                    <div class=""><tt style="background-color: rgb(255,
                        255, 255);" class=""><br class="">
                      </tt></div>
                    <div class=""><tt style="background-color: rgb(255,
                        255, 255);" class="">int start = 0;</tt></div>
                    <div class=""><font class="" face="monospace"><b
                          class="">bool check_result = false;<br
                            style="background-color: rgb(255, 255,
                            255);" class="">
                        </b></font><tt style="background-color: rgb(255,
                        255, 255);" class="">if (n >= vf) {<br
                          class="">
                          if (check) {</tt></div>
                    <div class=""><tt style="background-color: rgb(255,
                        255, 255);" class="">    <b class="">check_result
                          = true;</b><br class="">
                            for (...; start += vf)<br class="">
                              ...<br class="">
                          }<br class="">
                        }<br class="">
                      </tt><br style="background-color: rgb(255, 255,
                        255);" class="">
                      <tt style="background-color: rgb(255, 255, 255);"
                        class="">if (n - start >= vf2) {<br class="">
                          if (<b class="">check_result</b>) {<br
                          class="">
                            for (...; start += vf2)<br class="">
                              ...<br class="">
                          }<br class="">
                        }<br class="">
                      </tt><br style="background-color: rgb(255, 255,
                        255);" class="">
                      <tt style="background-color: rgb(255, 255, 255);"
                        class="">for (i = start; i < n; ++i) {<br
                          class="">
                          …<br class="">
                        }</tt></div>
                    <div class=""><tt style="background-color: rgb(255,
                        255, 255);" class=""><br class="">
                      </tt></div>
                    <div class="">What do you think?</div>
                  </div>
                </div>
              </blockquote>
              <br class="">
              I think this seems about right. We shouldn't re-compare
              the trip counts in the epilogue loop if the alias check is
              false anyway:<br class="">
              <br class="">
              <div class=""><tt style="background-color: rgb(255, 255,
                  255);" class="">int start = 0;</tt><font class=""
                  face="monospace"><b class=""><br
                      style="background-color: rgb(255, 255, 255);"
                      class="">
                  </b></font><tt style="background-color: rgb(255, 255,
                  255);" class="">if (n >= vf) {<br class="">
                    if (check) {<br class="">
                      for (...; start += vf)<br class="">
                        ...<br class="">
                    } else { goto scalar; }<br class="">
                  } else { goto scalar; }<br class="">
                </tt><br style="background-color: rgb(255, 255, 255);"
                  class="">
                <tt style="background-color: rgb(255, 255, 255);"
                  class="">if (n - start >= vf2) {<br class="">
                    for (...; start += vf2)<br class="">
                    ...<br class="">
                  }<br class="">
                </tt><br class="">
                <tt class="">scalar:</tt><br style="background-color:
                  rgb(255, 255, 255);" class="">
                <tt style="background-color: rgb(255, 255, 255);"
                  class="">for (i = start; i < n; ++i) {<br class="">
                    …<br class="">
                  }</tt></div>
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>So to summarize, as long as we guard the epilogue loop with
          'n >= vf && check’, we should be able to
          jump-thread/predicate-foward to the vectorized epilog loop and
          remove the checks.</div>
        <div><br class="">
        </div>
        <div>Is this the direction then (rerun vectorizer + enhance/add
          the above general-purpose optimizations) or we’re still
          circling back to handling this with a single run of the
          vectorizer?</div>
        <div><br class="">
        </div>
        <div>I think that the general difficulty of trying to get the
          optimal version of code that has been loop-versioned multiple
          times is that we are interested in transformation that don’t
          necessarily preserve semantics.  I.e. it’s OK to for example
          to fall back to the original loop in more cases since we
          happen to know that the original loop is equivalent to the
          versioned loop.</div>
      </div>
    </blockquote>
    <br>
    I think we've circled back to handling this in a single run of the
    vectorizer for exactly this reason. Doing this after the fact would
    not be semantics preserving.<br>
    <br>
     -Hal<br>
    <br>
    <blockquote
      cite="mid:FF0C48A9-BAD5-4430-B6B5-E4DA14E6B731@apple.com"
      type="cite">
      <div>
        <div><br class="">
        </div>
        <div>Adam </div>
        <br class="">
      </div>
    </blockquote>
    ...<br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>