<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>This is incorrect.  <br>
    </p>
    <p>IRCE's current sole known user happens to be a compiler for a
      GCed language, but there is no (intentional) dependence on that
      fact.  It should work on arbitrary IR.  <br>
    </p>
    <p>Loop predication (the form in IndVars) triggers for arbitrary
      IR.  The separate pass depends on the semantics of guards, which
      are related to deopt semantics, but *not* to GC.  <br>
    </p>
    <p>Philip<br>
    </p>
    <div class="moz-cite-prefix">On 5/11/21 7:17 AM, Jie He wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CABSkmVHR_-=xbxfoGOBxQbToWRXDB0FbAhfdpi5Amx7ZCQfPAw@mail.gmail.com">
      <div dir="ltr">As far as I know, the current IRCE implementation
        relies on some preconditions. It is intended for language
        runtimes with garbage collection, not for loop vectorization.
        <div>The same is true for loop predication, which also helps
          eliminate condition checks within a loop.</div>
        <div><br>
        </div>
        <div>Jie He</div>
        <div>B.R</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Tue, 11 May 2021 at 20:50,
          Jingu Kang via llvm-dev <<a
            href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div style="overflow-wrap: break-word;" lang="EN-US">
            <div class="gmail-m_6473269197180563542WordSection1">
              <p class="MsoNormal">Hi Philip,</p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">I have extended your suggestion
                slightly, as shown below.</p>
              <p class="MsoNormal"> </p>
              <pre>                                newbound = min(n, c)
     while (iv < n) {           while (iv < newbound) {
       A                          A
       if (iv < c)                B
         B                        C
       C                        }
     }                          // iv resumes where loop 1 stopped
                                while (iv < n) {
                                  A
                                  C
                                }</pre>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">I have implemented a simple pass that
                splits the bound of a loop containing a conditional
                branch on the IV, as in the example above:
                <a href="https://reviews.llvm.org/D102234"
                  target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D102234</a>.
                It is an initial version; please review it if you can.</p>
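              <p class="MsoNormal">The split shown above can be checked with a small Python model (a sketch, not the implementation in D102234). It assumes a unit-stride IV with no overflow, records each executed block as a hypothetical (block, iv) event, and compares the original loop against the split form; the function names original and split are made up for illustration.</p>
              <pre>
```python
def original(iv: int, n: int, c: int) -> list[tuple[str, int]]:
    """while (iv < n) { A; if (iv < c) B; C; iv++ }"""
    trace = []
    while iv < n:
        trace.append(("A", iv))
        if iv < c:
            trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    return trace


def split(iv: int, n: int, c: int) -> list[tuple[str, int]]:
    """The same loop after splitting its bound at min(n, c)."""
    trace = []
    newbound = min(n, c)
    # Loop 1: iv < newbound implies iv < c, so B runs without a check.
    while iv < newbound:
        trace.append(("A", iv))
        trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    # Loop 2: iv is not reset, and every remaining iteration has iv >= c,
    # so B is dropped; the bound is the original n.
    while iv < n:
        trace.append(("A", iv))
        trace.append(("C", iv))
        iv += 1
    return trace
```
</pre>
              <p class="MsoNormal">Enumerating every start value, n, and c in a small range is a cheap way to exercise the edge cases. In this model the second loop is bounded by the original n and iv is never moved backwards, so it cannot execute iterations that the original loop would not, even when c > n or when iv already starts at or above min(n, c).</p>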
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">Thanks</p>
              <p class="MsoNormal">JinGu Kang</p>
              <p class="MsoNormal"> </p>
              <div
style="border-top:none;border-right:none;border-bottom:none;border-left:1.5pt
                solid blue;padding:0cm 0cm 0cm 4pt">
                <div>
                  <div
style="border-right:none;border-bottom:none;border-left:none;border-top:1pt
                    solid rgb(225,225,225);padding:3pt 0cm 0cm">
                    <p class="MsoNormal"><b>From:</b> Jingu Kang <<a
                        href="mailto:Jingu.Kang@arm.com" target="_blank"
                        moz-do-not-send="true">Jingu.Kang@arm.com</a>>
                      <br>
                      <b>Sent:</b> 04 May 2021 12:45<br>
                      <b>To:</b> Philip Reames <<a
                        href="mailto:listmail@philipreames.com"
                        target="_blank" moz-do-not-send="true">listmail@philipreames.com</a>>;
                      Jingu Kang <<a href="mailto:Jingu.Kang@arm.com"
                        target="_blank" moz-do-not-send="true">Jingu.Kang@arm.com</a>><br>
                      <b>Cc:</b> <a
                        href="mailto:llvm-dev@lists.llvm.org"
                        target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
                      <b>Subject:</b> RE: [llvm-dev] Enabling IRCE pass
                      or Adding something similar in the pipeline of new
                      pass manager</p>
                  </div>
                </div>
                <p class="MsoNormal"> </p>
                <p class="MsoNormal">Philip, I appreciate your kind
                  comments.</p>
                <p><span style="font-size:10pt;font-family:'Courier New'">>In this example, forming the full
                    pre/main/post loop structure of IRCE is overkill. 
                    Instead, we could simply restrict the loop bounds in
                    the following manner:</span></p>
                <pre>>loop.ph:
>  ;; Warning: pseudo code, might have edge conditions wrong
>  %min = call i64 @llvm.smin.i64(i64 %n, i64 %a)
>  %c = icmp sle i64 %min, 1
>  br i1 %c, label %exit, label %loop.ph.split
> 
>loop.ph.split:
>  br label %loop
> 
>loop:
>  %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph.split ]
>  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>  %val = load i64, i64* %src.arrayidx
>  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>  store i64 %val, i64* %dst.arrayidx
>  %inc = add nuw nsw i64 %iv, 1
>  %cond = icmp eq i64 %inc, %min
>  br i1 %cond, label %exit, label %loop
> 
>exit:
>  ret void
>}</pre>
                <p><span style="font-size:10pt;font-family:'Courier New'">>I'm not quite sure what to call this transform, but it's not IRCE.  If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting.</span></p>
                <p class="MsoNormal"> </p>
                <p class="MsoNormal">I agree with you. If the LLVM
                  community is willing to accept the above approach, as a
                  new pass or as part of an existing pass, I would be
                  happy to implement it, because my aim is to handle this
                  case in upstream LLVM.</p>
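                <p class="MsoNormal">A minimal Python sketch of the restricted-bound idea, under stated assumptions: the loop before the transform is taken to copy src[iv] to dst[iv] only when iv < a (signed compare), to start at iv = 1, and to exit through an equality test against n, so n >= 2 is assumed. The function names are hypothetical. The preheader guard matters here: with an equality exit test, a new bound at or below the start value would never be reached.</p>
                <pre>
```python
def original_copy(src: list[int], n: int, a: int) -> list[int]:
    """Assumed loop before the transform: copy only while iv < a."""
    dst = [0] * len(src)
    iv = 1
    while True:
        if iv < a:  # the conditional block
            dst[iv] = src[iv]
        inc = iv + 1
        if inc == n:  # equality latch exit; assumes n >= 2
            return dst
        iv = inc


def restricted_copy(src: list[int], n: int, a: int) -> list[int]:
    """Same effect with the loop bound restricted to min(n, a)."""
    dst = [0] * len(src)
    bound = min(n, a)  # signed minimum, matching the signed iv < a test
    if bound <= 1:  # preheader guard: the loop starts at iv = 1
        return dst
    iv = 1
    while True:
        dst[iv] = src[iv]  # unconditional: iv < bound <= a
        inc = iv + 1
        if inc == bound:
            return dst
        iv = inc
```
</pre>
                <p class="MsoNormal">The signed minimum is a choice made to match the signed compare assumed above; if the original condition were unsigned, an unsigned minimum would be needed instead.</p>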
                <p class="MsoNormal"> </p>
                <p><span style="font-size:10pt;font-family:'Courier New'">>Another way to frame this special
                    case might be to recognize that the conditional block
                    can be inverted into an early exit.  (Reasoning: %iv
                    is strictly increasing, the condition is monotonic,
                    and the not-taken path has no observable effect.) 
                    Consider:</span></p>
                <pre>>loop.ph:
>  br label %loop
> 
>loop:
>  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]
>  %cmp = icmp sge i64 %iv, %a
>  br i1 %cmp, label %exit, label %for.inc
> 
>for.inc:
>  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>  %val = load i64, i64* %src.arrayidx
>  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>  store i64 %val, i64* %dst.arrayidx
>  %inc = add nuw nsw i64 %iv, 1
>  %cond = icmp eq i64 %inc, %n
>  br i1 %cond, label %exit, label %loop
> 
>exit:
>  ret void
>}</pre>
                <p><span style="font-size:10pt;font-family:'Courier New'">>Once that's done, the multiple exit
                    vectorization work should vectorize this loop.
                    Thinking about it, I really like this variant.</span></p>
                <p class="MsoNormal">I have not looked at the
                  multiple-exit vectorization work yet, but it looks like
                  we could treat the inverted condition as the early
                  exit's condition.</p>
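                <p class="MsoNormal">The early-exit form can be modeled in Python as well, assuming (hypothetically) that the conditional block copies src[iv] to dst[iv], that iv starts at 1, and that the latch exits on an equality test against n with n >= 2. Because iv only increases, once iv >= a the original condition stays false for the rest of the loop, and those iterations have no effect, so leaving the loop at that point preserves the result. The function name is made up for illustration.</p>
                <pre>
```python
def early_exit_copy(src: list[int], n: int, a: int) -> list[int]:
    """Model of the loop after inverting the iv < a test into an early exit."""
    dst = [0] * len(src)
    iv = 1
    while True:
        # iv only increases, so once iv >= a the original condition is false
        # for every remaining iteration, and those iterations do nothing.
        if iv >= a:  # the new early exit (icmp sge %iv, %a)
            return dst
        dst[iv] = src[iv]
        inc = iv + 1
        if inc == n:  # original latch exit; assumes n >= 2
            return dst
        iv = inc
```
</pre>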
                <p><span style="font-size:10pt;font-family:'Courier New'">>The costing here seems quite off.  I
                    have not looked at how the vectorizer costs
                    predicated loads on hardware without predication,
                    but needing to scalarize a conditional VF-times and
                    form a vector again does not have a cost of 3
                    million.  This could definitely be improved.</span></p>
                <p class="MsoNormal">I agree with you.</p>
                <p class="MsoNormal"> </p>
                <p class="MsoNormal">Additionally, I would like to
                  suggest enabling or adding transformations that help
                  vectorization. For example, just as we remove the
                  conditional branch inside the loop, we could use a
                  transformation such as loop distribution to split a
                  loop whose dependence blocks vectorization into a
                  vectorizable loop and a non-vectorizable one. I am not
                  sure why these transformations are not enabled by
                  default in the pass manager, but enabling them would
                  make more loops vectorizable.</p>
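                <p class="MsoNormal">Loop distribution can be illustrated with a hypothetical loop (not taken from this thread) in which one statement carries a recurrence, since a[i] depends on a[i - 1], while another statement is independent across iterations. Splitting them into two loops is legal here only because the second statement never reads or writes a or b; in compiled code the second loop would then be a candidate for vectorization while the first stays scalar. Python lists are not vectorized, so this sketch only checks that distribution preserves the results; the function names are made up.</p>
                <pre>
```python
def fused(a: list[int], b: list[int], c: list[int], d: list[int]) -> tuple[list[int], list[int]]:
    """Original loop: a recurrence and an independent statement share one body."""
    a, c = a[:], c[:]  # copy so the caller's lists are not modified
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]  # loop-carried dependence: blocks simple vectorization
        c[i] = 2 * d[i]         # independent across iterations
    return a, c


def distributed(a: list[int], b: list[int], c: list[int], d: list[int]) -> tuple[list[int], list[int]]:
    """Same computation after loop distribution (assumes equal-length lists)."""
    a, c = a[:], c[:]
    for i in range(1, len(a)):  # stays scalar: still has the recurrence
        a[i] = a[i - 1] + b[i]
    for i in range(1, len(a)):  # no loop-carried dependence: vectorizable
        c[i] = 2 * d[i]
    return a, c
```
</pre>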
                <p class="MsoNormal"> </p>
                <p class="MsoNormal">Thanks</p>
                <p class="MsoNormal">JinGu Kang</p>
              </div>
            </div>
          </div>
          _______________________________________________<br>
          LLVM Developers mailing list<br>
          <a href="mailto:llvm-dev@lists.llvm.org" target="_blank"
            moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
          <a
            href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
            rel="noreferrer" target="_blank" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
        </blockquote>
      </div>
      <br clear="all">
      <div><br>
      </div>
      -- <br>
      <div dir="ltr" class="gmail_signature">Best Regards<br>
        He Jie 何杰</div>
    </blockquote>
  </body>
</html>