<div dir="ltr"><div>Hi Philip,</div><div><br></div>Yes, I submitted two issues about IRCE, 49012 and 49014.<div>I don't know whether I am misusing the pass; I don't have a comprehensive understanding of this pass and its background. I just took some time to dive into the code to find the reason.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 13 May 2021 at 02:45, Philip Reames <<a href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p><br>

    </p>

    <div>On 5/11/21 7:41 PM, Jie He wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">Yes, but the current deopt lowering implementation
        generates statepoint IR, which currently supports only
        X86-64, as mentioned in the GC documentation in LLVM.</div>

    </blockquote>

    I believe this is supported on at least AArch64 if memory serves.<br>

    <blockquote type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>IRCE doesn't rely on a GCed language; I remembered that wrong. But
          it's not smart right now, and it can't handle bounds checks as well as
          Java's RCE did.</div>

      </div>

    </blockquote>

    Er, I think you're either misunderstanding or need to clarify your

    point.  IRCE does exactly the standard pre/main/post loop technique

    which was used in C2 back in the day.  LoopPred does the widening

    transformation.  Do you have a particular case in mind you're

    thinking of?<br>
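    <p>For readers following along, a minimal C++ sketch of the pre/main/post split may help. It only illustrates the general technique and is not what IRCE actually emits; the names <code>checked_body</code>, <code>sum_checked</code> and <code>sum_split</code>, the "subtract 1" slow path, and the clamping arithmetic are all assumptions made for this example (it also assumes values small enough that <code>-offset</code> does not overflow).</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One iteration of the original body, range check still in place.
// Out-of-range iterations subtract 1 as a stand-in for the slow path.
static void checked_body(const std::vector<int64_t> &a, int64_t i,
                         int64_t offset, int64_t &sum) {
  const int64_t idx = i + offset;
  if (idx >= 0 && idx < static_cast<int64_t>(a.size()))
    sum += a[idx];
  else
    sum -= 1;
}

// Original loop: every iteration pays for the range check.
int64_t sum_checked(const std::vector<int64_t> &a, int64_t begin, int64_t end,
                    int64_t offset) {
  int64_t sum = 0;
  for (int64_t i = begin; i < end; ++i)
    checked_body(a, i, offset, sum);
  return sum;
}

// Split loop: the main loop covers exactly the iterations where the check
// is known to pass, so the check is dropped there (C++17 for std::clamp).
int64_t sum_split(const std::vector<int64_t> &a, int64_t begin, int64_t end,
                  int64_t offset) {
  int64_t sum = 0;
  if (begin >= end)
    return sum;
  const int64_t size = static_cast<int64_t>(a.size());
  // Safe range for i: 0 <= i + offset < size, clamped into [begin, end].
  const int64_t main_begin = std::clamp(-offset, begin, end);
  const int64_t main_end = std::clamp(size - offset, main_begin, end);

  int64_t i = begin;
  for (; i < main_begin; ++i)  // pre loop: checks kept
    checked_body(a, i, offset, sum);
  for (; i < main_end; ++i) {  // main loop: no range check
    assert(i + offset >= 0 && i + offset < size);
    sum += a[i + offset];
  }
  for (; i < end; ++i)         // post loop: checks kept
    checked_body(a, i, offset, sum);
  return sum;
}
```

    <p>Calling <code>sum_split(a, -3, 10, 0)</code> on a five-element vector runs three pre iterations, five unchecked main iterations and five post iterations, and should return the same value as <code>sum_checked</code>.</p>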

    <blockquote type="cite"><br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Tue, 11 May 2021 at 23:04,

          Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>>

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <div>

            <p>This is incorrect.  <br>

            </p>

            <p>IRCE's current sole known user happens to be a compiler

              for a GCed language, but there is no (intentional)

              dependence on that fact.  It should work on arbitrary IR. 

              <br>

            </p>

            <p>Loop predication (the form in IndVars) triggers for

              arbitrary IR.  The separate pass depends on semantics of

              guards which is related to deopt semantics, but *not* GC. 

              <br>

            </p>

            <p>Philip<br>

            </p>

            <div>On 5/11/21 7:17 AM, Jie He wrote:<br>

            </div>

            <blockquote type="cite">

              <div dir="ltr">As far as I know, the current IRCE implementation
                relies on some preconditions. It is intended for language
                runtimes with garbage collection, not for loop
                vectorization.
                <div>The same is true for loop predication, which is
                  also helpful for eliminating condition checks within a
                  loop.</div>

                <div><br>

                </div>

                <div>Jie He</div>

                <div>B.R</div>

              </div>

              <br>

              <div class="gmail_quote">

                <div dir="ltr" class="gmail_attr">On Tue, 11 May 2021 at

                  20:50, Jingu Kang via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>

                  wrote:<br>

                </div>

                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                  <div lang="EN-US">

                    <div>

                      <p class="MsoNormal">Hi Philip,</p>

                      <p class="MsoNormal"> </p>

                      <p class="MsoNormal">I have extended your
                        suggestion slightly, as shown below.</p>

                      <p class="MsoNormal"> </p>

                      <pre>                         newbound1 = min(n, c)
                         newbound2 = max(n, c)
while (iv < n) {         while (iv < newbound1) {
  A                        A
  if (iv < c)              B
    B                      C
  C                      }
}                        iv = newbound1
                         while (iv < newbound2) {
                           A
                           C
                         }</pre>

                      <p class="MsoNormal"> </p>

                      <p class="MsoNormal">I have implemented a simple
                        pass that splits the bound of a loop containing a
                        conditional branch on the IV, as in the example above: <a href="https://reviews.llvm.org/D102234" target="_blank">https://reviews.llvm.org/D102234</a>.
                        It is an initial version. If possible, please
                        review it.</p>
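                      <p class="MsoNormal">To make the intended equivalence concrete, here is a small C++ sketch that records which of A, B and C run in each iteration. It is an illustration only, not the code in D102234; the function names are hypothetical. Note one deliberate assumption: the second loop is bounded by <code>n</code> and simply continues from wherever <code>iv</code> stopped, which keeps the trip count unchanged when <code>c</code> is greater than <code>n</code> or when <code>iv</code> starts above <code>c</code>.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Original loop: while (iv < n) { A; if (iv < c) B; C; }
std::string run_original(int iv, int n, int c) {
  std::string trace;
  for (; iv < n; ++iv) {
    trace += 'A';
    if (iv < c)
      trace += 'B';
    trace += 'C';
  }
  return trace;
}

// Split loops: the condition (iv < c) is always true in the first loop
// and always false in the second, so neither loop needs the branch.
std::string run_split(int iv, int n, int c) {
  std::string trace;
  const int newbound1 = std::min(n, c);
  for (; iv < newbound1; ++iv)
    trace += "ABC";
  // Assumption of this sketch: bound the remainder by n, not max(n, c).
  for (; iv < n; ++iv)
    trace += "AC";
  return trace;
}
```

                      <p class="MsoNormal">For example, <code>run_original(0, 5, 3)</code> and <code>run_split(0, 5, 3)</code> should both produce three "ABC" iterations followed by two "AC" iterations.</p>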

                      <p class="MsoNormal"> </p>

                      <p class="MsoNormal">Thanks</p>

                      <p class="MsoNormal">JinGu Kang</p>

                      <p class="MsoNormal"> </p>

                      <div style="border-top:none;border-right:none;border-bottom:none;border-left:1.5pt solid blue;padding:0cm 0cm 0cm 4pt">

                        <div>

                          <div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0cm 0cm">

                            <p class="MsoNormal"><b>From:</b> Jingu Kang

                              <<a href="mailto:Jingu.Kang@arm.com" target="_blank">Jingu.Kang@arm.com</a>>

                              <br>

                              <b>Sent:</b> 04 May 2021 12:45<br>

                              <b>To:</b> Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>>;

                              Jingu Kang <<a href="mailto:Jingu.Kang@arm.com" target="_blank">Jingu.Kang@arm.com</a>><br>

                              <b>Cc:</b> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

                              <b>Subject:</b> RE: [llvm-dev] Enabling

                              IRCE pass or Adding something similar in

                              the pipeline of new pass manager</p>

                          </div>

                        </div>

                        <p class="MsoNormal"> </p>

                        <p class="MsoNormal">Philip, I appreciate your

                          kind comments.</p>

                        <p><span>>In this example, forming the full

                            pre/main/post loop structure of IRCE is

                            overkill.  Instead, we could simply restrict

                            the loop bounds in the following manner:</span></p>

                        <pre>>loop.ph:</pre>

                        <pre>>  ;; Warning: pseudo code, might have edge conditions wrong</pre>

                        <pre>>  %c = icmp sgt %iv, %n</pre>

                        <pre>>  %min = umin(%n, %a)</pre>

                        <pre>>  br i1 %c, label %exit, label %loop.ph.split</pre>

                        <pre>> </pre>

                        <pre>>loop.ph.split:</pre>

                        <pre>>  br label %loop</pre>

                        <pre>> </pre>

                        <pre>>loop:</pre>

                        <pre>>  %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph.split ]</pre>

                        <pre>>  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv </pre>

                        <pre>>  %val = load i64, i64* %src.arrayidx</pre>

                        <pre>>  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv </pre>

                        <pre>>  store i64 %val, i64* %dst.arrayidx</pre>

                        <pre>>  %inc = add nuw nsw i64 %iv, 1</pre>

                        <pre>>  %cond = icmp eq i64 %inc, %min</pre>

                        <pre>>  br i1 %cond, label %exit, label %loop</pre>

                        <pre>> </pre>

                        <pre>>exit:</pre>

                        <pre>>  ret void</pre>

                        <pre>>}</pre>

                        <pre>> </pre>

                        <pre>>I'm not quite sure what to call this transform, but it's not IRCE.  If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting</pre>

                        <p class="MsoNormal"> </p>

                        <p class="MsoNormal">I agree with you. If the
                          LLVM community is OK with accepting the above approach
                          as a pass or as part of an existing pass, I would
                          be happy to implement it, because I am aiming
                          to handle this case in upstream LLVM.</p>
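                        <p class="MsoNormal">At the source level, the restricted-bound form quoted above could be sketched in C++ roughly as follows. The guarded loop shape is an assumption inferred from the IR (the IV starts at 1 and the conditional path has no effect once <code>i >= a</code>), and the function names are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed original shape: the copy only happens while i < a.
void copy_guarded(const std::vector<int64_t> &src, std::vector<int64_t> &dst,
                  int64_t n, int64_t a) {
  assert(n <= static_cast<int64_t>(std::min(src.size(), dst.size())));
  for (int64_t i = 1; i < n; ++i)
    if (i < a)
      dst[i] = src[i];
}

// Restricted bound: iterations with i >= a do nothing observable, so the
// loop can stop at min(n, a) and the body needs no branch.
void copy_clamped(const std::vector<int64_t> &src, std::vector<int64_t> &dst,
                  int64_t n, int64_t a) {
  assert(n <= static_cast<int64_t>(std::min(src.size(), dst.size())));
  const int64_t bound = std::min(n, a);
  for (int64_t i = 1; i < bound; ++i)
    dst[i] = src[i];
}
```

                        <p class="MsoNormal">This only holds because the not-taken path has no side effects; with work on both sides of the branch, a second loop would still be needed.</p>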

                        <p class="MsoNormal"> </p>

                        <p><span>>Another way to frame this special

                            case might be to recognize the conditional

                            block can be inverted into an early exit. 

                            (Reasoning: %iv is strictly increasing,

                            condition is monotonic, path if not taken

                            has no observable effect)  Consider:</span></p>

                        <pre>>loop.ph:</pre>

                        <pre>>  br label %loop</pre>

                        <pre>> </pre>

                        <pre>>loop:</pre>

                        <pre>>  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]</pre>

                        <pre>>  %cmp = icmp sge i64 %iv, %a</pre>

                        <pre>>  br i1 %cmp, label %exit, label %for.inc</pre>

                        <pre>> </pre>

                        <pre>>for.inc:</pre>

                        <pre>>  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv </pre>

                        <pre>>  %val = load i64, i64* %src.arrayidx</pre>

                        <pre>>  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv </pre>

                        <pre>>  store i64 %val, i64* %dst.arrayidx</pre>

                        <pre>>  %inc = add nuw nsw i64 %iv, 1</pre>

                        <pre>>  %cond = icmp eq i64 %inc, %n</pre>

                        <pre>>  br i1 %cond, label %exit, label %loop</pre>

                        <pre>> </pre>

                        <pre>>exit:</pre>

                        <pre>>  ret void</pre>

                        <pre>>}</pre>

                        <p><span>>Once that's done, the multiple exit

                            vectorization work should vectorize this

                            loop. Thinking about it, I really like this

                            variant.  </span></p>

                        <p class="MsoNormal">I have not yet looked at the
                          multiple-exit vectorization work, but it
                          looks like we could treat the inverted condition
                          as the early exit's condition.</p>
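                        <p class="MsoNormal">A source-level C++ sketch of the early-exit framing might look like the following. As before, the guarded baseline is an assumed loop shape and the function names are hypothetical; the point is only that, with a strictly increasing IV and a monotonic condition, the guard can become a <code>break</code>.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed baseline: guarded copy, the guard is re-evaluated every iteration.
void copy_with_guard(const std::vector<int64_t> &src,
                     std::vector<int64_t> &dst, int64_t n, int64_t a) {
  assert(n <= static_cast<int64_t>(std::min(src.size(), dst.size())));
  for (int64_t i = 1; i < n; ++i)
    if (i < a)
      dst[i] = src[i];
}

// Early-exit form: once i >= a holds it holds for all later iterations,
// and those iterations have no effect, so the loop can leave immediately.
void copy_early_exit(const std::vector<int64_t> &src,
                     std::vector<int64_t> &dst, int64_t n, int64_t a) {
  assert(n <= static_cast<int64_t>(std::min(src.size(), dst.size())));
  for (int64_t i = 1; i < n; ++i) {
    if (i >= a)
      break;  // inverted condition becomes a second loop exit
    dst[i] = src[i];
  }
}
```

                        <p class="MsoNormal">The resulting loop has two exits and a branch-free body, which is the shape the multiple-exit vectorization work would need to handle.</p>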

                        <p><span>>The costing here seems quite off. 

                            I have not looked at how the vectorizer costs

                            predicated loads on hardware without

                            predication, but needing to scalarize a

                            conditional VF-times and form a vector again

                            does not have a cost of 3 million.  This

                            could definitely be improved.</span></p>

                        <p class="MsoNormal">I agree with you.</p>

                        <p class="MsoNormal"> </p>

                        <p class="MsoNormal">Additionally, if possible,
                          I would like to suggest enabling or adding
                          transformations that help
                          vectorization. For example, just as we remove a
                          conditional branch inside a loop, we could use a
                          transformation like loop distribution to split
                          a loop with a dependency that blocks
                          vectorization into a vectorizable loop and a
                          non-vectorizable one. I am not sure why
                          these features are not enabled by
                          default in the pass manager, but enabling them would make more
                          loops vectorizable.</p>
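                        <p class="MsoNormal">As a small illustration of the loop distribution idea, consider the C++ sketch below. It is a hand-written example, not output of LLVM's loop distribution pass; the function names are hypothetical, and it assumes the three vectors are distinct objects so the two statements do not depend on each other.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: the recurrence on a[] blocks vectorization of the whole
// loop, even though the update of b[] is independent per iteration.
void loop_original(std::vector<int> &a, std::vector<int> &b,
                   const std::vector<int> &c) {
  assert(a.size() == c.size() && b.size() == c.size());
  for (std::size_t i = 1; i < c.size(); ++i) {
    a[i] = a[i - 1] + c[i];  // loop-carried dependence
    b[i] = c[i] * 2;         // no dependence on other iterations
  }
}

// Distributed form: the recurrence stays in a scalar loop, and the
// independent statement gets its own loop that a vectorizer can handle.
void loop_distributed(std::vector<int> &a, std::vector<int> &b,
                      const std::vector<int> &c) {
  assert(a.size() == c.size() && b.size() == c.size());
  for (std::size_t i = 1; i < c.size(); ++i)
    a[i] = a[i - 1] + c[i];  // non-vectorizable part
  for (std::size_t i = 1; i < c.size(); ++i)
    b[i] = c[i] * 2;         // vectorizable part
}
```

                        <p class="MsoNormal">The split is legal here only because neither statement reads what the other writes; a real pass has to prove that with dependence analysis or guard it with run-time checks.</p>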

                        <p class="MsoNormal"> </p>

                        <p class="MsoNormal">Thanks</p>

                        <p class="MsoNormal">JinGu Kang</p>

                      </div>

                    </div>

                  </div>


                </blockquote>

              </div>

              <br clear="all">

              <div><br>

              </div>

              -- <br>

              <div dir="ltr">Best Regards<br>

                He Jie 何杰</div>

            </blockquote>

          </div>

        </blockquote>

      </div>

      <br clear="all">

      <div><br>

      </div>

      -- <br>

      <div dir="ltr">Best Regards<br>

        He Jie 何杰</div>

    </blockquote>

  </div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature">Best Regards<br>He Jie 何杰</div>