[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager

Wed May 12 19:43:53 PDT 2021

Hi Philip,

Yes, I submitted two issues about IRCE, 49012 and 49014.
I don't know whether I am misusing the pass; I have no comprehensive
understanding of this pass or its background. I just took some time to dig
into the code to find the reason.

On Thu, 13 May 2021 at 02:45, Philip Reames <listmail at philipreames.com>
wrote:

>
> On 5/11/21 7:41 PM, Jie He wrote:
>
> Yes, but the current deopt lowering implementation generates statepoint
> IR, which is currently supported only on X86-64, as mentioned in the GC
> documentation in LLVM.
>
> I believe this is supported on at least AArch64 if memory serves.
>
>
> IRCE doesn't rely on a GCed language; I remembered wrong. But it's not
> smart right now: it can't handle bounds checks as well as Java's RCE did.
>
> Er, I think you're either misunderstanding or need to clarify your point.
> IRCE does exactly the standard pre/main/post loop technique which was used
> in C2 back in the day.  LoopPred does the widening transformation.  Do you
> have a particular case in mind you're thinking of?
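
For readers unfamiliar with the pre/main/post loop technique mentioned
above, here is a minimal C sketch of the idea. It is only an illustration,
not IRCE's actual implementation: the function names, the helper min_l and
max_l, and the "-1" stand-in for the out-of-range path are all made up for
this example. The range check survives in the short pre and post loops and
disappears from the main loop.

```c
#include <assert.h>

/* Reference loop: every iteration pays for the range check. */
long sum_checked(const long *a, long len, long begin, long end) {
    long sum = 0;
    for (long i = begin; i < end; i++) {
        if (i >= 0 && i < len)
            sum += a[i];          /* in-range path */
        else
            sum -= 1;             /* hypothetical out-of-range path */
    }
    return sum;
}

static long min_l(long x, long y) { return x < y ? x : y; }
static long max_l(long x, long y) { return x > y ? x : y; }

/* Same result, with the iteration space split into pre/main/post loops. */
long sum_split(const long *a, long len, long begin, long end) {
    assert(len >= 0);
    long main_lo = max_l(begin, 0);   /* first index that can pass the check */
    long main_hi = min_l(end, len);   /* one past the last such index */
    long sum = 0;
    long i = begin;

    for (; i < min_l(main_lo, end); i++)   /* pre loop: check kept */
        sum += (i >= 0 && i < len) ? a[i] : -1;

    for (; i < main_hi; i++)               /* main loop: check removed */
        sum += a[i];

    for (; i < end; i++)                   /* post loop: check kept */
        sum += (i >= 0 && i < len) ? a[i] : -1;

    return sum;
}
```

The main loop is the one a vectorizer would care about, since its body no
longer contains a branch.
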
>
>
> On Tue, 11 May 2021 at 23:04, Philip Reames <listmail at philipreames.com>
> wrote:
>
>> This is incorrect.
>>
>> IRCE's current sole known user happens to be a compiler for a GCed
>> language, but there is no (intentional) dependence on that fact.  It should
>> work on arbitrary IR.
>>
>> Loop predication (the form in IndVars) triggers for arbitrary IR.  The
>> separate pass depends on semantics of guards which is related to deopt
>> semantics, but *not* GC.
>>
>> Philip
>> On 5/11/21 7:17 AM, Jie He wrote:
>>
>> As far as I know, the current IRCE implementation relies on some
>> preconditions. It's intended for language runtimes with garbage
>> collection, not for loop vectorization.
>> The same is true for loop predication, which is also helpful for
>> eliminating condition checks within a loop.
>>
>> Jie He
>> B.R
>>
>> On Tue, 11 May 2021 at 20:50, Jingu Kang via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi Philip,
>>>
>>>
>>>
>>> I have extended your suggestion slightly, as below.
>>>
>>>
>>>
>>>                                  newbound1 = min(n, c)
>>>                                  newbound2 = max(n, c)
>>>      while (iv < n) {            while (iv < newbound1) {
>>>        A                           A
>>>        if (iv < c)                 B
>>>          B                         C
>>>        C                         }
>>>      }                           iv = newbound1
>>>                                  while (iv < newbound2) {
>>>                                    A
>>>                                    C
>>>                                  }
>>>
>>>
>>>
>>> I have implemented a simple pass that splits the bound of a loop
>>> containing a conditional branch on the IV, as in the example above:
>>> https://reviews.llvm.org/D102234. It is an initial version. If possible,
>>> please review it.
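
The split in the example above can be checked with a small C sketch. This
is an illustration only and says nothing about how D102234 is implemented.
The Trace struct and all function names are hypothetical, and A, B and C
are modelled as plain counters. Note two deliberate choices in this sketch:
the second loop runs up to n rather than max(n, c), and iv is not
reassigned between the loops, so that the two forms also agree when c > n
or when iv starts above min(n, c).

```c
#include <assert.h>

/* Counts how often the placeholder statements A, B and C ran. */
typedef struct { long a_runs, b_runs, c_runs; } Trace;

/* Left-hand loop from the example: B is guarded by iv < c. */
Trace run_original(long iv, long n, long c) {
    Trace t = {0, 0, 0};
    for (; iv < n; iv++) {
        t.a_runs++;
        if (iv < c)
            t.b_runs++;
        t.c_runs++;
    }
    return t;
}

/* Split form: the guard is folded into the first loop's bound. */
Trace run_split(long iv, long n, long c) {
    Trace t = {0, 0, 0};
    long newbound1 = n < c ? n : c;   /* min(n, c) */
    for (; iv < newbound1; iv++) {    /* iv < c holds here: A, B, C */
        t.a_runs++; t.b_runs++; t.c_runs++;
    }
    for (; iv < n; iv++) {            /* iv >= c holds here: A, C */
        t.a_runs++; t.c_runs++;
    }
    return t;
}

/* Asserts that both forms ran A, B and C the same number of times. */
void check_same_trace(long iv, long n, long c) {
    Trace x = run_original(iv, n, c), y = run_split(iv, n, c);
    assert(x.a_runs == y.a_runs && x.b_runs == y.b_runs &&
           x.c_runs == y.c_runs);
}
```

Neither loop body contains a branch after the split, which is the property
that matters for vectorization.
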
>>>
>>>
>>>
>>> Thanks
>>>
>>> JinGu Kang
>>>
>>>
>>>
>>> *From:* Jingu Kang <Jingu.Kang at arm.com>
>>> *Sent:* 04 May 2021 12:45
>>> *To:* Philip Reames <listmail at philipreames.com>; Jingu Kang <
>>> Jingu.Kang at arm.com>
>>> *Cc:* llvm-dev at lists.llvm.org
>>> *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding something
>>> similar in the pipeline of new pass manager
>>>
>>>
>>>
>>> Philip, I appreciate your kind comments.
>>>
>>> >In this example, forming the full pre/main/post loop structure of IRCE
>>> is overkill.  Instead, we could simply restrict the loop bounds in the
>>> following manner:
>>>
>>> >loop.ph:
>>> >  ;; Warning: pseudo code, might have edge conditions wrong
>>> >  %c = icmp sgt %iv, %n
>>> >  %min = umin(%n, %a)
>>> >  br i1 %c, label %exit, label %loop.ph.split
>>> >
>>> >loop.ph.split:
>>> >  br label %loop
>>> >
>>> >loop:
>>> >  %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph.split ]
>>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>>> >  %val = load i64, i64* %src.arrayidx
>>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>>> >  store i64 %val, i64* %dst.arrayidx
>>> >  %inc = add nuw nsw i64 %iv, 1
>>> >  %cond = icmp eq i64 %inc, %min
>>> >  br i1 %cond, label %exit, label %loop
>>> >
>>> >exit:
>>> >  ret void
>>> >}
>>>
>>> >
>>>
>>> >I'm not quite sure what to call this transform, but it's not IRCE.  If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting
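
At the source level, the restricted-bound idea quoted above might look like
the following C sketch. The shape of the source loop (starting at 1, with
the store guarded by i < a) is an assumption read off the IR, and the
function names are hypothetical. The sketch uses a signed min, whereas the
quoted pseudo code uses an unsigned operation, so edge cases may differ.

```c
#include <assert.h>

/* Assumed source loop: copy element i only while i < a. */
void copy_guarded(long *dst, const long *src, long n, long a) {
    for (long i = 1; i < n; i++)
        if (i < a)
            dst[i] = src[i];
}

/* Restricted-bound form: clamp the trip count once, outside the loop. */
void copy_clamped(long *dst, const long *src, long n, long a) {
    long bound = n < a ? n : a;       /* min(n, a), signed */
    for (long i = 1; i < bound; i++)
        dst[i] = src[i];
}

/* Returns 1 if both forms leave identical contents in an 8-element buffer. */
int clamped_matches_guarded(long n, long a) {
    assert(n <= 8);
    long src[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    long d1[8] = {0}, d2[8] = {0};
    copy_guarded(d1, src, n, a);
    copy_clamped(d2, src, n, a);
    for (int i = 0; i < 8; i++)
        if (d1[i] != d2[i])
            return 0;
    return 1;
}
```

This only works because the guarded store is the whole loop body: the
iterations with i >= a have no observable effect, so they can be dropped.
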
>>>
>>>
>>>
>>> I agree with you. If the LLVM community is OK with accepting the above
>>> approach as a pass or as part of an existing pass, I would be happy to
>>> implement it, because I am aiming to handle this case in upstream LLVM.
>>>
>>>
>>>
>>> >Another way to frame this special case might be to recognize the
>>> conditional block can be inverted into an early exit.  (Reasoning: %iv is
>>> strictly increasing, condition is monotonic, path if not taken has no
>>> observable effect)  Consider:
>>>
>>> >loop.ph:
>>> >  br label %loop
>>> >
>>> >loop:
>>> >  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]
>>> >  %cmp = icmp sge i64 %iv, %a
>>> >  br i1 %cmp, label %exit, label %for.inc
>>> >
>>> >for.inc:
>>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>>> >  %val = load i64, i64* %src.arrayidx
>>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>>> >  store i64 %val, i64* %dst.arrayidx
>>> >  %inc = add nuw nsw i64 %iv, 1
>>> >  %cond = icmp eq i64 %inc, %n
>>> >  br i1 %cond, label %exit, label %loop
>>> >
>>> >exit:
>>> >  ret void
>>> >}
>>>
>>> >Once that's done, the multiple exit vectorization work should vectorize
>>> this loop. Thinking about it, I really like this variant.
>>>
>>> I have not looked at the multiple exit vectorization work yet, but it
>>> looks like we could treat the inverted condition as the early exit's
>>> condition.
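
The early-exit variant discussed above can be sketched in C as follows.
Again the source loop shape is an assumption inferred from the IR, and the
function names are hypothetical. Both functions return the number of stores
performed so the two forms can be compared.

```c
#include <assert.h>

/* Assumed source loop: the store is guarded by i < a. */
long copy_if_below(long *dst, const long *src, long n, long a) {
    long stores = 0;
    for (long i = 1; i < n; i++) {
        if (i < a) {
            dst[i] = src[i];
            stores++;
        }
    }
    return stores;
}

/* Early-exit form: leave the loop the first time the guard fails. */
long copy_early_exit(long *dst, const long *src, long n, long a) {
    long stores = 0;
    for (long i = 1; i < n; i++) {
        if (i >= a)
            break;                /* i only grows, so i < a never holds again */
        dst[i] = src[i];
        stores++;
    }
    return stores;
}

/* Runs both forms on fresh buffers and asserts they agree. */
void check_early_exit(long n, long a) {
    assert(n <= 8);
    long src[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    long d1[8] = {0}, d2[8] = {0};
    long s1 = copy_if_below(d1, src, n, a);
    long s2 = copy_early_exit(d2, src, n, a);
    assert(s1 == s2);
    for (int i = 0; i < 8; i++)
        assert(d1[i] == d2[i]);
}
```

The rewrite is valid for the reasons given in the quoted text: the IV is
strictly increasing, the condition is monotonic, and the not-taken path has
no observable effect.
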
>>>
>>> >The costing here seems quite off.  I have not looked at how the
>>> vectorizer costs predicated loads on hardware without predication, but
>>> needing to scalarize a conditional VF-times and form a vector again does
>>> not have a cost of 3 million.  This could definitely be improved.
>>>
>>> I agree with you.
>>>
>>>
>>>
>>> Additionally, if possible, I would like to suggest enabling or adding
>>> transformations that help vectorization. For example, just as we remove
>>> the conditional branch inside a loop, we could split a loop with a
>>> dependency that blocks vectorization into a vectorizable loop and a
>>> non-vectorizable one, using transformations like loop distribution. I am
>>> not sure why these features are not enabled by default in the pass
>>> manager, but enabling them would make more loops vectorizable.
>>>
>>>
>>>
>>> Thanks
>>>
>>> JinGu Kang
>>>
>>
>>
>> --
>> Best Regards
>> He Jie 何杰
>>
>>
>
> --
> Best Regards
> He Jie 何杰
>
>

-- 
Best Regards
He Jie 何杰