[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager

Wed May 12 11:45:52 PDT 2021

On 5/11/21 7:41 PM, Jie He wrote:
> Yes, but the current deopt lowering implementation generates
> statepoint IR, which currently only supports X86-64, as mentioned in
> the GC documentation in LLVM.
If memory serves, this is supported on at least AArch64 as well.
>
> IRCE doesn't rely on a GCed language; I remembered that wrong. But it's
> not smart right now; it can't handle bounds checks as well as Java RCE did.
Er, I think you're either misunderstanding or need to clarify your
point.  IRCE does exactly the standard pre/main/post loop technique
which was used in C2 back in the day.  LoopPred (loop predication) does
the widening transformation.  Do you have a particular case in mind?
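
To make the pre/main/post structure concrete, here is a minimal sketch in
Python rather than IR.  It only illustrates iteration-space splitting and is
not how the pass is implemented; the names (checked_body, run_original,
run_split) and the None "slow path" are made up for the example.

```python
def checked_body(src, out, i):
    """Loop body with its range check, as in the original loop."""
    if 0 <= i < len(src):
        out.append(src[i])
    else:
        out.append(None)  # hypothetical stand-in for the failing-check path


def run_original(src, begin, end):
    """Reference loop: every iteration performs the range check."""
    out = []
    for i in range(begin, end):
        checked_body(src, out, i)
    return out


def run_split(src, begin, end):
    """Same iterations, split into pre/main/post loops."""
    out = []
    # Clamp [0, len(src)) into [begin, end): the sub-range where the check passes.
    lo = min(max(begin, 0), end)
    hi = max(min(end, len(src)), lo)
    for i in range(begin, lo):   # pre loop: keeps the check
        checked_body(src, out, i)
    for i in range(lo, hi):      # main loop: check is known to pass, so it is dropped
        out.append(src[i])
    for i in range(hi, end):     # post loop: keeps the check
        checked_body(src, out, i)
    return out
```

The pre and post loops are plain copies of the original body; only the main
loop runs without the check, which is what makes it a candidate for further
optimization.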
>
> On Tue, 11 May 2021 at 23:04, Philip Reames
> <listmail at philipreames.com> wrote:
>
>     This is incorrect.
>
>     IRCE's current sole known user happens to be a compiler for a GCed
>     language, but there is no (intentional) dependence on that fact. 
>     It should work on arbitrary IR.
>
>     Loop predication (the form in IndVars) triggers for arbitrary IR. 
>     The separate pass depends on semantics of guards which is related
>     to deopt semantics, but *not* GC.
>
>     Philip
>
>     On 5/11/21 7:17 AM, Jie He wrote:
>>     As far as I know, the current IRCE implementation relies on some
>>     preconditions. It's intended for language runtimes with garbage
>>     collection, not for loop vectorization.
>>     The same is true for loop predication, which also helps
>>     eliminate condition checks within a loop.
>>
>>     Jie He
>>     B.R
>>
>>     On Tue, 11 May 2021 at 20:50, Jingu Kang via llvm-dev
>>     <llvm-dev at lists.llvm.org> wrote:
>>
>>         Hi Philip,
>>
>>         I have extended your suggestion slightly more as below.
>>
>>         newbound1 = min(n, c)
>>         newbound2 = max(n, c)
>>
>>         while (iv < n) {            while (iv < newbound1) {
>>           A                           A
>>           if (iv < c)                 B
>>             B                         C
>>           C                         }
>>         }                           iv = newbound1
>>                                     while (iv < newbound2) {
>>                                       A
>>                                       C
>>                                     }
>>
>>         I have implemented a simple pass that splits the bound of a
>>         loop containing a conditional branch on the IV, as in the
>>         example above.
>>         https://reviews.llvm.org/D102234
>>         It is an initial version. If possible, please review it.
>>
>>         Thanks
>>
>>         JinGu Kang
>>
>>         *From:* Jingu Kang <Jingu.Kang at arm.com>
>>         *Sent:* 04 May 2021 12:45
>>         *To:* Philip Reames <listmail at philipreames.com>; Jingu Kang
>>         <Jingu.Kang at arm.com>
>>         *Cc:* llvm-dev at lists.llvm.org
>>         *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding
>>         something similar in the pipeline of new pass manager
>>
>>         Philip, I appreciate your kind comments.
>>
>>         >In this example, forming the full pre/main/post loop structure of
>>         >IRCE is overkill.  Instead, we could simply restrict the loop bounds
>>         >in the following manner:
>>
>>         >loop.ph:
>>         >  ;; Warning: pseudo code, might have edge conditions wrong
>>         >  %c = icmp sgt %iv, %n
>>         >  %min = umax(%n, %a)
>>         >  br i1 %c, label %exit, label %loop.ph
>>         >
>>         >loop.ph.split:
>>         >  br label %loop
>>         >
>>         >loop:
>>         >  %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph ]
>>         >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>>         >  %val = load i64, i64* %src.arrayidx
>>         >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>>         >  store i64 %val, i64* %dst.arrayidx
>>         >  %inc = add nuw nsw i64 %iv, 1
>>         >  %cond = icmp eq i64 %inc, %min
>>         >  br i1 %cond, label %exit, label %loop
>>         >
>>         >exit:
>>         >  ret void
>>         >}
>>
>>         > 
>>
>>         >I'm not quite sure what to call this transform, but it's not IRCE.  If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting
>>
>>         I agree with you. If the LLVM community is OK with accepting the
>>         above approach as a pass or as part of an existing pass, I would
>>         be happy to implement it, because I am aiming to handle this case
>>         in upstream LLVM.
>>
>>         >Another way to frame this special case might be to recognize the
>>         >conditional block can be inverted into an early exit. (Reasoning:
>>         >%iv is strictly increasing, condition is monotonic, path if not
>>         >taken has no observable effect)  Consider:
>>
>>         >loop.ph:
>>         >  br label %loop
>>         >
>>         >loop:
>>         >  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]
>>         >  %cmp = icmp sge i64 %iv, %a
>>         >  br i1 %cmp, label %exit, label %for.inc
>>         >
>>         >for.inc:
>>         >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>>         >  %val = load i64, i64* %src.arrayidx
>>         >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>>         >  store i64 %val, i64* %dst.arrayidx
>>         >  %inc = add nuw nsw i64 %iv, 1
>>         >  %cond = icmp eq i64 %inc, %n
>>         >  br i1 %cond, label %exit, label %loop
>>         >
>>         >exit:
>>         >  ret void
>>         >}
>>
>>         >Once that's done, the multiple exit vectorization work should
>>         >vectorize this loop. Thinking about it, I really like this variant.
>>
>>         I have not looked at the multiple exit vectorization work
>>         yet, but it looks like we could treat the inverted condition as
>>         the early exit’s condition.
>>
>>         >The costing here seems quite off. I have not looked at how the
>>         >vectorizer costs predicated loads on hardware without predication,
>>         >but needing to scalarize a conditional VF-times and form a vector
>>         >again does not have a cost of 3 million.  This could definitely be
>>         >improved.
>>
>>         I agree with you.
>>
>>         Additionally, if possible, I would like to suggest enabling
>>         or adding transformations that help vectorization. For
>>         example, just as we remove a conditional branch inside a loop,
>>         we could split a loop with a dependency that blocks
>>         vectorization into a vectorizable loop and a non-vectorizable
>>         one, using transformations like loop distribution. I am not
>>         sure why these features have not been enabled by default in the
>>         pass manager, but enabling them would make more loops vectorizable.
>>
>>         Thanks
>>
>>         JinGu Kang
>>
>>         _______________________________________________
>>         LLVM Developers mailing list
>>         llvm-dev at lists.llvm.org
>>         https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>>     -- 
>>     Best Regards
>>     He Jie 何杰
>
>
>
> -- 
> Best Regards
> He Jie 何杰
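
For reference, the bound-splitting idea quoted above can be sketched in
Python.  This is only an illustration with made-up names (run_original,
run_split) and an event trace standing in for the A/B/C blocks; it is not the
code in D102234.  It assumes the IV steps by 1, and it runs the second loop up
to n rather than max(n, c), since max(n, c) would add iterations the original
loop never executes when c > n.

```python
def run_original(start, n, c):
    """while (iv < n) { A; if (iv < c) B; C }"""
    trace, iv = [], start
    while iv < n:
        trace.append(("A", iv))
        if iv < c:
            trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    return trace


def run_split(start, n, c):
    """Same loop with the bound split at min(n, c); no branch on iv inside."""
    trace, iv = [], start
    first_bound = min(n, c)
    while iv < first_bound:      # here iv < c always holds, so B is unconditional
        trace.append(("A", iv))
        trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    while iv < n:                # here iv >= c always holds, so B never runs
        trace.append(("A", iv))
        trace.append(("C", iv))
        iv += 1
    return trace
```

The second loop simply continues from wherever iv ended up, which also covers
the case where the starting value is already past c.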