[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager

Tue May 11 08:06:09 PDT 2021

JFYI, the addition of 'A' and 'C' in your example complicates cost 
modeling a bunch.

I'll review the patch, just sharing a meta comment here.

Philip

On 5/11/21 5:49 AM, Jingu Kang wrote:
>
> Hi Philip,
>
> I have extended your suggestion slightly, as shown below.
>
>                                  newbound1 = min(n, c)
>                                  newbound2 = max(n, c)
>      while (iv < n) {            while (iv < newbound1) {
>        A                           A
>        if (iv < c)                 B
>          B                         C
>        C                         }
>      }                           iv = newbound1
>                                  while (iv < newbound2) {
>                                    A
>                                    C
>                                  }
>
> I have implemented a simple pass that splits the bound of a loop
> containing a conditional branch on the IV, as in the example above:
> https://reviews.llvm.org/D102234
> It is an initial version. If possible, please review it.
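
As a cross-check of the split shown above, here is a minimal Python model (not the D102234 implementation) that records which of A, B and C run for each iv. The function names and the trace format are hypothetical. One deliberate difference, flagged as an assumption: the second loop below stops at n rather than at newbound2 = max(n, c), and iv simply carries over instead of being reset, because when c > n a bound of max(n, c) would run A and C for iterations the original loop never executes.

```python
def original_loop(start, n, c):
    """Reference loop: B is guarded by iv < c on every iteration."""
    trace = []
    iv = start
    while iv < n:
        trace.append(("A", iv))
        if iv < c:
            trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    return trace


def split_loop(start, n, c):
    """Same work with the iv < c test removed from both loop bodies."""
    trace = []
    iv = start
    newbound1 = min(n, c)
    # First loop: iv < min(n, c) implies iv < c, so B runs unconditionally.
    while iv < newbound1:
        trace.append(("A", iv))
        trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    # Second loop: assumption of this sketch, bounded by n (not max(n, c)).
    # If it runs at all, iv >= c already holds, so B is dropped.
    while iv < n:
        trace.append(("A", iv))
        trace.append(("C", iv))
        iv += 1
    return trace
```

Comparing the two traces over a small grid of start, n and c values (including c > n and start >= c) is a cheap way to probe the edge conditions before looking at the IR-level patch.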
>
> Thanks
>
> JinGu Kang
>
> *From:* Jingu Kang <Jingu.Kang at arm.com>
> *Sent:* 04 May 2021 12:45
> *To:* Philip Reames <listmail at philipreames.com>; Jingu Kang 
> <Jingu.Kang at arm.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding something 
> similar in the pipeline of new pass manager
>
> Philip, I appreciate your kind comments.
>
> >In this example, forming the full pre/main/post loop structure of IRCE is overkill. Instead, we could
> >simply restrict the loop bounds in the following manner:
>
> >loop.ph:
> >  ;; Warning: pseudo code, might have edge conditions wrong
> >  %c = icmp sgt %iv, %n
> >  %min = umax(%n, %a)
> >  br i1 %c, label %exit, label %loop.ph.split
> >
> >loop.ph.split:
> >  br label %loop
> >
> >loop:
> >  %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph.split ]
> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv 
> >  %val = load i64, i64* %src.arrayidx
> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv 
> >  store i64 %val, i64* %dst.arrayidx
> >  %inc = add nuw nsw i64 %iv, 1
> >  %cond = icmp eq i64 %inc, %min
> >  br i1 %cond, label %exit, label %loop
> >
> >exit:
> >  ret void
> >}
> >
> >I'm not quite sure what to call this transform, but it's not IRCE.  If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting.
>
> I agree with you. If the LLVM community is willing to accept the above
> approach as a pass or as part of some pass, I would be happy to implement
> it, because I am aiming to handle this case in upstream LLVM.
>
> >Another way to frame this special case might be to recognize the conditional block can be inverted into an
> >early exit.  (Reasoning: %iv is strictly increasing, condition is
> >monotonic, path if not taken has no observable effect)  Consider:
>
> >loop.ph:
> >  br label %loop
> >
> >loop:
> >  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]
> >  %cmp = icmp sge i64 %iv, %a
> >  br i1 %cmp, label %exit, label %for.inc
> >
> >for.inc:
> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv 
> >  %val = load i64, i64* %src.arrayidx
> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv 
> >  store i64 %val, i64* %dst.arrayidx
> >  %inc = add nuw nsw i64 %iv, 1
> >  %cond = icmp eq i64 %inc, %n
> >  br i1 %cond, label %exit, label %loop
> >
> >exit:
> >  ret void
> >}
>
> >Once that's done, the multiple exit vectorization work should vectorize this loop. Thinking about it, I
> >really like this variant.
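
The early-exit reasoning above can be modelled in a few lines of Python. This is a sketch only: the guarded form of the source loop (copy src[iv] to dst[iv] only while iv < a, for iv from 1 to n-1) is inferred from the IR and is an assumption, and the function names are hypothetical.

```python
def guarded_copy(src, dst, n, a):
    """Assumed original shape: every iteration tests iv < a before copying."""
    out = list(dst)
    for iv in range(1, n):
        if iv < a:
            out[iv] = src[iv]
    return out


def early_exit_copy(src, dst, n, a):
    """Inverted form: the first iv with iv >= a leaves the loop."""
    out = list(dst)
    for iv in range(1, n):
        # iv only increases, so once iv >= a holds it holds for every later
        # iteration, and the skipped path does nothing observable.
        if iv >= a:
            break
        out[iv] = src[iv]
    return out
```

For example, `early_exit_copy([10, 11, 12, 13], [0, 0, 0, 0], 4, 3)` returns `[0, 11, 12, 0]`, the same result as `guarded_copy` with those arguments.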
>
> I have not looked at the multiple exit vectorization work yet, but it
> looks like we could treat the inverted condition as the early exit’s condition.
>
> >The costing here seems quite off.  I have not looked at how the vectorizer costs predicated loads on hardware
> >without predication, but needing to scalarize a conditional VF-times
> >and form a vector again does not have a cost of 3 million.  This could
> >definitely be improved.
>
> I agree with you.
>
> Additionally, if possible, I would like to suggest enabling or adding
> transformations that help vectorization. For example, in the same way
> as removing a conditional branch inside a loop, we could split a loop
> whose dependency blocks vectorization into a vectorizable loop and a
> non-vectorizable one, using transformations like loop distribution. I
> am not sure why these features have not been enabled by default in the
> pass manager, but enabling them would make more loops vectorizable.
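
To make the loop distribution idea concrete, here is a small Python illustration. The loop body, array names and function names are all hypothetical and chosen only for this sketch; it shows the shape of the transformation, not what any LLVM pass emits.

```python
def fused(a, b, c):
    """One loop mixing a loop-carried recurrence with independent work."""
    a = list(a)
    d = [0] * len(b)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]   # needs a[i - 1] from the previous iteration
        d[i] = b[i] * c[i]       # independent across iterations
    return a, d


def distributed(a, b, c):
    """Same result after splitting the body into two loops."""
    a = list(a)
    d = [0] * len(b)
    for i in range(1, len(a)):   # recurrence: stays a serial loop
        a[i] = a[i - 1] + b[i]
    for i in range(1, len(a)):   # no cross-iteration dependence: a
        d[i] = b[i] * c[i]       # candidate for vectorization
    return a, d
```

The split is legal here because the second statement never reads or writes a, so moving it into its own loop does not reorder any dependent accesses. With a = [1, 0, 0, 0], b = [0, 2, 3, 4] and c = [0, 5, 6, 7], both versions return ([1, 3, 6, 10], [0, 10, 18, 28]).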
>
> Thanks
>
> JinGu Kang
>