[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Tue May 11 08:06:09 PDT 2021
JFYI, the addition of 'A' and 'C' in your example complicates cost
modeling a bunch.
I'll review the patch, just sharing a meta comment here.
Philip
On 5/11/21 5:49 AM, Jingu Kang wrote:
>
> Hi Philip,
>
> I have extended your suggestion slightly more as below.
>
> newbound1 = min(n, c)
>
> newbound2 = max(n, c)
>
> while (iv < n) { while(iv < newbound1) {
>
> A A
>
> if (iv < c) B
>
> B C
>
> C }
>
> } iv = newbound1
>
> while (iv < newbound2) {
>
> A
>
> C
>
> }
>
> I have implemented a simple pass to split bound of loop, which has
> conditional branch with IV, as above example.
> https://reviews.llvm.org/D102234 <https://reviews.llvm.org/D102234> It
> is initial version. If possible, please review it.
>
> Thanks
>
> JinGu Kang
>
> *From:* Jingu Kang <Jingu.Kang at arm.com>
> *Sent:* 04 May 2021 12:45
> *To:* Philip Reames <listmail at philipreames.com>; Jingu Kang
> <Jingu.Kang at arm.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding something
> similar in the pipeline of new pass manager
>
> Philip, I appreciate your kind comments.
>
> >In this example, forming the full pre/main/post loop structure of IRCE is overkill. Instead, we could
> simply restrict the loop bounds in the following manner:
>
> >loop.ph <http://loop.ph>:
> > ;; Warning: psuedo code, might have edge conditions wrong
> > %c = icmp sgt %iv, %n
> > %min = umax(%n, %a)
> > br i1 %c, label %exit, label %loop.ph <http://loop.ph>
> >
> >loop.ph.split:
> > br label %loop
> >
> >loop:
> > %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph <http://loop.ph> ]
> > %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
> > %val = load i64, i64* %src.arrayidx
> > %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
> > store i64 %val, i64* %dst.arrayidx
> > %inc = add nuw nsw i64 %iv, 1
> > %cond = icmp eq i64 %inc, %min
> > br i1 %cond, label %exit, label %loop
> >
> >exit:
> > ret void
> >}
> >
> >I'm not quite sure what to call this transform, but it's not IRCE. If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting
>
> I agree with you. If the llvm community is ok to accept above approach
> as a pass or a part of a certain pass, I would be happy to implement
> it because I am aiming to handle this case with llvm upstream.
>
> >Another way to frame this special case might be to recognize the conditional block can be inverted into an
> early exit. (Reasoning: %iv is strictly increasing, condition is
> monotonic, path if not taken has no observable effect) Consider:
>
> >loop.ph <http://loop.ph>:
> > br label %loop
> >
> >loop:
> > %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph <http://loop.ph> ]
> > %cmp = icmp sge i64 %iv, %a
> > br i1 %cmp, label %exit, label %for.inc
> >
> >for.inc:
> > %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
> > %val = load i64, i64* %src.arrayidx
> > %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
> > store i64 %val, i64* %dst.arrayidx
> > %inc = add nuw nsw i64 %iv, 1
> > %cond = icmp eq i64 %inc, %n
> > br i1 %cond, label %exit, label %loop
> >
> >exit:
> > ret void
> >}
>
> >Once that's done, the multiple exit vectorization work should vectorize this loop. Thinking about it, I
> really like this variant.
>
> I have not looked at the multiple exit vectorization work yet but it
> looks we could consider the inverted condition as early exit’s condition.
>
> >The costing here seems quite off. I have not looked at how the vectorize costs predicated loads on hardware
> without predication, but needing to scalarize a conditional VF-times
> and form a vector again does not have a cost of 3 million. This could
> definitely be improved.
>
> I agree with you.
>
> Additionally, if possible, I would like to suggest to enable or add
> transformations in order to help vectorization. For example, as
> removing conditional branch inside loop, we could split a loop with
> dependency, which blocks vectorization, into vectorizable loop and
> non-vectorizable one using transformations like loop distribution. I
> am not sure why these features have not been enabled as default on
> pass manager but it would make more loops vectorizable.
>
> Thanks
>
> JinGu Kang
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210511/4589fe95/attachment-0001.html>
More information about the llvm-dev
mailing list