[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Wed May 12 11:45:52 PDT 2021
On 5/11/21 7:41 PM, Jie He wrote:
> Yes, but the current deopt lowering implementation generates
> statepoint IR, which currently supports only X86-64, as mentioned in the GC
> documentation in LLVM.
I believe this is supported on at least AArch64 if memory serves.
>
> IRCE doesn't rely on a GCed language; I remembered wrong. But it's not
> smart right now: it can't handle bounds checks as well as Java's RCE did.
Er, I think you're either misunderstanding or need to clarify your
point. IRCE does exactly the standard pre/main/post loop technique
which was used in C2 back in the day. LoopPred does the widening
transformation. Do you have a particular case in mind you're thinking of?
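
For readers unfamiliar with the pre/main/post technique mentioned above, here is a minimal Python sketch of the idea, not IRCE's actual implementation. The function names and the `None` stand-in for the out-of-range path are assumptions made for illustration. The iteration space is cut into three pieces so that the middle piece provably stays in bounds and needs no check.

```python
def checked_loop(start, n, data):
    """Reference loop: a range check guards every access."""
    out = []
    for i in range(start, n):
        if 0 <= i < len(data):       # range check on each iteration
            out.append(data[i])
        else:
            out.append(None)         # stand-in for the out-of-range path
    return out


def pre_main_post(start, n, data):
    """Same result, with the check removed from the main loop."""
    end = max(start, n)                            # empty loop if n <= start
    pre_end = min(max(start, 0), end)              # end of iterations below index 0
    main_end = max(pre_end, min(end, len(data)))   # end of the in-bounds range
    out = []
    for i in range(start, pre_end):                # pre loop: checks kept
        out.append(data[i] if 0 <= i < len(data) else None)
    for i in range(pre_end, main_end):             # main loop: 0 <= i < len(data)
        out.append(data[i])                        # no check needed
    for i in range(main_end, end):                 # post loop: checks kept
        out.append(data[i] if 0 <= i < len(data) else None)
    return out
```

Only the main loop is check-free, which is what makes it a candidate for further optimization such as vectorization.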
>
> On Tue, 11 May 2021 at 23:04, Philip Reames <listmail at philipreames.com> wrote:
>
> This is incorrect.
>
> IRCE's current sole known user happens to be a compiler for a GCed
> language, but there is no (intentional) dependence on that fact.
> It should work on arbitrary IR.
>
> Loop predication (the form in IndVars) triggers for arbitrary IR.
> The separate pass depends on semantics of guards which is related
> to deopt semantics, but *not* GC.
>
> Philip
>
> On 5/11/21 7:17 AM, Jie He wrote:
>> As far as I know, the current IRCE implementation relies on some
>> preconditions. It is intended for language runtimes with garbage
>> collection, not for loop vectorization.
>> The same is true for loop predication, which also helps
>> eliminate condition checks within a loop.
>>
>> Jie He
>> B.R
>>
>> On Tue, 11 May 2021 at 20:50, Jingu Kang via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi Philip,
>>
>> I have extended your suggestion slightly, as below.
>>
>> newbound1 = min(n, c)
>> newbound2 = max(n, c)
>>
>> while (iv < n) {        while (iv < newbound1) {
>>   A                       A
>>   if (iv < c)             B
>>     B                     C
>>   C                     }
>> }                       iv = newbound1
>>                         while (iv < newbound2) {
>>                           A
>>                           C
>>                         }
>>
>> I have implemented a simple pass that splits the bound of a loop
>> containing a conditional branch on the IV, as in the example above:
>> https://reviews.llvm.org/D102234
>> It is an initial version. If possible, please review it.
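
The bound split described above can be checked with a small Python sketch. This is an illustration only, not the code in D102234. `A`, `B` and `C` are modelled as trace entries, and the function names and the `start` parameter are hypothetical. Two assumptions of the sketch: the second loop stops at `n` (bounding it by `max(n, c)` would run extra iterations whenever `c > n`), and `iv` simply carries over from the first loop instead of being reassigned.

```python
def original(n, c, start=0):
    """Single loop with an IV-dependent branch around B."""
    trace = []
    iv = start
    while iv < n:
        trace.append(("A", iv))
        if iv < c:
            trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    return trace


def split(n, c, start=0):
    """Two branch-free loops covering the same iterations."""
    trace = []
    newbound1 = min(n, c)
    iv = start
    while iv < newbound1:          # iv < c holds here, so B always runs
        trace.append(("A", iv))
        trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    while iv < n:                  # iv >= c holds here, so B never runs
        trace.append(("A", iv))
        trace.append(("C", iv))
        iv += 1
    return trace
```

Both functions produce the same trace for any `n`, `c` and `start`, because `iv` only increases and the condition `iv < c` can flip from true to false at most once.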
>>
>> Thanks
>>
>> JinGu Kang
>>
>> *From:* Jingu Kang <Jingu.Kang at arm.com>
>> *Sent:* 04 May 2021 12:45
>> *To:* Philip Reames <listmail at philipreames.com>; Jingu Kang <Jingu.Kang at arm.com>
>> *Cc:* llvm-dev at lists.llvm.org
>> *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding
>> something similar in the pipeline of new pass manager
>>
>> Philip, I appreciate your kind comments.
>>
>> >In this example, forming the full pre/main/post loop structure of IRCE is overkill. Instead,
>> >we could simply restrict the loop bounds in the following manner:
>> >
>> >loop.ph:
>> >  ;; Warning: pseudo code, might have edge conditions wrong
>> >  %c = icmp sgt %iv, %n
>> >  %min = umin(%n, %a)
>> >  br i1 %c, label %exit, label %loop.ph.split
>> >
>> >loop.ph.split:
>> >  br label %loop
>> >
>> >loop:
>> >  %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph.split ]
>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>> >  %val = load i64, i64* %src.arrayidx
>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>> >  store i64 %val, i64* %dst.arrayidx
>> >  %inc = add nuw nsw i64 %iv, 1
>> >  %cond = icmp eq i64 %inc, %min
>> >  br i1 %cond, label %exit, label %loop
>> >
>> >exit:
>> >  ret void
>> >}
>> >
>> >I'm not quite sure what to call this transform, but it's not IRCE. If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting.
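
As a rough illustration of the restricted-bound idea quoted above, the following Python sketch compares a guarded copy loop with one whose trip count is clamped up front. It assumes the guard is `iv < a` and that the loop starts at 1, as in the IR. The function names are hypothetical and the loop is written top-tested for simplicity.

```python
def guarded_copy(src, dst, a, n):
    """Original shape: the store is guarded by iv < a on every iteration."""
    for iv in range(1, n):
        if iv < a:
            dst[iv] = src[iv]
    return dst


def restricted_copy(src, dst, a, n):
    """Restricted bound: iterate only while both iv < n and iv < a hold."""
    for iv in range(1, min(n, a)):
        dst[iv] = src[iv]          # no per-iteration guard left in the body
    return dst
```

The two agree because iterations with `iv >= a` do nothing observable in the guarded version, so dropping them changes no stores.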
>>
>> I agree with you. If the LLVM community is OK with accepting the above
>> approach as a pass or as part of an existing pass, I would be
>> happy to implement it, because I am aiming to handle this case
>> in upstream LLVM.
>>
>> >Another way to frame this special case might be to recognize the conditional block can be
>> >inverted into an early exit. (Reasoning: %iv is strictly
>> >increasing, condition is monotonic, path if not taken has no
>> >observable effect) Consider:
>> >
>> >loop.ph:
>> >  br label %loop
>> >
>> >loop:
>> >  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]
>> >  %cmp = icmp sge i64 %iv, %a
>> >  br i1 %cmp, label %exit, label %for.inc
>> >
>> >for.inc:
>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>> >  %val = load i64, i64* %src.arrayidx
>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>> >  store i64 %val, i64* %dst.arrayidx
>> >  %inc = add nuw nsw i64 %iv, 1
>> >  %cond = icmp eq i64 %inc, %n
>> >  br i1 %cond, label %exit, label %loop
>> >
>> >exit:
>> >  ret void
>> >}
>> >
>> >Once that's done, the multiple exit vectorization work should vectorize this loop. Thinking
>> >about it, I really like this variant.
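
The early-exit framing above can be sketched in Python as follows. This is a hedged illustration with hypothetical function names, and it uses a top-tested loop rather than the bottom-tested form in the IR. Each function also returns the number of iterations it ran, to show that the inverted form stops as soon as the guard first fails.

```python
def guarded(src, dst, a, n):
    """Guarded body: every iteration up to n runs, some do nothing."""
    iterations = 0
    for iv in range(1, n):
        iterations += 1
        if iv < a:
            dst[iv] = src[iv]
    return dst, iterations


def early_exit(src, dst, a, n):
    """Inverted guard: leave the loop the first time iv >= a."""
    iterations = 0
    for iv in range(1, n):
        if iv >= a:                # iv only grows, so the guard stays false
            break
        iterations += 1
        dst[iv] = src[iv]
    return dst, iterations
```

The stores are identical in both versions; only the number of executed iterations differs, which is safe because the skipped iterations have no observable effect.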
>>
>> I have not looked at the multiple exit vectorization work
>> yet, but it looks like we could treat the inverted condition as
>> the early exit’s condition.
>>
>> >The costing here seems quite off. I have not looked at how the vectorizer costs predicated loads
>> >on hardware without predication, but needing to scalarize a
>> >conditional VF-times and form a vector again does not have a
>> >cost of 3 million. This could definitely be improved.
>>
>> I agree with you.
>>
>> Additionally, if possible, I would like to suggest enabling
>> or adding transformations that help vectorization. For
>> example, just as we remove a conditional branch inside a loop, we could
>> split a loop with a dependency that blocks vectorization
>> into a vectorizable loop and a non-vectorizable one, using
>> transformations like loop distribution. I am not sure why
>> these features are not enabled by default in the pass
>> manager, but enabling them would make more loops vectorizable.
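
To make the loop distribution suggestion above concrete, here is a small Python sketch, an illustration only and not LLVM's LoopDistribute pass. The function and array names are hypothetical. The first statement carries a dependence from one iteration to the next; the second does not and does not touch the first statement's data, so the two can be separated.

```python
def fused(a, b, c, d):
    """One loop mixing a loop-carried recurrence with independent work."""
    a, c = list(a), list(c)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]     # depends on the previous iteration
        c[i] = d[i] * 2            # independent across iterations
    return a, c


def distributed(a, b, c, d):
    """Same computation split into a serial loop and a vectorizable loop."""
    a, c = list(a), list(c)
    for i in range(1, len(a)):     # stays scalar: recurrence on a
        a[i] = a[i - 1] + b[i]
    for i in range(1, len(a)):     # candidate for vectorization
        c[i] = d[i] * 2
    return a, c
```

The split is valid here only because neither statement reads what the other writes; a real pass has to prove that before distributing.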
>>
>> Thanks
>>
>> JinGu Kang
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>> --
>> Best Regards
>> He Jie 何杰
>
>
>
> --
> Best Regards
> He Jie 何杰