[llvm-dev] Question about Unrolling Loop with Multiple Exits
Jingu Kang via llvm-dev
llvm-dev at lists.llvm.org
Thu Jul 22 04:24:42 PDT 2021
It looks like the runtime unroll does not support the trip count with pointer type…
Before handling the multiple exits, I need to resolve it first for my example…
If possible, can someone let me know that people are working on it or there are reviews for it on phabricator please?
Thanks
JinGu Kang
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jingu Kang via llvm-dev
Sent: 21 July 2021 09:51
To: Philip Reames <listmail at philipreames.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
Thanks for kind suggestion Philip.
Let me have a look at the points where the debug message mentions.
Regards
JinGu Kang
From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces at lists.llvm.org>> On Behalf Of Philip Reames via llvm-dev
Sent: 20 July 2021 18:17
To: Jingu Kang <Jingu.Kang at arm.com<mailto:Jingu.Kang at arm.com>>
Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
./opt -loop-unroll� -enable-new-pm=0 < llvm-dev.ll -S -unroll-runtime -debug
Args: ./opt -loop-unroll -enable-new-pm=0 -S -unroll-runtime -debug
Loop Unroll: F[test] Loop %while.body.us
� Loop Size = 10
� runtime unrolling with count: 8
� Exiting block %if.end8.us: TripCount=0, TripMultiple=1, BreakoutTrip=1
� Exiting block %while.cond.backedge: TripCount=0, TripMultiple=1, BreakoutTrip=1
Trying runtime unrolling on Loop:
Loop at depth 1 containing: %while.body.us<header>,%if.end8.us<exiting>,%while.cond.backedge<latch><exiting>
Using prolog remainder.
Bailout for multi-exit handling when latch exit has >1 predecessor.
Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled!
Won't unroll; remainder loop could not be generated when assuming runtime trip count
; ModuleID = '<stdin>'
source_filename = "<stdin>"
declare i32 @foo(i8*)
define void @test(i8* %s, i64 %a) {
entry:
� %s.addr.a = getelementptr i8, i8* %s, i64 %a
� br label %while.body.us
while.body.us:����������������������������������� ; preds = %while.cond.backedge, %entry
� %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
� %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
� %incdec.val = load i8, i8* %s.addr, align 1
� %cmp1 = icmp eq i8 %incdec.val, 10
� br i1 %cmp1, label %if.end8.us, label %while.cond.backedge
if.end8.us:�������������������������������������� ; preds = %while.body.us
� %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr)
� %cmp2 = icmp ult i32 %call9, 0
� br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
while.cond.backedge:����������������������������� ; preds = %if.end8.us, %while.body.us
� %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a
� br i1 %cmp3, label %return.loopexit, label %while.body.us
return.loopexit:��������������������������������� ; preds = %while.cond.backedge, %if.end8.us
� ret void
}
The debug output appears to give a pretty good clue as to where you could start if desired.
Philip
On 7/20/21 9:45 AM, Jingu Kang wrote:
Hi Philip,
�
I have reduced the test case roughly as below.
�
declare i32 @foo(i8 *)
�
define void @test(i8* %s, i64 %a) {
entry:
� %s.addr.a = getelementptr i8, i8* %s, i64 %a
� br label %while.body.us
�
while.body.us:���
��%s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
� %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
� %incdec.val = load i8, i8* %s.addr, align 1
� %cmp1 = icmp eq i8 %incdec.val, 10
� br i1 %cmp1, label %if.end8.us, label %while.cond.backedge
�
if.end8.us:���
��%call9 = tail call i32 @foo(i8* nonnull %incdec.ptr)
� %cmp2 = icmp ult i32 %call9, 0
� br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
�
while.cond.backedge:
� %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a
� br i1 %cmp3, label %return.loopexit, label %while.body.us
�
return.loopexit:���
��ret void
}
�
Roughly, I unrolled the loop manually 2 times as below and ignored the remaining loop simply.
�
define void @test(i8* %s, i64 %a) {
entry:
� %s.addr.a = getelementptr i8, i8* %s, i64 %a
� br label %while.body.us
�
while.body.us:���
��%s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
� %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
� %incdec.val = load i8, i8* %s.addr, align 1
� %cmp1 = icmp eq i8 %incdec.val, 10
� br i1 %cmp1, label %if.end8.us, label %while.body.us.1
�
while.body.us.1:���
��%incdec.ptr.1 = getelementptr inbounds i8, i8* %incdec.ptr, i64 1
� %incdec.val.1 = load i8, i8* %incdec.ptr, align 1
� %cmp1.1 = icmp eq i8 %incdec.val.1, 10
� br i1 %cmp1.1, label %if.end8.us, label %while.cond.backedge
�
if.end8.us:���
��%incdec.ptr.phi = phi i8* [ %incdec.ptr, %while.body.us ], [ %incdec.ptr.1, %while.body.us.1 ]
� %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr.phi)
� %cmp2 = icmp ult i32 %call9, 0
� br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
�
while.cond.backedge:
� %cmp3 = icmp eq i8* %incdec.ptr.1, %s.addr.a
� br i1 %cmp3, label %return.loopexit, label %while.body.us
�
return.loopexit:���
��ret void
}
�
If possible, can we make loop unroll pass handle this kind of cases please?
If I missed something, please let me know.
�
Thanks
JinGu Kang
�
From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev
Sent: 16 July 2021 20:12
To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com>
Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
�
Jingu,
I'm not fluent in AArch64 assembly, sorry.� If you want more meaningful commentary, you'll need to reduce your c++ example further, and give a better explanation of what you expect the unroller to do for you.�
Philip
On 7/16/21 12:09 PM, Jingu Kang wrote:
Sorry for poor example�
�
The AArch64 assembly output of the example from gcc is as below. The loop is unrolled 7 times. I have written some comments to explain how the assembly code is mapped to C source code. As you can see on `.L3` label, the �if (*s++ != '\n')� block is unrolled 7 times.
�
������� stp���� x29, x30, [sp, -48]!
������� mov���� x29, sp
������� stp���� x19, x20, [sp, 16]
������� mov���� x19, x0� --> x19 is char *s
������� mov���� x20, x1� --> x20 is char *end
������� str���� x21, [sp, 32]
������� orr���� x21, x3, x2� --> x21 is check1 | check2
.L70:
������� sub���� x0, x20, x19� --> x0 = end - s;
������� add���� x1, x0, 1
������� ands��� x2, x1, 7� --> unroll count is 7
������� beq���� .L3���� --> .L3 is inside while loop. if (*s++ != '\n')
������� cmp���� x19, x20 �--> while(s <= end)
������� bhi���� .L2���� --> .L2 is label ret1.
�
������� ldrb��� w3, [x19], 1� --> start of remainder
������� cmp���� w3, 10
������� beq���� .L71
������� cmp���� x2, 1
������� beq���� .L3
������� cmp���� x2, 2
������� beq���� .L49
������� cmp���� x2, 3
������� beq���� .L50
������� cmp���� x2, 4
������� beq���� .L51
������� cmp���� x2, 5
������� beq���� .L52
������� cmp���� x2, 6
������� beq���� .L53
������� ldrb��� w4, [x19], 1
������� cmp���� w4, 10
������� bne���� .L53
.L71:
������� cbz���� x21, .L4� --> if(check1 || check2)
������� bl����� foo()
������� mov���� x19, x0
������� cbz���� x0, .L2�� --> if (!s) goto ret1;
�
.L4:��������������������� --> if(boo(s)) goto ret0;
������� mov���� x0, x19
����� ��bl����� boo(char*)
������� cbz���� w0, .L70
������� mov���� w0, 0
������� ldp���� x19, x20, [sp, 16]
������� ldr���� x21, [sp, 32]
������� ldp���� x29, x30, [sp], 48
������� ret
.L53:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w5, [x19], 1
������� cmp���� w5, 10
������� beq���� .L71
.L52:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w6, [x19], 1
������� cmp���� w6, 10
������� beq���� .L71
.L51:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w7, [x19], 1
������� cmp���� w7, 10
������� beq���� .L71
.L50:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w8, [x19], 1
������� cmp���� w8, 10
������� beq���� .L71
.L49:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w9, [x19], 1
������� cmp���� w9, 10
������� beq���� .L71
.L3:������������������������� --> if (*s++ != '\n'), 7 times unrolled
������� cmp���� x19, x20
������� bhi���� .L2
������� ldrb��� w10, [x19]
������� add�� ��x19, x19, 1
������� mov���� x11, x19
������� cmp���� w10, 10
������� beq���� .L71
������� ldrb��� w12, [x19], 1
������� cmp���� w12, 10
������� beq���� .L71
������� ldrb��� w13, [x11, 1]
������� add���� x19, x11, 2
������� cmp���� w13, 10
������� beq��� �.L71
������� ldrb��� w14, [x11, 2]
������� add���� x19, x11, 3
������� cmp���� w14, 10
������� beq���� .L71
������� ldrb��� w15, [x11, 3]
������� add���� x19, x11, 4
������� cmp���� w15, 10
������� beq���� .L71
������� ldrb��� w16, [x11, 4]
������� add���� x19, x11, 5
������� cmp���� w16, 10
������� beq���� .L71
������� ldrb��� w17, [x11, 5]
������� add���� x19, x11, 6
������� cmp���� w17, 10
������� beq���� .L71
������� ldrb��� w18, [x11, 6]
������� add���� x19, x11, 7
������� cmp���� w18, 10
������� beq���� .L71
������� b������ .L3
.L2:��������������������� --> label ret1
������� mov���� w0, 1
������� ldp���� x19, x20, [sp, 16]
������� ldr���� x21, [sp, 32]
������� ldp���� x29, x30, [sp], 48
������� ret
�
I am sorry for showing you assembly output directly� It looks like the rtl level�s unroll pass of gcc unrolls above loop and I need to check it next week. Once I have clear idea about above unrolling from gcc, let me reduce the example more and let you know.
�
Thanks
JinGu Kang
�
From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev
Sent: 16 July 2021 18:27
To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com>
Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
�
�
On 7/16/21 9:29 AM, Jingu Kang wrote:
Hi Philip,
Thanks for your kind reply.
> A) Are you measuring on tip of tree?� There were changes for multiple exit unrolling which landed very recently.
Yep, I am investigating benchmarks with llvm tip and I can see the llvm fails to unroll some loops with multiple exits.
> B) One of your exits does not dominate your latch.� Those are generally hard
> C) This example does not seem to require gotos.� I strongly suggest reducing your test cases if you want more informed commentary.�
�
I am looking at perlbench recently and it has `goto` statements inside loop. The example is a reduced case.
Right, but the gotos aren't relevant for your reduced test.� You can reduce further.
When I look at the gcc�s output of the example, it looks like gcc unrolls only the below `if` statement block�
�
��� if (*s++ != '\n')
����� continue;
Your phrasing here does not parse for me.� Can you restate this with different wording and maybe a fully worked example?
�
Thanks
JinGu Kang
�
From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev
Sent: 16 July 2021 15:52
To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com>; llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
�
A) Are you measuring on tip of tree?� There were changes for multiple exit unrolling which landed very recently.
B) One of your exits does not dominate your latch.� Those are generally hard.�
C) This example does not seem to require gotos.� I strongly suggest reducing your test cases if you want more informed commentary.�
Philip
On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote:
Hi All,
�
While I am investigating benchmarks, I have found loops which llvm fails to unroll because the loops have multiple exits.
For example,
�
char *foo(void);
int boo(char *s);
�
int test(char *s, char *end, char *check1, char *check2) {
� while (s <= end) {
��� if (*s++ != '\n')
����� continue;
��� if (check1 || check2) {
����� s = foo();
����� if (!s)
������� goto ret1;
��� }��
����if (boo(s))
����� goto ret0;
� }
� goto ret1;
�
ret0:
� return 0;
ret1:
� return 1;
}
�
Above code causes below messages from LoopUnroll pass.
�
Bailout for multi-exit handling when latch exit has >1 predecessor.
Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled!
�
I can see the option `unroll-runtime-multi-exit` and comments about it. I wonder there are already reviews for the work on phabriactor or some people are working on it.
If someone knows information about it, please share it.
�
Thanks
JinGu Kang
�
�
�
�
�
�
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210722/3ff70d1b/attachment-0001.html>
More information about the llvm-dev
mailing list