[llvm-dev] Question about Unrolling Loop with Multiple Exits

Jingu Kang via llvm-dev llvm-dev at lists.llvm.org
Thu Aug 19 07:13:35 PDT 2021


Hi Philip,

I can see upstream llvm unrolls the loop in the test case with the command line options.

I really appreciate your patch!

Thanks,
JinGu Kang

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev
Sent: 03 August 2021 19:30
To: Jingu Kang <Jingu.Kang at arm.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits


JFYI, https://reviews.llvm.org/D107381 works in this direction, and the reduced test case does unroll with that patch and the following command line.

$ opt -loop-unroll� -enable-new-pm=0 < llvm-dev.ll -S -unroll-runtime -debug -unroll-runtime-multi-exit -unroll-runtime-epilog
On 7/20/21 10:17 AM, Philip Reames wrote:
./opt -loop-unroll� -enable-new-pm=0 < llvm-dev.ll -S -unroll-runtime -debug
Args: ./opt -loop-unroll -enable-new-pm=0 -S -unroll-runtime -debug
Loop Unroll: F[test] Loop %while.body.us
� Loop Size = 10
� runtime unrolling with count: 8
� Exiting block %if.end8.us: TripCount=0, TripMultiple=1, BreakoutTrip=1
� Exiting block %while.cond.backedge: TripCount=0, TripMultiple=1, BreakoutTrip=1
Trying runtime unrolling on Loop:
Loop at depth 1 containing: %while.body.us<header>,%if.end8.us<exiting>,%while.cond.backedge<latch><exiting>
Using prolog remainder.
Bailout for multi-exit handling when latch exit has >1 predecessor.
Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled!
Won't unroll; remainder loop could not be generated when assuming runtime trip count
; ModuleID = '<stdin>'
source_filename = "<stdin>"

declare i32 @foo(i8*)

define void @test(i8* %s, i64 %a) {
entry:
� %s.addr.a = getelementptr i8, i8* %s, i64 %a
� br label %while.body.us

while.body.us:����������������������������������� ; preds = %while.cond.backedge, %entry
� %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
� %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
� %incdec.val = load i8, i8* %s.addr, align 1
� %cmp1 = icmp eq i8 %incdec.val, 10
� br i1 %cmp1, label %if.end8.us, label %while.cond.backedge

if.end8.us:�������������������������������������� ; preds = %while.body.us
� %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr)
� %cmp2 = icmp ult i32 %call9, 0
� br i1 %cmp2, label %while.cond.backedge, label %return.loopexit

while.cond.backedge:����������������������������� ; preds = %if.end8.us, %while.body.us
� %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a
� br i1 %cmp3, label %return.loopexit, label %while.body.us

return.loopexit:��������������������������������� ; preds = %while.cond.backedge, %if.end8.us
� ret void
}

The debug output appears to give a pretty good clue as to where you could start if desired.

Philip
On 7/20/21 9:45 AM, Jingu Kang wrote:
Hi Philip,
�
I have reduced the test case roughly as below.
�
declare i32 @foo(i8 *)
�
define void @test(i8* %s, i64 %a) {
entry:
� %s.addr.a = getelementptr i8, i8* %s, i64 %a
� br label %while.body.us
�
while.body.us:���
��%s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
� %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
� %incdec.val = load i8, i8* %s.addr, align 1
� %cmp1 = icmp eq i8 %incdec.val, 10
� br i1 %cmp1, label %if.end8.us, label %while.cond.backedge
�
if.end8.us:���
��%call9 = tail call i32 @foo(i8* nonnull %incdec.ptr)
� %cmp2 = icmp ult i32 %call9, 0
� br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
�
while.cond.backedge:
� %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a
� br i1 %cmp3, label %return.loopexit, label %while.body.us
�
return.loopexit:���
��ret void
}
�
Roughly, I unrolled the loop manually 2 times as below and ignored the remaining loop simply.
�
define void @test(i8* %s, i64 %a) {
entry:
� %s.addr.a = getelementptr i8, i8* %s, i64 %a
� br label %while.body.us
�
while.body.us:���
��%s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
� %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
� %incdec.val = load i8, i8* %s.addr, align 1
� %cmp1 = icmp eq i8 %incdec.val, 10
� br i1 %cmp1, label %if.end8.us, label %while.body.us.1
�
while.body.us.1:���
��%incdec.ptr.1 = getelementptr inbounds i8, i8* %incdec.ptr, i64 1
� %incdec.val.1 = load i8, i8* %incdec.ptr, align 1
� %cmp1.1 = icmp eq i8 %incdec.val.1, 10
� br i1 %cmp1.1, label %if.end8.us, label %while.cond.backedge
�
if.end8.us:���
��%incdec.ptr.phi = phi i8* [ %incdec.ptr, %while.body.us ], [ %incdec.ptr.1, %while.body.us.1 ]
� %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr.phi)
� %cmp2 = icmp ult i32 %call9, 0
� br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
�
while.cond.backedge:
� %cmp3 = icmp eq i8* %incdec.ptr.1, %s.addr.a
� br i1 %cmp3, label %return.loopexit, label %while.body.us
�
return.loopexit:���
��ret void
}
�
If possible, can we make loop unroll pass handle this kind of cases please?
If I missed something, please let me know.
�
Thanks
JinGu Kang
�
From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev
Sent: 16 July 2021 20:12
To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com>
Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
�

Jingu,

I'm not fluent in AArch64 assembly, sorry.� If you want more meaningful commentary, you'll need to reduce your c++ example further, and give a better explanation of what you expect the unroller to do for you.�

Philip
On 7/16/21 12:09 PM, Jingu Kang wrote:
Sorry for poor example�
�
The AArch64 assembly output of the example from gcc is as below. The loop is unrolled 7 times. I have written some comments to explain how the assembly code is mapped to C source code. As you can see on `.L3` label, the �if (*s++ != '\n')� block is unrolled 7 times.
�
������� stp���� x29, x30, [sp, -48]!
������� mov���� x29, sp
������� stp���� x19, x20, [sp, 16]
������� mov���� x19, x0� --> x19 is char *s
������� mov���� x20, x1� --> x20 is char *end
������� str���� x21, [sp, 32]
������� orr���� x21, x3, x2� --> x21 is check1 | check2
.L70:
������� sub���� x0, x20, x19� --> x0 = end - s;
������� add���� x1, x0, 1
������� ands��� x2, x1, 7� --> unroll count is 7
������� beq���� .L3���� --> .L3 is inside while loop. if (*s++ != '\n')
������� cmp���� x19, x20 �--> while(s <= end)
������� bhi���� .L2���� --> .L2 is label ret1.
�
������� ldrb��� w3, [x19], 1� --> start of remainder
������� cmp���� w3, 10
������� beq���� .L71
������� cmp���� x2, 1
������� beq���� .L3
������� cmp���� x2, 2
������� beq���� .L49
������� cmp���� x2, 3
������� beq���� .L50
������� cmp���� x2, 4
������� beq���� .L51
������� cmp���� x2, 5
������� beq���� .L52
������� cmp���� x2, 6
������� beq���� .L53
������� ldrb��� w4, [x19], 1
������� cmp���� w4, 10
������� bne���� .L53
.L71:
������� cbz���� x21, .L4� --> if(check1 || check2)
������� bl����� foo()
������� mov���� x19, x0
������� cbz���� x0, .L2�� --> if (!s) goto ret1;
�
.L4:��������������������� --> if(boo(s)) goto ret0;
������� mov���� x0, x19
����� ��bl����� boo(char*)
������� cbz���� w0, .L70
������� mov���� w0, 0
������� ldp���� x19, x20, [sp, 16]
������� ldr���� x21, [sp, 32]
������� ldp���� x29, x30, [sp], 48
������� ret
.L53:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w5, [x19], 1
������� cmp���� w5, 10
������� beq���� .L71
.L52:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w6, [x19], 1
������� cmp���� w6, 10
������� beq���� .L71
.L51:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w7, [x19], 1
������� cmp���� w7, 10
������� beq���� .L71
.L50:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w8, [x19], 1
������� cmp���� w8, 10
������� beq���� .L71
.L49:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w9, [x19], 1
������� cmp���� w9, 10
������� beq���� .L71
.L3:������������������������� --> if (*s++ != '\n'), 7 times unrolled
������� cmp���� x19, x20
������� bhi���� .L2
������� ldrb��� w10, [x19]
������� add�� ��x19, x19, 1
������� mov���� x11, x19
������� cmp���� w10, 10
������� beq���� .L71
������� ldrb��� w12, [x19], 1
������� cmp���� w12, 10
������� beq���� .L71
������� ldrb��� w13, [x11, 1]
������� add���� x19, x11, 2
������� cmp���� w13, 10
������� beq��� �.L71
������� ldrb��� w14, [x11, 2]
������� add���� x19, x11, 3
������� cmp���� w14, 10
������� beq���� .L71
������� ldrb��� w15, [x11, 3]
������� add���� x19, x11, 4
������� cmp���� w15, 10
������� beq���� .L71
������� ldrb��� w16, [x11, 4]
������� add���� x19, x11, 5
������� cmp���� w16, 10
������� beq���� .L71
������� ldrb��� w17, [x11, 5]
������� add���� x19, x11, 6
������� cmp���� w17, 10
������� beq���� .L71
������� ldrb��� w18, [x11, 6]
������� add���� x19, x11, 7
������� cmp���� w18, 10
������� beq���� .L71
������� b������ .L3
.L2:��������������������� --> label ret1
������� mov���� w0, 1
������� ldp���� x19, x20, [sp, 16]
������� ldr���� x21, [sp, 32]
������� ldp���� x29, x30, [sp], 48
������� ret
�
I am sorry for showing you assembly output directly� It looks like the rtl level�s unroll pass of gcc unrolls above loop and I need to check it next week. Once I have clear idea about above unrolling from gcc, let me reduce the example more and let you know.
�
Thanks
JinGu Kang
�
From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev
Sent: 16 July 2021 18:27
To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com>
Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
�

�
On 7/16/21 9:29 AM, Jingu Kang wrote:

Hi Philip,

Thanks for your kind reply.

> A) Are you measuring on tip of tree?� There were changes for multiple exit unrolling which landed very recently.

Yep, I am investigating benchmarks with llvm tip and I can see the llvm fails to unroll some loops with multiple exits.
> B) One of your exits does not dominate your latch.� Those are generally hard
> C) This example does not seem to require gotos.� I strongly suggest reducing your test cases if you want more informed commentary.�
�
I am looking at perlbench recently and it has `goto` statements inside loop. The example is a reduced case.
Right, but the gotos aren't relevant for your reduced test.� You can reduce further.



When I look at the gcc�s output of the example, it looks like gcc unrolls only the below `if` statement block�
�
��� if (*s++ != '\n')
����� continue;
Your phrasing here does not parse for me.� Can you restate this with different wording and maybe a fully worked example?



�
Thanks
JinGu Kang
�
From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev
Sent: 16 July 2021 15:52
To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com>; llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
�

A) Are you measuring on tip of tree?� There were changes for multiple exit unrolling which landed very recently.

B) One of your exits does not dominate your latch.� Those are generally hard.�

C) This example does not seem to require gotos.� I strongly suggest reducing your test cases if you want more informed commentary.�

Philip
On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote:
Hi All,
�
While I am investigating benchmarks, I have found loops which llvm fails to unroll because the loops have multiple exits.
For example,
�
char *foo(void);
int boo(char *s);
�
int test(char *s, char *end, char *check1, char *check2) {
� while (s <= end) {
��� if (*s++ != '\n')
����� continue;
��� if (check1 || check2) {
����� s = foo();
����� if (!s)
������� goto ret1;
��� }��
����if (boo(s))
����� goto ret0;
� }
� goto ret1;
�
ret0:
� return 0;
ret1:
� return 1;
}
�
Above code causes below messages from LoopUnroll pass.
�
Bailout for multi-exit handling when latch exit has >1 predecessor.
Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled!
�
I can see the option `unroll-runtime-multi-exit` and comments about it. I wonder there are already reviews for the work on phabriactor or some people are working on it.
If someone knows information about it, please share it.
�
Thanks
JinGu Kang
�
�
�
�
�
�






_______________________________________________

LLVM Developers mailing list

llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>

https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210819/514839b9/attachment-0001.html>


More information about the llvm-dev mailing list