[PATCH] D28593: Update loop branch_weight metadata after loop rotation.

Xin Tong via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 13 10:52:09 PST 2017


I am seeing them very different now. Did you try it with and without
the patch ? I am seeing we peel 2X without metadata adjustment and
once with  metadata adjustment .

Thanks,
-Xin

On Fri, Jan 13, 2017 at 10:47 AM, Dehao Chen <danielcdh at gmail.com> wrote:
> Thanks!
>
> I tried the new test case, the assembly is now different, but still seem
> equivalent:
>
> #diff loop-rotate.s ../new/loop-rotate.s
> 47c47
> < .loc 1 0 3 is_stmt 0 discriminator 1 # loop-rotate.c:0:3
> ---
>> .loc 1 16 13                 # loop-rotate.c:16:13
> 51d50
> < .loc 1 16 13 is_stmt 1       # loop-rotate.c:16:13
> 103a103
>> xorl %ebp, %ebp
> 106d105
> < xorl %ebp, %ebp
> 193c192
> < .asciz "/ssd/llvm/build.gccopt/1" # string offset=69
> ---
>> .asciz "/ssd/llvm/build.gccopt/2" # string offset=69
>
> Dehao
>
>
> On Fri, Jan 13, 2017 at 10:38 AM, Xin Tong <trent.xin.tong at gmail.com> wrote:
>>
>> Dehao. I am putting up a new example, the command line to reproduce is
>> the same. The example before suffers from block merging in
>> BranchFolding in downstream in llc. Because there is no
>> simplifications done to peeled block. With this example, we will see
>> that we overpeeled the loop due to not updating the metadata properly
>> after rotation.
>>
>> Thanks
>> -Xin
>>
>> // looprotate.c
>> const int looparr[4] = {1,2,3,4};
>> int g = 0;
>>
>> __attribute__((noinline)) void call_me(int loopvalue) {
>>   g = loopvalue;
>> }
>>
>>
>> // Invoke call_me once when done_call is false.
>> // This is a case where the loop executes once everytime the function
>> // is called.
>>
>> __attribute__((noinline)) void always_call(int max) {
>>   for (int i = 0; i < max; ++i)
>>     call_me(looparr[i]);
>> }
>>
>>
>> // Invoke call_me once when done_call is false and index is odd.
>> // This is a case where the loop does not execute even once half of the
>> // time because of the index.
>> __attribute__((noinline)) void call_on_odd(int max, int index) {
>>   for (int i = 0; i < max && index % 2; ++i)
>>     call_me(looparr[max]);
>> }
>>
>>
>> int main() {
>>   const int max = 1;
>>   for (int i = 0; i < 128; ++i) {
>>     always_call(max);
>>     call_on_odd(max, i);
>>   }
>>   return 0;
>> }
>>
>> On Fri, Jan 13, 2017 at 9:36 AM, Xin Tong <trent.xin.tong at gmail.com>
>> wrote:
>> > On Fri, Jan 13, 2017 at 9:34 AM, Dehao Chen <danielcdh at gmail.com> wrote:
>> >> OK, but the with the new patch, the generated assembly file is
>> >> identical
>> >> with the assembly generated by a clean compiler (without the patch).
>> >>
>> >> Dehao
>> >
>> > Yes, there is something in branchfolding in llc that folds away some
>> > of the branches. I am looking at why.
>> >
>> > -Xin
>> >>
>> >> On Fri, Jan 13, 2017 at 9:21 AM, Xin Tong <trent.xin.tong at gmail.com>
>> >> wrote:
>> >>>
>> >>> On Fri, Jan 13, 2017 at 7:36 AM, Dehao Chen via Phabricator
>> >>> <reviews at reviews.llvm.org> wrote:
>> >>> > danielcdh added a comment.
>> >>> >
>> >>> > Looks like this patch will make the "always call" worse:
>> >>> >
>> >>> > Without this patch:
>> >>> >
>> >>> >   pushq   %rbx
>> >>> >   movq    %rdi, %rbx
>> >>> >   cmpl    $0, (%rbx)
>> >>> >   jne     .LBB1_3
>> >>> >
>> >>> > .LBB1_1:                                # =>This Inner Loop Header:
>> >>> > Depth=1
>> >>> >
>> >>> >   movq    %rbx, %rdi
>> >>> >   callq   call_me
>> >>> >   cmpl    $0, (%rbx)
>> >>> >   je      .LBB1_1
>> >>> >
>> >>> > .LBB1_3:
>> >>> >
>> >>> >   popq    %rbx
>> >>> >   retq
>> >>> >
>> >>> > With this patch:
>> >>> >
>> >>> >   pushq   %rbx
>> >>> >   movq    %rdi, %rbx
>> >>> >   cmpl    $0, (%rbx)
>> >>> >   je      .LBB1_1
>> >>> >
>> >>> > .LBB1_3:
>> >>> >
>> >>> >   popq    %rbx
>> >>> >   retq
>> >>> >
>> >>> > .LBB1_1:                                # =>This Inner Loop Header:
>> >>> > Depth=1
>> >>> >
>> >>> >   movq    %rbx, %rdi
>> >>> >   callq   call_me
>> >>> >   cmpl    $0, (%rbx)
>> >>> >   jne     .LBB1_3
>> >>> >   jmp     .LBB1_1
>> >>> >
>> >>> > As the trip count of this loop is always 1, the first code will have
>> >>> > no
>> >>> > taken branches, while with this patch, it will have 2 taken
>> >>> > branches.
>> >>> >
>> >>> >
>> >>> > https://reviews.llvm.org/D28593
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>> Sorry there is a mistaken updating the branch weight for the guard
>> >>> block. I fixed it. what i said before still hold true.
>> >>> -Xin
>> >>
>> >>
>
>


More information about the llvm-commits mailing list