[PATCH] D28593: Update loop branch_weight metadata after loop rotation.

Xin Tong via llvm-commits <llvm-commits at lists.llvm.org>
Fri Jan 13 10:38:20 PST 2017


Dehao, I am putting up a new example; the command line to reproduce it is
the same. The earlier example suffered from block merging by BranchFolding
downstream in llc, because no simplification had been done on the peeled
block. With this example we can see that we over-peel the loop because the
branch_weight metadata is not updated properly after rotation.
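
To make this concrete, here is a minimal C++ sketch of the kind of update I
mean. It is not the actual D28593 change; the helper name
updateRotatedLatchWeights and the exact weight split are illustrative
assumptions, and it only handles the simple two-successor case.

// Illustrative sketch only; not the code in the patch.
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"

using namespace llvm;

// OrigHeaderBR is the conditional branch that used to terminate the loop
// header (body edge first, exit edge second); NewLatchBR is the copy of it
// that now terminates the rotated latch.
static void updateRotatedLatchWeights(BranchInst *OrigHeaderBR,
                                      BranchInst *NewLatchBR) {
  uint64_t BodyWeight = 0, ExitWeight = 0;
  // Read the existing !prof branch_weights from the old header branch.
  if (!OrigHeaderBR->extractProfMetadata(BodyWeight, ExitWeight))
    return; // No profile data to propagate.

  // Before rotation the header test ran once per iteration plus once per
  // loop entry, so BodyWeight ~ total iterations and ExitWeight ~ entries.
  // After rotation the latch's backedge is only taken for iterations after
  // the first one in each entry, so (assuming the body runs at least once
  // per entry) subtract one trip per entry from the backedge weight.
  uint64_t BackedgeWeight =
      BodyWeight > ExitWeight ? BodyWeight - ExitWeight : 1;

  MDBuilder MDB(NewLatchBR->getContext());
  NewLatchBR->setMetadata(LLVMContext::MD_prof,
                          MDB.createBranchWeights(
                              static_cast<uint32_t>(BackedgeWeight),
                              static_cast<uint32_t>(ExitWeight)));
}

Without an update along these lines, the latch keeps the header's original
weights, the loop looks much hotter than it is, and the peeling heuristic
over-peels it, which is what the example below demonstrates.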

Thanks
-Xin

// looprotate.c
const int looparr[4] = {1,2,3,4};
int g = 0;

__attribute__((noinline)) void call_me(int loopvalue) {
  g = loopvalue;
}


// always_call: the loop executes exactly once every time the function
// is called (main passes max == 1), so the backedge is never taken.

__attribute__((noinline)) void always_call(int max) {
  for (int i = 0; i < max; ++i)
    call_me(looparr[i]);
}


// call_on_odd: the loop does not execute even once half of the time,
// because the loop condition also requires index to be odd.
__attribute__((noinline)) void call_on_odd(int max, int index) {
  for (int i = 0; i < max && index % 2; ++i)
    call_me(looparr[max]);
}


int main() {
  const int max = 1;
  for (int i = 0; i < 128; ++i) {
    always_call(max);
    call_on_odd(max, i);
  }
  return 0;
}

On Fri, Jan 13, 2017 at 9:36 AM, Xin Tong <trent.xin.tong at gmail.com> wrote:
> On Fri, Jan 13, 2017 at 9:34 AM, Dehao Chen <danielcdh at gmail.com> wrote:
>> OK, but with the new patch, the generated assembly file is identical to
>> the assembly generated by a clean compiler (without the patch).
>>
>> Dehao
>
> Yes, there is something in BranchFolding in llc that folds away some
> of the branches. I am looking into why.
>
> -Xin
>>
>> On Fri, Jan 13, 2017 at 9:21 AM, Xin Tong <trent.xin.tong at gmail.com> wrote:
>>>
>>> On Fri, Jan 13, 2017 at 7:36 AM, Dehao Chen via Phabricator
>>> <reviews at reviews.llvm.org> wrote:
>>> > danielcdh added a comment.
>>> >
>>> > Looks like this patch will make the "always call" worse:
>>> >
>>> > Without this patch:
>>> >
>>> >   pushq   %rbx
>>> >   movq    %rdi, %rbx
>>> >   cmpl    $0, (%rbx)
>>> >   jne     .LBB1_3
>>> >
>>> > .LBB1_1:                                # =>This Inner Loop Header: Depth=1
>>> >
>>> >   movq    %rbx, %rdi
>>> >   callq   call_me
>>> >   cmpl    $0, (%rbx)
>>> >   je      .LBB1_1
>>> >
>>> > .LBB1_3:
>>> >
>>> >   popq    %rbx
>>> >   retq
>>> >
>>> > With this patch:
>>> >
>>> >   pushq   %rbx
>>> >   movq    %rdi, %rbx
>>> >   cmpl    $0, (%rbx)
>>> >   je      .LBB1_1
>>> >
>>> > .LBB1_3:
>>> >
>>> >   popq    %rbx
>>> >   retq
>>> >
>>> > .LBB1_1:                                # =>This Inner Loop Header: Depth=1
>>> >
>>> >   movq    %rbx, %rdi
>>> >   callq   call_me
>>> >   cmpl    $0, (%rbx)
>>> >   jne     .LBB1_3
>>> >   jmp     .LBB1_1
>>> >
>>> > As the trip count of this loop is always 1, the first sequence (without
>>> > the patch) has no taken branches, while with this patch it has 2 taken
>>> > branches.
>>> >
>>> >
>>> > https://reviews.llvm.org/D28593
>>> >
>>> >
>>> >
>>>
>>> Sorry, there was a mistake in updating the branch weight for the guard
>>> block. I have fixed it. What I said before still holds true.
>>> -Xin
>>
>>
