[PATCH] D34769: [X86] X86::CMOV to Branch heuristic based optimization

Tue Jul 11 10:19:55 PDT 2017

aaboud added a comment.

In https://reviews.llvm.org/D34769#797714, @spatel wrote:

> I haven't looked at the patch yet, but for reference I filed:
>  https://bugs.llvm.org//show_bug.cgi?id=33013 (although the comments veered off to a different topic)
>  ...and mentioned:
>  https://reviews.llvm.org/rL292154
>
> If the example(s) in the bug report are already here, then great. If not, you might want to consider those cases.

The optimization in this patch is triggered on inner loops only, on the assumption that hot spots are most likely to be in those inner loops.
However, if I modify the example from the bug report so that the function is called from a loop, like this:

  static int foo(float x) {
    if (x < 42.0f)
      return x;
    return 12;
  }

  int bar(float *a, float *b, int n) {
    int sum = 0;
  #pragma clang loop vectorize(disable)
    for (int i = 0; i < n; ++i) {
      float c = a[i] + b[i];
      sum += foo(c);
    }
    return sum;
  }

Then the patch will indeed convert the CMOV into a branch:

  LBB0_4:                                 # %for.body
                                          # =>This Inner Loop Header: Depth=1
          movss   (%esi), %xmm1           # xmm1 = mem[0],zero,zero,zero
          addss   (%edx), %xmm1
          ucomiss %xmm1, %xmm0
          ja      LBB0_5
  # BB#6:                                 # %for.body
                                          #   in Loop: Header=BB0_4 Depth=1
          movl    $12, %eax
          jmp     LBB0_7
          .p2align        4, 0x90
  LBB0_5:                                 #   in Loop: Header=BB0_4 Depth=1
          cvttss2si       %xmm1, %eax
  LBB0_7:                                 # %for.body
                                          #   in Loop: Header=BB0_4 Depth=1
          addl    %edi, %eax
          addl    $4, %esi
          addl    $4, %edx
          decl    %ecx
          movl    %eax, %edi
          jne     LBB0_4
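The "inner loops only" gating can be modelled with a small sketch, assuming a simple loop tree. The `struct loop` type and the `is_innermost` / `count_candidates` helpers are hypothetical and exist only for illustration. They are not the patch's code or LLVM's `MachineLoopInfo` API. The sketch shows only which loops such a filter would consider. For example, the single `%for.body` loop in `bar` qualifies because no other loop is nested inside it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop-tree node for illustration only; this is not
   LLVM's MachineLoop/MachineLoopInfo data structure. */
struct loop {
    const char *name;
    struct loop **subloops;   /* loops nested directly inside this one */
    size_t num_subloops;
};

/* A loop is innermost when no other loop is nested inside it. */
static int is_innermost(const struct loop *l) {
    return l->num_subloops == 0;
}

/* Count the loops in the tree rooted at l that an "inner loops only"
   policy would consider for the CMOV-to-branch rewrite; enclosing
   (non-innermost) loops are skipped, only their leaves are counted. */
static size_t count_candidates(const struct loop *l) {
    assert(l != NULL);
    if (is_innermost(l))
        return 1;
    size_t n = 0;
    for (size_t i = 0; i < l->num_subloops; ++i)
        n += count_candidates(l->subloops[i]);
    return n;
}
```

For a two-level nest, only the two inner loops would be counted and the outer loop itself is skipped. A CMOV that sits outside any loop, as in the original bug-report example, is never considered.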

https://reviews.llvm.org/D34769