<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - A zexted setcc generates a setcc + movzbl instead of xor + setcc"

   href="https://llvm.org/bugs/show_bug.cgi?id=28146">28146</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>A zexted setcc generates a setcc + movzbl instead of xor + setcc

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>mkuper@google.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Consider:

#include &lt;stdio.h&gt;

int main() {

  unsigned x = 0;

  unsigned y = 0;

#pragma nounroll

  for (unsigned i = 0; i < 1000000000; ++i) {

    y += x ^ 13;

    x += ((i + 100) >= 1000) * 3;

  }

  return y;

}

We generate:

    .text

    .globl    main

    .p2align    4, 0x90

    .type    main,@function

main:

    .cfi_startproc

    xorl    %eax, %eax

    movl    $100, %ecx

    xorl    %edi, %edi

    .p2align    4, 0x90

.LBB0_1:

    movl    %edi, %esi

    xorl    $13, %esi

    addl    %esi, %eax

    cmpl    $999, %ecx

    seta    %dl                     # <===

    movzbl    %dl, %edx               # <===

    leal    (%rdx,%rdx,2), %edx

    addl    %edx, %edi

    incl    %ecx

    cmpl    $1000000100, %ecx

    jne    .LBB0_1

    retq

.Lfunc_end0:

    .size    main, .Lfunc_end0-main

    .cfi_endproc

Instead of:

  .text

  .globl  main

  .p2align  4, 0x90

  .type main,@function

main:

  .cfi_startproc

  xorl  %eax, %eax

  movl  $100, %ecx

  xorl  %edi, %edi

  .p2align  4, 0x90

.LBB0_1:

  movl  %edi, %esi

  xorl  $13, %esi

  addl  %esi, %eax

  xorl  %edx, %edx              # <===

  cmpl  $999, %ecx

  seta  %dl                     # <===

  leal  (%rdx,%rdx,2), %edx

  addl  %edx, %edi

  incl  %ecx

  cmpl  $1000000100, %ecx

  jne .LBB0_1

  retq

.Lfunc_end0:

  .size main, .Lfunc_end0-main

  .cfi_endproc

The xor encodes smaller than the movzbl (2 bytes versus 3), which in itself is a

good reason to generate the former. However, there is also a more surprising

performance issue: even though both versions ought to avoid partial register

stalls, the xor version runs in about 16% less time in the measurements below.
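
The pattern can probably be isolated in a smaller function. The following is
only a sketch: the name zext_setcc is hypothetical, and it has not been verified
here that it yields the same setcc + movzbl sequence as the loop above.

```c
/* Hypothetical reduced case: the comparison yields a 0/1 value that is
   zero-extended to 32 bits before the multiply, which is where a setcc
   followed by a zero-extension would be expected. */
unsigned zext_setcc(unsigned i) {
  return ((i + 100) >= 1000) * 3;
}
```

Compiling this with clang -O2 -S should show whether the movzbl is still
emitted outside the loop context.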

On a Haswell machine:

$ bin/clang -O2 ~/llvm/temp/setcc.s -o ~/llvm/temp/setcc.exe && time ~/llvm/temp/setcc.exe

real    0m1.045s

user    0m1.043s

sys    0m0.001s

$ bin/clang -O2 ~/llvm/temp/setcc-faster.s -o ~/llvm/temp/setcc.exe && time ~/llvm/temp/setcc.exe

real    0m0.876s

user    0m0.874s

sys    0m0.002s

Could someone at Intel confirm that this is expected? IACA doesn't show

significant stalling for the slower version, but it exists in practice: the

slower version shows about 15% stalls, and this can be increased significantly

by making the dependency chain longer.</pre>

        </div>

      </p>


    </body>

</html>