<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - _mm256_xor_si256 prefers vxorps over vpxor on integer vectors when out-of-context"

   href="https://bugs.llvm.org/show_bug.cgi?id=36127">36127</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>_mm256_xor_si256 prefers vxorps over vpxor on integer vectors when out-of-context

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>clang

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>C++

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedclangbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>gonzalobg88@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>dgregor@apple.com, llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>This code (see it live: <a href="https://godbolt.org/g/jJHBMi">https://godbolt.org/g/jJHBMi</a>):

#include <immintrin.h>

__attribute__((__always_inline__, __nodebug__, __target__("avx2")))

__m256i foo(__m256i a, __m256i b) {

    auto c = _mm256_add_epi64(a, b);

    return _mm256_xor_si256(a, c);

}

__attribute__((__always_inline__, __nodebug__, __target__("avx2")))

__m256i bar(__m256i a, __m256i b) {

    return _mm256_xor_si256(a, b);

}

generates this assembly: 

foo(long long __vector(4), long long __vector(4)):    # @foo(long long __vector(4), long long __vector(4))

        push    rbp

        mov     rbp, rsp

        and     rsp, -32

        sub     rsp, 32

        vmovdqa ymm0, ymmword ptr [rbp + 16]

        vpaddq  ymm1, ymm0, ymmword ptr [rbp + 48]

        vpxor   ymm0, ymm1, ymm0

        mov     rsp, rbp

        pop     rbp

        ret

bar(long long __vector(4), long long __vector(4)):    # @bar(long long __vector(4), long long __vector(4))

        push    rbp

        mov     rbp, rsp

        and     rsp, -32

        sub     rsp, 32

        vmovaps ymm0, ymmword ptr [rbp + 48]

        vxorps  ymm0, ymm0, ymmword ptr [rbp + 16]

        mov     rsp, rbp

        pop     rbp

        ret

It looks to me like LLVM/clang chooses `vxorps` by default, and emits `vpxor`

only when the vectors are already being operated on in the integer domain,

where switching domains would increase latency (foo).

However, even though `vxorps` has a smaller encoding, both foo and bar operate

on integer vectors. `vxorps` and `vpxor` have the same latency, but `vpxor` has

N times the throughput of `vxorps` (where N is in the range [3, 5]), so LLVM

should emit `vpxor` for bar as well, unless we are optimizing for code size,

which the example above does not do.

Of course, when already operating in the floating point domain, LLVM should

continue to emit `vxorps` to avoid increasing latency by switching domains.</pre>
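        <pre>For readers less familiar with execution domains, here is a minimal sketch (not
part of the original report) of why either instruction is a valid choice for
bar: a bitwise XOR gives the same 256 bits whether the register is viewed as
64-bit integer lanes or as 32-bit lanes, so the choice only affects performance
and encoding, not the result. The sketch is portable C++ and uses no intrinsics.
The names Bits256, xor_as_int64_lanes, xor_as_32bit_lanes, domains_agree and
demo are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one 256-bit ymm register: 32 raw bytes.
using Bits256 = std::array<std::uint8_t, 32>;

// XOR with the bits viewed as four 64-bit integer lanes (the __m256i view).
Bits256 xor_as_int64_lanes(const Bits256& a, const Bits256& b) {
    std::array<std::uint64_t, 4> x{}, y{};
    std::memcpy(x.data(), a.data(), a.size());
    std::memcpy(y.data(), b.data(), b.size());
    for (std::size_t i = 0; i < x.size(); ++i) x[i] ^= y[i];
    Bits256 out{};
    std::memcpy(out.data(), x.data(), out.size());
    return out;
}

// XOR with the same bits viewed as eight 32-bit lanes (the lane width of the
// packed-single view; only raw bit patterns are touched, no float arithmetic).
Bits256 xor_as_32bit_lanes(const Bits256& a, const Bits256& b) {
    std::array<std::uint32_t, 8> x{}, y{};
    std::memcpy(x.data(), a.data(), a.size());
    std::memcpy(y.data(), b.data(), b.size());
    for (std::size_t i = 0; i < x.size(); ++i) x[i] ^= y[i];
    Bits256 out{};
    std::memcpy(out.data(), x.data(), out.size());
    return out;
}

// Returns true when both lane views produce the same 256 result bits.
bool domains_agree(const Bits256& a, const Bits256& b) {
    return xor_as_int64_lanes(a, b) == xor_as_32bit_lanes(a, b);
}

// Small self-check on arbitrary byte patterns.
void demo() {
    Bits256 a{}, b{};
    for (int i = 0; i < 32; ++i) {
        a[i] = static_cast<std::uint8_t>(i * 7 + 1);
        b[i] = static_cast<std::uint8_t>(255 - i * 3);
    }
    assert(domains_agree(a, b));
}
```

Calling demo() should pass its assertion on any platform. The sketch says nothing
about latency or throughput; those depend on the microarchitecture and on which
domain the surrounding instructions execute in.</pre>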

        </div>

      </p>


    </body>

</html>