<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [x86] wrong codegen during vselect/pshufb optimisation involving zero literals"
href="https://bugs.llvm.org/show_bug.cgi?id=52122">52122</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[x86] wrong codegen during vselect/pshufb optimisation involving zero literals
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>benjsith@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, pengfei.wang@intel.com, spatel+llvm@rotateright.com
</td>
</tr></table>
<p>
<div>
<pre>LLVM incorrectly handles an optimisation around changing a vselect/pshufb combo
that has constant zeros in one of the inputs.
Here is a minimal repro that triggers the issue:
__m256i do_stuff(__m256i I1, __m256i I2) {
__m256i X = _mm256_set1_epi64x(0);
__m256i A = _mm256_unpacklo_epi8(I1, X);
__m256i B = _mm256_unpacklo_epi8(A, I2);
__m256i C = _mm256_unpackhi_epi8(B, A);
return C;
}
compiled with -mavx2 and -O1 or -O2.

Here is the repro on Godbolt, showing the difference in output:
<a href="https://godbolt.org/z/r5acr545K">https://godbolt.org/z/r5acr545K</a>

Output bytes that should have been 00 instead contain one of the input bytes.
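
For reference, here is a small harness (not part of the original repro; the
input byte values are arbitrary) that makes the bad bytes easy to spot,
assuming the do_stuff above. With correct codegen, the positions that come
from the zeroed input X print as 00; with the miscompile, some of them print
an input byte instead. Which positions are affected depends on the build.

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t a[32], b[32], out[32];
    for (int i = 0; i < 32; ++i) {
        a[i] = (uint8_t)(0x10 + i);    // recognisable bytes for input 1
        b[i] = (uint8_t)(0x60 + i);    // recognisable bytes for input 2
    }
    __m256i I1 = _mm256_loadu_si256((const __m256i *)a);
    __m256i I2 = _mm256_loadu_si256((const __m256i *)b);
    __m256i C  = do_stuff(I1, I2);
    _mm256_storeu_si256((__m256i *)out, C);
    for (int i = 0; i < 32; ++i)
        printf("%02x%c", out[i], i == 31 ? '\n' : ' ');
    return 0;
}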

The problematic parts of the assembly are:

.LCPI0_1:
        .byte   2                       # 0x2
        .byte   4                       # 0x4
        .byte   128                     # 0x80
        .zero   1
        .zero   1
...
and then:
vpshufb ymm0, ymm0, ymmword ptr [rip + .LCPI0_1] # ymm0 =
ymm0[2,4],zero,ymm0[u,u,5],zero,ymm0[u,3,6],zero,ymm0[u,u,7],zero,ymm0[u,18,20],zero,ymm0[u,u,21],zero,ymm0[u,19,22],zero,ymm0[u,u,23],zero,ymm0[u]

The mask generated for that pshufb should contain 0x80 bytes to put a constant
zero in the destination instead of one of the input bytes. Instead, the mask
contains null bytes, which select byte 0 of the source operand rather than
producing zero.
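
For reference, this is the vpshufb control-byte rule the report relies on (my
own scalar model of one 16-byte lane, not LLVM code): if bit 7 of the control
byte is set, the result byte is forced to zero; otherwise the low four bits
index into the lane. So 0x80 yields zero while 0x00 selects byte 0.

#include <stdint.h>

// Scalar model of a single 16-byte lane of vpshufb (illustration only).
static void pshufb_lane(const uint8_t src[16], const uint8_t ctrl[16],
                        uint8_t dst[16]) {
    for (int i = 0; i < 16; ++i) {
        if (ctrl[i] & 0x80)
            dst[i] = 0;                    // high bit set -> constant zero
        else
            dst[i] = src[ctrl[i] & 0x0f];  // 0x00 selects byte 0, not zero
    }
}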

The following fold (in X86ISelLowering.cpp) seems to be the cause:

// fold vselect(cond, pshufb(x), pshufb(y)) -> or (pshufb(x), pshufb(y))

It calls getTargetShuffleMask(), and indirectly DecodePSHUFBMask(), to pull out
the mask of one of the operands. That decode uses SM_SentinelZero (-2) to mark
indices that should produce a constant zero. However, the mask is then passed
to getConstVector(), which interprets any negative number as undef, so those
indices are marked undef instead of being encoded as a constant zero (0x80).
The undef indices end up as null bytes in the final mask, which causes the
observed behaviour.

I'm not sure what the correct patch would be. I found that converting
SM_SentinelZero (-2) to 0x80 at this stage, before calling getConstVector(),
fixed the issue, but there is probably a better way.
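
As a rough standalone sketch of that conversion (my own illustration, not the
actual X86ISelLowering.cpp code; the SM_SentinelZero value of -2 is as
described above, and maskToPSHUFBBytes is a hypothetical helper name):

#include <cstdint>
#include <vector>

static const int SM_SentinelZero = -2;  // sentinel value described above

// Turn a decoded shuffle mask into PSHUFB control bytes, encoding zeroing
// lanes as 0x80 instead of leaving them as negative sentinels.
std::vector<uint8_t> maskToPSHUFBBytes(const std::vector<int> &Mask) {
    std::vector<uint8_t> Bytes;
    Bytes.reserve(Mask.size());
    for (int M : Mask) {
        if (M == SM_SentinelZero)
            Bytes.push_back(0x80);        // constant zero in the destination
        else if (M < 0)
            Bytes.push_back(0x80);        // undef lane: any value is legal, zero is safe
        else
            Bytes.push_back((uint8_t)M);  // in-range byte index
    }
    return Bytes;
}
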
I verified the bug still repros on the latest trunk,
a1f0f847ff7d3944c992158226026024ccc67207.

PS: I don't know if this changes the priority, but this bug was found via a
fuzzer I wrote that is meant to test intrinsics compilation; it was not in
code I wrote manually.</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>