<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Incorrect shuffles produced with XOP"
   href="https://llvm.org/bugs/show_bug.cgi?id=31296">31296</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Incorrect shuffles produced with XOP
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>release blocker
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>sroland@vmware.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>With XOP enabled, LLVM generates incorrect code for some shuffles.

This function

define void @fetch_r32_float_float(<4 x float>* %out, i8* %in) {
entry:
  %0 = getelementptr i8, i8* %in, i32 0
  %1 = bitcast i8* %0 to i32*
  %2 = load i32, i32* %1
  %3 = zext i32 %2 to i128
  %4 = bitcast i128 %3 to <4 x float>
  %5 = shufflevector <4 x float> %4,
       <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>,
       <4 x i32> <i32 0, i32 4, i32 4, i32 5>
  store <4 x float> %5, <4 x float>* %out
  ret void
}

when compiled with -mattr=+xop, produces:

fetch_r32_float_float:                  # @fetch_r32_float_float
        .cfi_startproc
# BB#0:                                 # %entry
        vpermilps       $65, .LCPI0_0(%rip), %xmm0 # xmm0 = mem[1,0,0,1]
        vmovaps %xmm0, (%rdi)
        retq

This is clearly wrong: the load has been optimized away, and the element of the
shuffled vector that should hold the loaded value is instead filled with a
constant from the second vector.
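
The mismatch can be checked lane by lane. Below is a minimal Python sketch, not
LLVM code: shufflevector and vpermilps_imm are hypothetical helper names, LOADED
is a placeholder for the 32-bit value read from %in, and the contents of
.LCPI0_0 are assumed to be the second shuffle operand (the listing does not
show them).

```python
# Lane-level model of the IR shuffle and of the emitted vpermilps.
LOADED = "in[0]"   # placeholder for the i32 loaded from %in
UNDEF = "undef"    # placeholder for an undef lane


def shufflevector(a, b, mask):
    """Model LLVM shufflevector: index i selects from the concatenation a + b."""
    both = a + b
    return [both[i] for i in mask]


def vpermilps_imm(src, imm):
    """Model 128-bit vpermilps with an immediate: lane i = src[(imm >> 2*i) & 3]."""
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]


# %4: the i32 is zero-extended to i128 and bitcast, so on little-endian x86
# lane 0 holds the loaded bits and lanes 1-3 are zero.
v4 = [LOADED, 0.0, 0.0, 0.0]
second = [0.0, 1.0, UNDEF, UNDEF]

# What the IR asks for: mask 0, 4, 4, 5.
expected = shufflevector(v4, second, [0, 4, 4, 5])

# What the generated code computes: vpermilps $65 on the constant pool entry.
# ASSUMPTION: .LCPI0_0 holds the second operand; its contents are not shown.
emitted = vpermilps_imm(second, 65)

print("expected:", expected)
print("emitted: ", emitted)
```

Under that assumption the correct result keeps the loaded value in lane 0,
while the emitted instruction reads only the constant pool, so no lane of its
result can depend on the load.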

(This is essentially a test case from Mesa's lp_test_format. We recently
enabled the XOP feature, or more accurately all features the CPU supports,
which uncovered this bug.)
As far as I can tell, this does not happen with AVX2 or other instruction sets;
it is specific to XOP.</pre>
        </div>
      </p>
    </body>
</html>