<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - wrong codegen due to _mm_mpsadbw_epu8 intrinsic incorrectly marked as commutative"
href="https://bugs.llvm.org/show_bug.cgi?id=51908">51908</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>wrong codegen due to _mm_mpsadbw_epu8 intrinsic incorrectly marked as commutative
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>benjsith@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, pengfei.wang@intel.com, spatel+llvm@rotateright.com
</td>
</tr></table>
<p>
<div>
<pre>I came across a case where using the Intel SSE4.1 intrinsic _mm_mpsadbw_epu8
appears to lead to a mis-compilation when optimization (O1) is turned on.
I tried to come up with a minimal repro, as follows:
__m128i do_stuff(const __m128i* iVals) {
    const __m128i I0 = _mm_load_si128(&iVals[0]);
    const __m128i I1 = _mm_load_si128(&iVals[1]);
    const __m128i I2 = _mm_load_si128(&iVals[2]);
    const __m128i A = _mm_mpsadbw_epu8(I0, I2, 0);
    const __m128i B = _mm_add_epi8(I2, I1);
    const __m128i C = _mm_add_epi8(B, A);
    return C;
}
This function will run fine when compiled with -O0, but when using -O1 it gives
incorrect results. The -O1 assembly output is as follows:
do_stuff(long long __vector(2) const*):
        vmovdqa  xmm0, xmmword ptr [rdi + 32]
        vmpsadbw xmm1, xmm0, xmmword ptr [rdi], 0
        vpaddb   xmm0, xmm0, xmmword ptr [rdi + 16]
        vpaddb   xmm0, xmm0, xmm1
        ret
This is mostly correct; however, the vmpsadbw instruction has had its operand
order flipped. It is equivalent to having called
_mm_mpsadbw_epu8(I2, I0, 0)
instead. I believe this is because in the LLVM code, this intrinsic is marked
as commutative. In llvm/include/llvm/IR/IntrinsicsX86.td, lines 791-796:
// Vector sum of absolute differences
let TargetPrefix = "x86" in {  // All intrinsics start with "llvm.x86.".
  def int_x86_sse41_mpsadbw : GCCBuiltin<"__builtin_ia32_mpsadbw128">,
      Intrinsic<[llvm_v8i16_ty], [llvm_v16i8_ty, llvm_v16i8_ty, llvm_i8_ty],
                [IntrNoMem, Commutative, ImmArg<ArgIndex<2>>]>;
}
However, this opcode is not commutative. The byte-wise differences are
calculated using different indices for the first and second arguments, so
swapping the operand order produces different results.
I first noticed this on Clang 12.0 for Windows, however I tested it on Godbolt
using the trunk Clang compiler and it still repros there. Here is a link to the
Godbolt code: <a href="https://godbolt.org/z/zs576oh39">https://godbolt.org/z/zs576oh39</a>
Cheers, and let me know if you need anything else (or if I've gotten something
wrong: this is my first bug filed)</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>