[PATCH] D17181: [X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN.
Ahmed Bougacha via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 11 18:08:12 PST 2016
ab created this revision.
ab added subscribers: RKSimon, spatel, craig.topper, qcolombet, llvm-commits, mzolotukhin.
Currently, we sometimes miscompile this vector pattern:
(c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
(~c & v) | (c & -v)
When we have SSSE3, we incorrectly lower that to PSIGN, which does:
(c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
(c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.
Note that the PSIGN tests are also incorrect.
Consider test/CodeGen/X86/vec-sign.ll:
%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond
if %b is zero:
%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %a, zeroinitializer
ret <4 x i32> %a
whereas we currently generate:
psignd %xmm1, %xmm0
retq
which returns 0, as %xmm1 is 0.
Instead of directly using c as a mask, avoid the zero case by setting
any bit (other than the sign bit). This lets PSIGN default to the
positive case, while not changing the "negative" (all ones) case.
With that, the generated sequence correctly implements:
(c ? -v : v)
Fixes PR26110.
http://reviews.llvm.org/D17181
Files:
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/avx2-logic.ll
test/CodeGen/X86/vec-sign.ll
Index: test/CodeGen/X86/vec-sign.ll
===================================================================
--- test/CodeGen/X86/vec-sign.ll
+++ test/CodeGen/X86/vec-sign.ll
@@ -4,6 +4,7 @@
define <4 x i32> @signd(<4 x i32> %a, <4 x i32> %b) nounwind {
; CHECK-LABEL: signd:
; CHECK: # BB#0: # %entry
+; CHECK-NEXT: por {{.*}}(%rip), %xmm1
; CHECK-NEXT: psignd %xmm1, %xmm0
; CHECK-NEXT: retq
entry:
Index: test/CodeGen/X86/avx2-logic.ll
===================================================================
--- test/CodeGen/X86/avx2-logic.ll
+++ test/CodeGen/X86/avx2-logic.ll
@@ -72,6 +72,8 @@
define <8 x i32> @signd(<8 x i32> %a, <8 x i32> %b) nounwind {
; CHECK-LABEL: signd:
; CHECK: ## BB#0: ## %entry
+; CHECK-NEXT: vpbroadcastd {{.*}}(%rip), %ymm2
+; CHECK-NEXT: vpor %ymm2, %ymm1, %ymm1
; CHECK-NEXT: vpsignd %ymm1, %ymm0, %ymm0
; CHECK-NEXT: retq
entry:
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp
+++ lib/Target/X86/X86ISelLowering.cpp
@@ -26332,16 +26332,35 @@
SDLoc DL(N);
// Now we know we at least have a plendvb with the mask val. See if
- // we can form a psignb/w/d.
- // psign = x.type == y.type == mask.type && y = sub(0, x);
+ // we can form a psignb/w/d:
+ // (src < 0 ? -dst : src > 0 ? dst : 0)
+ //
+ // Here, src is Mask, and that's either:
+ // - all ones: negative, so result in -dst (sub 0, x)
+ // - all zeroes: PSIGN would result in 0, which is unwanted. Instead,
+ // OR the Mask with 1 to make it..:
+ // - non-zero, positive: PSIGN results in dst (x).
+ //
+ // In other words, we can combine:
+ // or (and (m, x), (pandn m, (sub 0, x)))
+ // into:
+ // psign x, (or m, 1)
if (Y.getOpcode() == ISD::SUB && Y.getOperand(1) == X &&
ISD::isBuildVectorAllZeros(Y.getOperand(0).getNode()) &&
X.getValueType() == MaskVT && Y.getValueType() == MaskVT) {
assert((EltBits == 8 || EltBits == 16 || EltBits == 32) &&
"Unsupported VT for PSIGN");
- Mask = DAG.getNode(X86ISD::PSIGN, DL, MaskVT, X, Mask.getOperand(0));
+
+ SDValue One = DAG.getConstant(1, DL, MaskVT.getVectorElementType());
+ SDValue Ones = DAG.getNode(
+ ISD::BUILD_VECTOR, DL, MaskVT,
+ SmallVector<SDValue, 8>(MaskVT.getVectorNumElements(), One));
+
+ Mask = DAG.getNode(ISD::OR, DL, MaskVT, Mask.getOperand(0), Ones);
+ Mask = DAG.getNode(X86ISD::PSIGN, DL, MaskVT, X, Mask);
return DAG.getBitcast(VT, Mask);
}
+
// PBLENDVB only available on SSE 4.1
if (!Subtarget.hasSSE41())
return SDValue();
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D17181.47760.patch
Type: text/x-patch
Size: 2785 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160212/1f71ddc2/attachment.bin>
More information about the llvm-commits
mailing list