<div dir="ltr"><div>Yes - the SimplifyDemandedBits enhancement:<br><a href="https://reviews.llvm.org/rL294863">https://reviews.llvm.org/rL294863</a><br><br>...exposes a hole in the x86 lowering. We didn't expect that an FP vector might replace the condition operand of a VSELECT. That's why we get the "Cannot select" error.<br><br></div><div>We should be able to handle that case with a bitcast in the x86 code...<br><br>But there is also a bug in the SimplifyDemandedBits logic. Unless we don't care about signed zero, we need to check that the operand is an integer, because "SETLT" can also be used with FP operands. For an FP operand, -0.0 has its sign bit set but is not less than zero, so testing the sign bit is not equivalent to the compare.<br></div><div><br><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Feb 12, 2017 at 2:54 PM, Simon Pilgrim <span dir="ltr"><<a href="mailto:llvm-dev@redking.me.uk" target="_blank">llvm-dev@redking.me.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>Handing off to @spatel as it appears to be due to rL294863, not
rL294856.<br>
</p><div><div class="h5">
<div class="m_4866689615663397300moz-cite-prefix">On 12/02/2017 19:35, Simon Pilgrim
wrote:<br>
</div>
<blockquote type="cite">
<p>Thanks for the test case, looking at this now. Simon.<br>
</p>
On 12/02/2017 19:10, Andrew Adams wrote:<br>
<blockquote type="cite">
<div dir="ltr">Hi Simon,
<div><br>
</div>
<div>
<div>A commit in between 294848 and 294862 has created a
problem when no-nans-fp-math is on. I would open a bug,
but buganizer is down. This commit looks at least related
and in the right range. Here's a repro:</div>
<div><br>
</div>
<div>test.ll:</div>
<div><br>
</div>
<div>; Computes b = select(a < 0, -1, 1) * b</div>
<div>define void @fn(<8 x float>* %a_ptr, <8 x
float>* %b_ptr) {</div>
<div>  %a = load <8 x float>, <8 x float>* %a_ptr</div>
<div>  %b = load <8 x float>, <8 x float>* %b_ptr</div>
<div>  %cmp = fcmp olt <8 x float> %a, zeroinitializer</div>
<div>  %sel = select <8 x i1> %cmp, <8 x float> <float -1.000000e+00, float -1.000000e+00, float -1.000000e+00, float -1.000000e+00, float -1.000000e+00, float -1.000000e+00, float -1.000000e+00, float -1.000000e+00>, <8 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00></div>
<div>  %c = fmul <8 x float> %sel, %b</div>
<div>  store <8 x float> %c, <8 x float>* %b_ptr</div>
<div>  ret void</div>
<div>}</div>
<div><br>
</div>
<div>$ llc test.ll -mcpu=haswell -enable-no-nans-fp-math -O3</div>
<div><span class="m_4866689615663397300gmail-Apple-tab-span" style="white-space:pre-wrap"> </span>.section<span class="m_4866689615663397300gmail-Apple-tab-span" style="white-space:pre-wrap"> </span>__TEXT,__text,regular,pure_<wbr>instructions</div>
<div><span class="m_4866689615663397300gmail-Apple-tab-span" style="white-space:pre-wrap"> </span>.macosx_version_min
10, 12</div>
<div>LLVM ERROR: Cannot select: t43: v8f32 = vselect t7,
t35, t32</div>
<div>  t7: v8f32,ch = load<LD32[%a_ptr]> t0, t2, undef:i64</div>
<div>    t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0</div>
<div>      t1: i64 = Register %vreg0</div>
<div>    t6: i64 = undef</div>
<div>  t35: v8f32 = X86ISD::VBROADCAST t34</div>
<div>    t34: f32,ch = load<LD4[ConstantPool]> t0, t37, undef:i64</div>
<div>      t37: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<float -1.000000e+00> 0</div>
<div>        t36: i64 = TargetConstantPool<float -1.000000e+00> 0</div>
<div>      t6: i64 = undef</div>
<div>  t32: v8f32 = X86ISD::VBROADCAST t31</div>
<div>    t31: f32,ch = load<LD4[ConstantPool]> t0, t39, undef:i64</div>
<div>      t39: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<float 1.000000e+00> 0</div>
<div>        t38: i64 = TargetConstantPool<float 1.000000e+00> 0</div>
<div>      t6: i64 = undef</div>
<div>In function: fn</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sat, Feb 11, 2017 at 11:27 AM,
Simon Pilgrim via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author:
rksimon<br>
Date: Sat Feb 11 11:27:21 2017<br>
New Revision: 294856<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=294856&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=294856&view=rev</a><br>
Log:<br>
[X86][SSE] Convert getTargetShuffleMaskIndices to use
getTargetConstantBitsFromNode.<br>
<br>
Removes duplicate constant extraction code in
getTargetShuffleMaskIndices.<br>
<br>
getTargetConstantBitsFromNode - adds support for
VZEXT_MOVL(SCALAR_TO_VECTOR) and fail if the caller
doesn't support undef bits.<br>
<br>
Modified:<br>
  llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>
<br>
Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=294856&r1=294855&r2=294856&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=294856&r1=294855&r2=294856&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)<br>
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Sat Feb 11 11:27:21 2017<br>
@@ -5151,7 +5151,8 @@ static const Constant
*getTargetConstant<br>
 // Extract raw constant bits from constant pools.<br>
 static bool getTargetConstantBitsFromNode(SDValue Op, unsigned EltSizeInBits,<br>
                                           SmallBitVector &UndefElts,<br>
-                                          SmallVectorImpl<APInt> &EltBits) {<br>
+                                          SmallVectorImpl<APInt> &EltBits,<br>
+                                          bool AllowUndefs = true) {<br>
  assert(UndefElts.empty() && "Expected an empty
UndefElts vector");<br>
  assert(EltBits.empty() && "Expected an empty
EltBits vector");<br>
<br>
@@ -5171,6 +5172,10 @@ static bool
getTargetConstantBitsFromNod<br>
<br>
  // Split the undef/constant single bitset data into the
target elements.<br>
  auto SplitBitData = [&]() {<br>
+    // Don't split if we don't allow undef bits.<br>
+    if (UndefBits.getBoolValue() && !AllowUndefs)<br>
+      return false;<br>
+<br>
   UndefElts = SmallBitVector(NumElts, false);<br>
   EltBits.resize(NumElts, APInt(EltSizeInBits, 0));<br>
<br>
@@ -5264,89 +5269,34 @@ static bool
getTargetConstantBitsFromNod<br>
   }<br>
  }<br>
<br>
+  // Extract a rematerialized scalar constant insertion.<br>
+  if (Op.getOpcode() == X86ISD::VZEXT_MOVL &&<br>
+      Op.getOperand(0).getOpcode() == ISD::SCALAR_TO_VECTOR &&<br>
+      isa<ConstantSDNode>(Op.getOperand(0).getOperand(0))) {<br>
+    auto *CN = cast<ConstantSDNode>(Op.getOperand(0).getOperand(0));<br>
+    MaskBits = CN->getAPIntValue().zextOrTrunc(SrcEltSizeInBits);<br>
+    MaskBits = MaskBits.zext(SizeInBits);<br>
+    return SplitBitData();<br>
+  }<br>
+<br>
  return false;<br>
 }<br>
<br>
-// TODO: Merge more of this with getTargetConstantBitsFromNode.<br>
 static bool getTargetShuffleMaskIndices(SDValue MaskNode,<br>
                                         unsigned MaskEltSizeInBits,<br>
                                         SmallVectorImpl<uint64_t> &RawMask) {<br>
-  MaskNode = peekThroughBitcasts(MaskNode);<br>
-<br>
-  MVT VT = MaskNode.getSimpleValueType();<br>
-  assert(VT.isVector() && "Can't produce a non-vector with a build_vector!");<br>
-  unsigned NumMaskElts = VT.getSizeInBits() / MaskEltSizeInBits;<br>
-<br>
-  // Split an APInt element into MaskEltSizeInBits sized pieces and<br>
-  // insert into the shuffle mask.<br>
-  auto SplitElementToMask = [&](APInt Element) {<br>
-    // Note that this is x86 and so always little endian: the low byte is<br>
-    // the first byte of the mask.<br>
-    int Split = VT.getScalarSizeInBits() / MaskEltSizeInBits;<br>
-    for (int i = 0; i < Split; ++i) {<br>
-      APInt RawElt = Element.getLoBits(MaskEltSizeInBits);<br>
-      Element = Element.lshr(MaskEltSizeInBits);<br>
-      RawMask.push_back(RawElt.getZExtValue());<br>
-    }<br>
-  };<br>
-<br>
-  if (MaskNode.getOpcode() == X86ISD::VBROADCAST) {<br>
-    // TODO: Handle (MaskEltSizeInBits % VT.getScalarSizeInBits()) == 0<br>
-    // TODO: Handle (VT.getScalarSizeInBits() % MaskEltSizeInBits) == 0<br>
-    if (VT.getScalarSizeInBits() != MaskEltSizeInBits)<br>
-      return false;<br>
-    if (auto *CN = dyn_cast<ConstantSDNode>(MaskNode.getOperand(0))) {<br>
-      const APInt &MaskElement = CN->getAPIntValue();<br>
-      for (unsigned i = 0, e = VT.getVectorNumElements(); i != e; ++i) {<br>
-        APInt RawElt = MaskElement.getLoBits(MaskEltSizeInBits);<br>
-        RawMask.push_back(RawElt.getZExtValue());<br>
-      }<br>
-    }<br>
-    return false;<br>
-  }<br>
+  SmallBitVector UndefElts;<br>
+  SmallVector<APInt, 64> EltBits;<br>
<br>
-  if (MaskNode.getOpcode() == X86ISD::VZEXT_MOVL &&<br>
-      MaskNode.getOperand(0).getOpcode() == ISD::SCALAR_TO_VECTOR) {<br>
-    SDValue MaskOp = MaskNode.getOperand(0).getOperand(0);<br>
-    if (auto *CN = dyn_cast<ConstantSDNode>(MaskOp)) {<br>
-      if ((MaskEltSizeInBits % VT.getScalarSizeInBits()) == 0) {<br>
-        RawMask.push_back(CN->getZExtValue());<br>
-        RawMask.append(NumMaskElts - 1, 0);<br>
-        return true;<br>
-      }<br>
-<br>
-      if ((VT.getScalarSizeInBits() % MaskEltSizeInBits) == 0) {<br>
-        unsigned ElementSplit = VT.getScalarSizeInBits() / MaskEltSizeInBits;<br>
-        SplitElementToMask(CN->getAPIntValue());<br>
-        RawMask.append((VT.getVectorNumElements() - 1) * ElementSplit, 0);<br>
-        return true;<br>
-      }<br>
-    }<br>
-    return false;<br>
-  }<br>
-<br>
-  if (MaskNode.getOpcode() != ISD::BUILD_VECTOR)<br>
+  // Extract the raw target constant bits.<br>
+  // FIXME: We currently don't support UNDEF bits or mask entries.<br>
+  if (!getTargetConstantBitsFromNode(MaskNode, MaskEltSizeInBits, UndefElts,<br>
+                                     EltBits, /* AllowUndefs */ false))<br>
   return false;<br>
<br>
-  // We can always decode if the buildvector is all zero constants,<br>
-  // but can't use isBuildVectorAllZeros as it might contain UNDEFs.<br>
-  if (all_of(MaskNode->ops(), X86::isZeroNode)) {<br>
-    RawMask.append(NumMaskElts, 0);<br>
-    return true;<br>
-  }<br>
-<br>
-  // TODO: Handle (MaskEltSizeInBits % VT.getScalarSizeInBits()) == 0<br>
-  if ((VT.getScalarSizeInBits() % MaskEltSizeInBits) != 0)<br>
-    return false;<br>
-<br>
-  for (SDValue Op : MaskNode->ops()) {<br>
-    if (auto *CN = dyn_cast<ConstantSDNode>(Op.getNode()))<br>
-      SplitElementToMask(CN->getAPIntValue());<br>
-    else if (auto *CFN = dyn_cast<ConstantFPSDNode>(Op.getNode()))<br>
-      SplitElementToMask(CFN->getValueAPF().bitcastToAPInt());<br>
-    else<br>
-      return false;<br>
-  }<br>
+  // Insert the extracted elements into the mask.<br>
+  for (APInt Elt : EltBits)<br>
+    RawMask.push_back(Elt.getZExtValue());<br>
<br>
  return true;<br>
 }<br>
<br>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div>
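<div>To make the signed-zero concern from the top of this thread concrete, here is a minimal scalar sketch in plain C++. It is not LLVM code: the helper names are hypothetical, and it assumes the problematic fold amounts to keying the select on the sign bit of the FP input instead of on the result of the "less than zero" compare. It also assumes 32-bit IEEE-754 floats and ignores NaNs, as in the no-nans repro.</div>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the IEEE-754 sign bit of x is set.
static bool signBitSet(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  return (bits >> 31) != 0;
}

// select(a < 0, -1, 1): the per-lane meaning of the fcmp olt + select in the repro.
static float selectByCompare(float a) { return a < 0.0f ? -1.0f : 1.0f; }

// The same select keyed only on the sign bit of a (the assumed shape of the fold).
static float selectBySignBit(float a) { return signBitSet(a) ? -1.0f : 1.0f; }

// Ignoring NaNs, the two versions disagree only for -0.0.
static void checkSignedZeroDivergence() {
  assert(selectByCompare(-2.0f) == selectBySignBit(-2.0f));
  assert(selectByCompare(0.0f) == selectBySignBit(0.0f));
  assert(selectByCompare(-0.0f) == 1.0f);   // -0.0 is not less than zero
  assert(selectBySignBit(-0.0f) == -1.0f);  // but its sign bit is set
}
```

<div>For an integer operand the sign bit and "less than zero" always agree, which is why restricting the fold to integer operands (or to cases where signed zeros can be ignored) would avoid the mismatch.</div>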