[llvm] r229835 - [x86, sdag] Two interrelated changes to the x86 and sdag code.
Sergey Matveev
earthdok at google.com
Tue Mar 3 12:10:06 PST 2015
This is causing a clang crash in Chromium:
http://llvm.org/bugs/show_bug.cgi?id=22773
We really can't delay the Clang roll any longer and would much rather
revert this patch until it's fixed. However, I'm having trouble reverting
it cleanly: with the revert applied, check-llvm appears to hang forever.
Can someone more familiar with the code please take a look?
On Thu, Feb 19, 2015 at 1:36 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:
> Author: chandlerc
> Date: Thu Feb 19 04:36:19 2015
> New Revision: 229835
>
> URL: http://llvm.org/viewvc/llvm-project?rev=229835&view=rev
> Log:
> [x86,sdag] Two interrelated changes to the x86 and sdag code.
>
> First, don't combine bit masking into vector shuffles (even ones the
> target can handle) once operation legalization has taken place. Custom
> legalization of vector shuffles may exist for these patterns (making the
> predicate return true), but that custom legalization may in some cases
> produce the exact bit math this combine matches. We only really want to
> run this combine prior to operation legalization.
>
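To make the ordering problem concrete, here is a rough sketch of the combine
in question, simplified from DAGCombiner::XformToShuffleWithZero (the hunk
appears in the diff below) and including the new LegalOperations check. It is
illustrative only, not the exact upstream code:

  // Simplified sketch: fold (and X, <per-element all-ones / all-zeros>) into
  // a shuffle of X with a zero vector, but only before operation legalization.
  static SDValue xformAndToShuffleWithZero(SDNode *N, SelectionDAG &DAG,
                                           const TargetLowering &TLI,
                                           bool LegalOperations) {
    SDValue LHS = N->getOperand(0);
    SDValue RHS = N->getOperand(1);
    EVT VT = LHS.getValueType();
    auto *BV = dyn_cast<BuildVectorSDNode>(RHS);
    if (!VT.isVector() || !BV)
      return SDValue();

    unsigned NumElts = VT.getVectorNumElements();
    SmallVector<int, 16> Indices;
    for (unsigned i = 0; i != NumElts; ++i) {
      auto *C = dyn_cast<ConstantSDNode>(BV->getOperand(i));
      if (!C)
        return SDValue();
      if (C->isAllOnesValue())
        Indices.push_back(i);           // keep element i of X
      else if (C->isNullValue())
        Indices.push_back(i + NumElts); // take element i of the zero vector
      else
        return SDValue();
    }

    // The fix: after operation legalization the target may have custom
    // lowered a shuffle into exactly this bit math, so re-forming the
    // shuffle here would just fight the target's own lowering.
    if (LegalOperations || !TLI.isVectorClearMaskLegal(Indices, VT))
      return SDValue();

    SDValue Zero = DAG.getConstant(0, VT);
    return DAG.getVectorShuffle(VT, SDLoc(N), LHS, Zero, Indices);
  }
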
> However, the x86 backend, in a fit of awesome, relied on this. What it
> would do is mark VSELECTs as expand, which would turn them into
> arithmetic, which this would then match back into vector shuffles, which
> we would then lower properly. Amazing.
>
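For reference, the "expand" action turns a VSELECT whose condition is a
sign-extended lane mask into plain bit math along the following lines, which
is exactly what the combine above would then fold back into a shuffle. This
is an illustrative sketch, not the actual legalizer code, and it assumes the
condition has already been widened to the operands' type:

  // Illustrative expansion of vselect(Cond, LHS, RHS) where Cond is a vector
  // whose lanes are all-ones or all-zeros and has the same type as LHS/RHS.
  static SDValue expandVSelectToBitMath(SDValue Cond, SDValue LHS, SDValue RHS,
                                        SDLoc dl, SelectionDAG &DAG) {
    EVT VT = LHS.getValueType();
    unsigned EltBits = VT.getScalarType().getSizeInBits();
    SDValue AllOnes = DAG.getConstant(APInt::getAllOnesValue(EltBits), VT);
    SDValue NotCond = DAG.getNode(ISD::XOR, dl, VT, Cond, AllOnes);
    SDValue TruePart = DAG.getNode(ISD::AND, dl, VT, Cond, LHS);     // lanes where Cond is ~0
    SDValue FalsePart = DAG.getNode(ISD::AND, dl, VT, NotCond, RHS); // lanes where Cond is 0
    return DAG.getNode(ISD::OR, dl, VT, TruePart, FalsePart);
  }

With a constant condition this degenerates into the two andps plus an orps
sequences being removed from the tests below, which is the bit math the old
code relied on the combine to turn back into a shuffle.
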
> Instead, the second change is to teach the x86 backend to directly form
> vector shuffles from VSELECT nodes with constant conditions, and to mark
> VSELECT as custom-lowered for all of the vector types where we support
> lowering blends as shuffles. We still mark the forms which actually
> support variable blends as *legal* so that the custom lowering is
> bypassed, and the legal lowering can even be used by the vector shuffle
> legalization (yes, I know, this is confusing, but that's how the
> patterns are written).
>
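As a tiny standalone illustration (ordinary C++, not LLVM code) of the mask
the new lowering computes, using the same index rule as
lowerVSELECTtoVectorShuffle in the diff below, here is the 4-lane case with
condition <true, false, true, false>:

  #include <cstdio>
  #include <vector>

  int main() {
    const int Size = 4;
    const bool Cond[Size] = {true, false, true, false};
    std::vector<int> Mask;
    for (int i = 0; i < Size; ++i)
      // true lane -> element i of the first operand; false lane -> element i
      // of the second operand (shuffle indices >= Size index the second input).
      Mask.push_back(i + (Cond[i] ? 0 : Size));
    for (int M : Mask)
      std::printf("%d ", M); // prints: 0 5 2 7
    std::printf("\n");
    return 0;
  }

That <0,5,2,7> shuffle is precisely the blend pattern the existing vector
shuffle lowering already knows how to emit, so constant-condition selects and
constant blends now share one code path.
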
> This makes the VSELECT lowering much more sensible, and in fact should
> fix a bunch of bugs with it. However, as you'll see in the test cases,
> right now what it does is point out the *hilarious* deficiency of the
> new vector shuffle lowering when it comes to blends. Fortunately, my
> very next patch fixes that. I can't submit it yet, because that patch,
> somewhat obviously, forms the exact and/or pattern that the DAG combine
> is matching here! Without this patch, teaching the vector shuffle
> lowering to produce the right code infloops in the DAG combiner. With
> this patch alone, we produce terrible code but at least lower through
> the right paths. With both patches, all the regressions here should be
> fixed, and a bunch of the improvements (like using 2 shufps with no
> memory loads instead of 2 andps with memory loads and an orps) will
> stay. Win!
>
> There is one other change worth noting here. We had hilariously wrong
> vectorization cost estimates for vselect because we fell through to the
> code path that assumed all "expand" vector operations are scalarized.
> However, the "expand" lowering of VSELECT is vector bit math, most
> definitely not scalarized. So now we go back to the correct if horribly
> naive cost of "1" for "not scalarized". If anyone wants to add actual
> modeling of shuffle costs, that would be cool, but this seems an
> improvement on its own. Note the removal of 16 and 32 "costs" for doing
> a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
> course, we don't right now because of OMG bad code, but I'm going to fix
> that. Next patch. I promise.
>
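The cost-model argument can be sketched like this. The helper below is
hypothetical (the names and the scalarization formula are illustrative, not
the actual X86 TTI code), but it reproduces the new SSE2 numbers in the
vselect-cost.ll changes further down:

  // Hypothetical sketch: an "Expand" action on a vector select does not mean
  // scalarization; the expansion is a handful of full-width and/andn/or ops
  // (or, after this patch, a shuffle), so the cost is roughly one per
  // legal-width piece of the vector rather than something per element.
  static unsigned vselectCostSketch(unsigned NumVectorElts,
                                    unsigned EltsPerLegalVector,
                                    bool LoweredAsVectorBitMath,
                                    unsigned ScalarSelectCost) {
    // e.g. an <8 x i32> select on 128-bit SSE2 splits into 2 pieces -> cost 2.
    unsigned NumPieces =
        (NumVectorElts + EltsPerLegalVector - 1) / EltsPerLegalVector;
    if (LoweredAsVectorBitMath)
      return NumPieces;                       // "not scalarized": ~1 per piece
    return NumVectorElts * ScalarSelectCost;  // the old fall-through assumption
  }

Plugging in the 128-bit SSE2 types gives 1 and the 256-bit types give 2,
matching the updated CHECK lines below.
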
> Modified:
> llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll
> llvm/trunk/test/CodeGen/X86/vector-blend.ll
> llvm/trunk/test/CodeGen/X86/vselect.ll
>
> llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll
>
> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=229835&r1=229834&r2=229835&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Thu Feb 19
> 04:36:19 2015
> @@ -11973,9 +11973,11 @@ SDValue DAGCombiner::XformToShuffleWithZ
> return SDValue();
> }
>
> - // Let's see if the target supports this vector_shuffle.
> + // Let's see if the target supports this vector_shuffle and make sure
> + // we're not running after operation legalization where it may have
> + // custom lowered the vector shuffles.
> EVT RVT = RHS.getValueType();
> - if (!TLI.isVectorClearMaskLegal(Indices, RVT))
> + if (LegalOperations || !TLI.isVectorClearMaskLegal(Indices, RVT))
> return SDValue();
>
> // Return the new VECTOR_SHUFFLE node.
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=229835&r1=229834&r2=229835&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Feb 19 04:36:19 2015
> @@ -926,6 +926,7 @@ X86TargetLowering::X86TargetLowering(con
> setOperationAction(ISD::LOAD, MVT::v4f32, Legal);
> setOperationAction(ISD::BUILD_VECTOR, MVT::v4f32, Custom);
> setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v4f32, Custom);
> + setOperationAction(ISD::VSELECT, MVT::v4f32, Custom);
> setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v4f32, Custom);
> setOperationAction(ISD::SELECT, MVT::v4f32, Custom);
> setOperationAction(ISD::UINT_TO_FP, MVT::v4i32, Custom);
> @@ -994,6 +995,7 @@ X86TargetLowering::X86TargetLowering(con
> continue;
> setOperationAction(ISD::BUILD_VECTOR, VT, Custom);
> setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);
> + setOperationAction(ISD::VSELECT, VT, Custom);
> setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
> }
>
> @@ -1017,6 +1019,8 @@ X86TargetLowering::X86TargetLowering(con
> setOperationAction(ISD::BUILD_VECTOR, MVT::v2i64, Custom);
> setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f64, Custom);
> setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2i64, Custom);
> + setOperationAction(ISD::VSELECT, MVT::v2f64, Custom);
> + setOperationAction(ISD::VSELECT, MVT::v2i64, Custom);
> setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v2f64, Custom);
> setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2f64, Custom);
>
> @@ -1098,13 +1102,8 @@ X86TargetLowering::X86TargetLowering(con
> // FIXME: Do we need to handle scalar-to-vector here?
> setOperationAction(ISD::MUL, MVT::v4i32, Legal);
>
> - setOperationAction(ISD::VSELECT, MVT::v2f64, Custom);
> - setOperationAction(ISD::VSELECT, MVT::v2i64, Custom);
> - setOperationAction(ISD::VSELECT, MVT::v4i32, Custom);
> - setOperationAction(ISD::VSELECT, MVT::v4f32, Custom);
> - setOperationAction(ISD::VSELECT, MVT::v8i16, Custom);
> - // There is no BLENDI for byte vectors. We don't need to custom lower
> - // some vselects for now.
> + // We directly match byte blends in the backend as they match the VSELECT
> + // condition form.
> setOperationAction(ISD::VSELECT, MVT::v16i8, Legal);
>
> // SSE41 brings specific instructions for doing vector sign extend even in
> @@ -1245,11 +1244,6 @@ X86TargetLowering::X86TargetLowering(con
> setOperationAction(ISD::SELECT, MVT::v4i64, Custom);
> setOperationAction(ISD::SELECT, MVT::v8f32, Custom);
>
> - setOperationAction(ISD::VSELECT, MVT::v4f64, Custom);
> - setOperationAction(ISD::VSELECT, MVT::v4i64, Custom);
> - setOperationAction(ISD::VSELECT, MVT::v8i32, Custom);
> - setOperationAction(ISD::VSELECT, MVT::v8f32, Custom);
> -
> setOperationAction(ISD::SIGN_EXTEND, MVT::v4i64, Custom);
> setOperationAction(ISD::SIGN_EXTEND, MVT::v8i32, Custom);
> setOperationAction(ISD::SIGN_EXTEND, MVT::v16i16, Custom);
> @@ -1293,9 +1287,6 @@ X86TargetLowering::X86TargetLowering(con
> setOperationAction(ISD::MULHU, MVT::v16i16, Legal);
> setOperationAction(ISD::MULHS, MVT::v16i16, Legal);
>
> - setOperationAction(ISD::VSELECT, MVT::v16i16, Custom);
> - setOperationAction(ISD::VSELECT, MVT::v32i8, Legal);
> -
> // The custom lowering for UINT_TO_FP for v8i32 becomes interesting
> // when we have a 256bit-wide blend with immediate.
> setOperationAction(ISD::UINT_TO_FP, MVT::v8i32, Custom);
> @@ -1368,6 +1359,7 @@ X86TargetLowering::X86TargetLowering(con
>
> setOperationAction(ISD::BUILD_VECTOR, VT, Custom);
> setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);
> + setOperationAction(ISD::VSELECT, VT, Custom);
> setOperationAction(ISD::INSERT_VECTOR_ELT, VT, Custom);
> setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
> setOperationAction(ISD::SCALAR_TO_VECTOR, VT, Custom);
> @@ -1375,6 +1367,10 @@ X86TargetLowering::X86TargetLowering(con
> setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
> }
>
> + if (Subtarget->hasInt256())
> + setOperationAction(ISD::VSELECT, MVT::v32i8, Legal);
> +
> +
> // Promote v32i8, v16i16, v8i32 select, and, or, xor to v4i64.
> for (int i = MVT::v32i8; i != MVT::v4i64; ++i) {
> MVT VT = (MVT::SimpleValueType)i;
> @@ -13139,48 +13135,29 @@ static bool BUILD_VECTORtoBlendMask(Buil
> return true;
> }
>
> -/// \brief Try to lower a VSELECT instruction to an immediate-controlled blend
> -/// instruction.
> -static SDValue lowerVSELECTtoBLENDI(SDValue Op, const X86Subtarget *Subtarget,
> - SelectionDAG &DAG) {
> +/// \brief Try to lower a VSELECT instruction to a vector shuffle.
> +static SDValue lowerVSELECTtoVectorShuffle(SDValue Op,
> + const X86Subtarget *Subtarget,
> + SelectionDAG &DAG) {
> SDValue Cond = Op.getOperand(0);
> SDValue LHS = Op.getOperand(1);
> SDValue RHS = Op.getOperand(2);
> SDLoc dl(Op);
> MVT VT = Op.getSimpleValueType();
> - MVT EltVT = VT.getVectorElementType();
> - unsigned NumElems = VT.getVectorNumElements();
> -
> - // There is no blend with immediate in AVX-512.
> - if (VT.is512BitVector())
> - return SDValue();
> -
> - if (!Subtarget->hasSSE41() || EltVT == MVT::i8)
> - return SDValue();
> - if (!Subtarget->hasInt256() && VT == MVT::v16i16)
> - return SDValue();
>
> if (!ISD::isBuildVectorOfConstantSDNodes(Cond.getNode()))
> return SDValue();
> + auto *CondBV = cast<BuildVectorSDNode>(Cond);
>
> - // Check the mask for BLEND and build the value.
> - unsigned MaskValue = 0;
> - if (!BUILD_VECTORtoBlendMask(cast<BuildVectorSDNode>(Cond), MaskValue))
> - return SDValue();
> -
> - // Convert i32 vectors to floating point if it is not AVX2.
> - // AVX2 introduced VPBLENDD instruction for 128 and 256-bit vectors.
> - MVT BlendVT = VT;
> - if (EltVT == MVT::i64 || (EltVT == MVT::i32 && !Subtarget->hasInt256())) {
> - BlendVT = MVT::getVectorVT(MVT::getFloatingPointVT(EltVT.getSizeInBits()),
> - NumElems);
> - LHS = DAG.getNode(ISD::BITCAST, dl, VT, LHS);
> - RHS = DAG.getNode(ISD::BITCAST, dl, VT, RHS);
> + // Only non-legal VSELECTs reach this lowering, convert those into generic
> + // shuffles and re-use the shuffle lowering path for blends.
> + SmallVector<int, 32> Mask;
> + for (int i = 0, Size = VT.getVectorNumElements(); i < Size; ++i) {
> + SDValue CondElt = CondBV->getOperand(i);
> + Mask.push_back(
> + isa<ConstantSDNode>(CondElt) ? i + (isZero(CondElt) ? Size : 0) : -1);
> }
> -
> - SDValue Ret = DAG.getNode(X86ISD::BLENDI, dl, BlendVT, LHS, RHS,
> - DAG.getConstant(MaskValue, MVT::i32));
> - return DAG.getNode(ISD::BITCAST, dl, VT, Ret);
> + return DAG.getVectorShuffle(VT, dl, LHS, RHS, Mask);
> }
>
> SDValue X86TargetLowering::LowerVSELECT(SDValue Op, SelectionDAG &DAG) const {
> @@ -13191,10 +13168,16 @@ SDValue X86TargetLowering::LowerVSELECT(
> ISD::isBuildVectorOfConstantSDNodes(Op.getOperand(2).getNode()))
> return SDValue();
>
> - SDValue BlendOp = lowerVSELECTtoBLENDI(Op, Subtarget, DAG);
> + // Try to lower this to a blend-style vector shuffle. This can handle all
> + // constant condition cases.
> + SDValue BlendOp = lowerVSELECTtoVectorShuffle(Op, Subtarget, DAG);
> if (BlendOp.getNode())
> return BlendOp;
>
> + // Variable blends are only legal from SSE4.1 onward.
> + if (!Subtarget->hasSSE41())
> + return SDValue();
> +
> // Some types for vselect were previously set to Expand, not Legal or
> // Custom. Return an empty SDValue so we fall-through to Expand, after
> // the Custom lowering phase.
>
> Modified: llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll?rev=229835&r1=229834&r2=229835&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll (original)
> +++ llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll Thu Feb 19
> 04:36:19 2015
> @@ -11,7 +11,7 @@
>
> define <2 x i64> @test_2i64(<2 x i64> %a, <2 x i64> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function 'test_2i64':
> -; SSE2: Cost Model: {{.*}} 4 for instruction: %sel = select <2 x i1>
> +; SSE2: Cost Model: {{.*}} 1 for instruction: %sel = select <2 x i1>
> ; SSE41: Cost Model: {{.*}} 1 for instruction: %sel = select <2 x i1>
> ; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <2 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <2 x i1>
> @@ -21,7 +21,7 @@ define <2 x i64> @test_2i64(<2 x i64> %a
>
> define <2 x double> @test_2double(<2 x double> %a, <2 x double> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function
> 'test_2double':
> -; SSE2: Cost Model: {{.*}} 3 for instruction: %sel = select <2 x i1>
> +; SSE2: Cost Model: {{.*}} 1 for instruction: %sel = select <2 x i1>
> ; SSE41: Cost Model: {{.*}} 1 for instruction: %sel = select <2 x i1>
> ; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <2 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <2 x i1>
> @@ -31,7 +31,7 @@ define <2 x double> @test_2double(<2 x d
>
> define <4 x i32> @test_4i32(<4 x i32> %a, <4 x i32> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function 'test_4i32':
> -; SSE2: Cost Model: {{.*}} 8 for instruction: %sel = select <4 x i1>
> +; SSE2: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> ; SSE41: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> ; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> @@ -41,7 +41,7 @@ define <4 x i32> @test_4i32(<4 x i32> %a
>
> define <4 x float> @test_4float(<4 x float> %a, <4 x float> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function
> 'test_4float':
> -; SSE2: Cost Model: {{.*}} 7 for instruction: %sel = select <4 x i1>
> +; SSE2: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> ; SSE41: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> ; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> @@ -51,7 +51,7 @@ define <4 x float> @test_4float(<4 x flo
>
> define <16 x i8> @test_16i8(<16 x i8> %a, <16 x i8> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function 'test_16i8':
> -; SSE2: Cost Model: {{.*}} 32 for instruction: %sel = select <16 x i1>
> +; SSE2: Cost Model: {{.*}} 1 for instruction: %sel = select <16 x i1>
> ; SSE41: Cost Model: {{.*}} 1 for instruction: %sel = select <16 x i1>
> ; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <16 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <16 x i1>
> @@ -63,7 +63,7 @@ define <16 x i8> @test_16i8(<16 x i8> %a
> ; <8 x float>. Integers of the same size should also use those
> instructions.
> define <4 x i64> @test_4i64(<4 x i64> %a, <4 x i64> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function 'test_4i64':
> -; SSE2: Cost Model: {{.*}} 8 for instruction: %sel = select <4 x i1>
> +; SSE2: Cost Model: {{.*}} 2 for instruction: %sel = select <4 x i1>
> ; SSE41: Cost Model: {{.*}} 2 for instruction: %sel = select <4 x i1>
> ; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> @@ -73,7 +73,7 @@ define <4 x i64> @test_4i64(<4 x i64> %a
>
> define <4 x double> @test_4double(<4 x double> %a, <4 x double> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function
> 'test_4double':
> -; SSE2: Cost Model: {{.*}} 6 for instruction: %sel = select <4 x i1>
> +; SSE2: Cost Model: {{.*}} 2 for instruction: %sel = select <4 x i1>
> ; SSE41: Cost Model: {{.*}} 2 for instruction: %sel = select <4 x i1>
> ; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <4 x i1>
> @@ -83,7 +83,7 @@ define <4 x double> @test_4double(<4 x d
>
> define <8 x i32> @test_8i32(<8 x i32> %a, <8 x i32> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function 'test_8i32':
> -; SSE2: Cost Model: {{.*}} 16 for instruction: %sel = select <8 x i1>
> +; SSE2: Cost Model: {{.*}} 2 for instruction: %sel = select <8 x i1>
> ; SSE41: Cost Model: {{.*}} 2 for instruction: %sel = select <8 x i1>
> ; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <8 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <8 x i1>
> @@ -93,7 +93,7 @@ define <8 x i32> @test_8i32(<8 x i32> %a
>
> define <8 x float> @test_8float(<8 x float> %a, <8 x float> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function
> 'test_8float':
> -; SSE2: Cost Model: {{.*}} 14 for instruction: %sel = select <8 x i1>
> +; SSE2: Cost Model: {{.*}} 2 for instruction: %sel = select <8 x i1>
> ; SSE41: Cost Model: {{.*}} 2 for instruction: %sel = select <8 x i1>
> ; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <8 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <8 x i1>
> @@ -104,10 +104,9 @@ define <8 x float> @test_8float(<8 x flo
> ; AVX2
> define <16 x i16> @test_16i16(<16 x i16> %a, <16 x i16> %b) {
> ; CHECK:Printing analysis 'Cost Model Analysis' for function 'test_16i16':
> -; SSE2: Cost Model: {{.*}} 32 for instruction: %sel = select <16 x i1>
> +; SSE2: Cost Model: {{.*}} 2 for instruction: %sel = select <16 x i1>
> ; SSE41: Cost Model: {{.*}} 2 for instruction: %sel = select <16 x i1>
> -;;; FIXME: This AVX cost is obviously wrong. We shouldn't be scalarizing.
> -; AVX: Cost Model: {{.*}} 32 for instruction: %sel = select <16 x i1>
> +; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <16 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <16 x i1>
> %sel = select <16 x i1> <i1 true, i1 false, i1 false, i1 false, i1
> true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false,
> i1 true, i1 false, i1 false, i1 false>, <16 x i16> %a, <16 x i16> %b
> ret <16 x i16> %sel
> @@ -115,10 +114,9 @@ define <16 x i16> @test_16i16(<16 x i16>
>
> define <32 x i8> @test_32i8(<32 x i8> %a, <32 x i8> %b) {
> ; CHECK: Printing analysis 'Cost Model Analysis' for function 'test_32i8':
> -; SSE2: Cost Model: {{.*}} 64 for instruction: %sel = select <32 x i1>
> +; SSE2: Cost Model: {{.*}} 2 for instruction: %sel = select <32 x i1>
> ; SSE41: Cost Model: {{.*}} 2 for instruction: %sel = select <32 x i1>
> -;;; FIXME: This AVX cost is obviously wrong. We shouldn't be scalarizing.
> -; AVX: Cost Model: {{.*}} 64 for instruction: %sel = select <32 x i1>
> +; AVX: Cost Model: {{.*}} 1 for instruction: %sel = select <32 x i1>
> ; AVX2: Cost Model: {{.*}} 1 for instruction: %sel = select <32 x i1>
> %sel = select <32 x i1> <i1 true, i1 false, i1 true, i1 true, i1 true,
> i1 false, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true,
> i1 false, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true,
> i1 false, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true,
> i1 false, i1 true, i1 true>, <32 x i8> %a, <32 x i8> %b
> ret <32 x i8> %sel
>
> Modified: llvm/trunk/test/CodeGen/X86/vector-blend.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-blend.ll?rev=229835&r1=229834&r2=229835&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-blend.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-blend.ll Thu Feb 19 04:36:19 2015
> @@ -9,16 +9,14 @@
> define <4 x float> @vsel_float(<4 x float> %v1, <4 x float> %v2) {
> ; SSE2-LABEL: vsel_float:
> ; SSE2: # BB#0: # %entry
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSE2-NEXT: orps %xmm1, %xmm0
> +; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[1,3]
> +; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]
> ; SSE2-NEXT: retq
> ;
> ; SSSE3-LABEL: vsel_float:
> ; SSSE3: # BB#0: # %entry
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSSE3-NEXT: orps %xmm1, %xmm0
> +; SSSE3-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[1,3]
> +; SSSE3-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]
> ; SSSE3-NEXT: retq
> ;
> ; SSE41-LABEL: vsel_float:
> @@ -65,16 +63,14 @@ entry:
> define <4 x i8> @vsel_4xi8(<4 x i8> %v1, <4 x i8> %v2) {
> ; SSE2-LABEL: vsel_4xi8:
> ; SSE2: # BB#0: # %entry
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSE2-NEXT: orps %xmm1, %xmm0
> +; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[2,0],xmm0[3,0]
> +; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,2]
> ; SSE2-NEXT: retq
> ;
> ; SSSE3-LABEL: vsel_4xi8:
> ; SSSE3: # BB#0: # %entry
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSSE3-NEXT: orps %xmm1, %xmm0
> +; SSSE3-NEXT: shufps {{.*#+}} xmm1 = xmm1[2,0],xmm0[3,0]
> +; SSSE3-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,2]
> ; SSSE3-NEXT: retq
> ;
> ; SSE41-LABEL: vsel_4xi8:
> @@ -99,16 +95,16 @@ entry:
> define <4 x i16> @vsel_4xi16(<4 x i16> %v1, <4 x i16> %v2) {
> ; SSE2-LABEL: vsel_4xi16:
> ; SSE2: # BB#0: # %entry
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSE2-NEXT: orps %xmm1, %xmm0
> +; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,0],xmm0[0,0]
> +; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[2,0],xmm0[2,3]
> +; SSE2-NEXT: movaps %xmm1, %xmm0
> ; SSE2-NEXT: retq
> ;
> ; SSSE3-LABEL: vsel_4xi16:
> ; SSSE3: # BB#0: # %entry
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSSE3-NEXT: orps %xmm1, %xmm0
> +; SSSE3-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,0],xmm0[0,0]
> +; SSSE3-NEXT: shufps {{.*#+}} xmm1 = xmm1[2,0],xmm0[2,3]
> +; SSSE3-NEXT: movaps %xmm1, %xmm0
> ; SSSE3-NEXT: retq
> ;
> ; SSE41-LABEL: vsel_4xi16:
> @@ -133,16 +129,16 @@ entry:
> define <4 x i32> @vsel_i32(<4 x i32> %v1, <4 x i32> %v2) {
> ; SSE2-LABEL: vsel_i32:
> ; SSE2: # BB#0: # %entry
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSE2-NEXT: orps %xmm1, %xmm0
> +; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,3,2,3]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
> +; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
> ; SSE2-NEXT: retq
> ;
> ; SSSE3-LABEL: vsel_i32:
> ; SSSE3: # BB#0: # %entry
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSSE3-NEXT: orps %xmm1, %xmm0
> +; SSSE3-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,3,2,3]
> +; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
> +; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
> ; SSSE3-NEXT: retq
> ;
> ; SSE41-LABEL: vsel_i32:
> @@ -226,16 +222,30 @@ entry:
> define <8 x i16> @vsel_8xi16(<8 x i16> %v1, <8 x i16> %v2) {
> ; SSE2-LABEL: vsel_8xi16:
> ; SSE2: # BB#0: # %entry
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSE2-NEXT: orps %xmm1, %xmm0
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm1[3,1,2,3]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 =
> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,3,2,1]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,6,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,3,2,1,4,5,6,7]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[1,0,3,2,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 =
> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
> ; SSE2-NEXT: retq
> ;
> ; SSSE3-LABEL: vsel_8xi16:
> ; SSSE3: # BB#0: # %entry
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSSE3-NEXT: orps %xmm1, %xmm0
> +; SSSE3-NEXT: pshufd {{.*#+}} xmm2 = xmm1[3,1,2,3]
> +; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
> +; SSSE3-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]
> +; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 =
> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
> +; SSSE3-NEXT: pshufb {{.*#+}} xmm0 =
> xmm0[0,1,10,11,4,5,2,3,4,5,10,11,4,5,6,7]
> +; SSSE3-NEXT: pshufb {{.*#+}} xmm1 =
> xmm1[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
> +; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 =
> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
> ; SSSE3-NEXT: retq
> ;
> ; SSE41-LABEL: vsel_8xi16:
> @@ -255,16 +265,42 @@ entry:
> define <16 x i8> @vsel_i8(<16 x i8> %v1, <16 x i8> %v2) {
> ; SSE2-LABEL: vsel_i8:
> ; SSE2: # BB#0: # %entry
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSE2-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSE2-NEXT: orps %xmm1, %xmm0
> +; SSE2-NEXT: pxor %xmm2, %xmm2
> +; SSE2-NEXT: movdqa %xmm1, %xmm3
> +; SSE2-NEXT: punpckhbw {{.*#+}} xmm3 =
> xmm3[8],xmm2[8],xmm3[9],xmm2[9],xmm3[10],xmm2[10],xmm3[11],xmm2[11],xmm3[12],xmm2[12],xmm3[13],xmm2[13],xmm3[14],xmm2[14],xmm3[15],xmm2[15]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm3 = xmm3[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm3 = xmm3[0,1,2,3,7,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm3[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm3 = xmm3[1,0,3,2,4,5,6,7]
> +; SSE2-NEXT: movdqa %xmm1, %xmm4
> +; SSE2-NEXT: punpcklbw {{.*#+}} xmm4 =
> xmm4[0],xmm2[0],xmm4[1],xmm2[1],xmm4[2],xmm2[2],xmm4[3],xmm2[3],xmm4[4],xmm2[4],xmm4[5],xmm2[5],xmm4[6],xmm2[6],xmm4[7],xmm2[7]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm2 = xmm4[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,7,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm2 = xmm2[1,0,3,2,4,5,6,7]
> +; SSE2-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
> +; SSE2-NEXT: packuswb %xmm0, %xmm2
> +; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[1,0,3,2,4,5,6,7]
> +; SSE2-NEXT: packuswb %xmm0, %xmm1
> +; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
> +; SSE2-NEXT: packuswb %xmm0, %xmm0
> +; SSE2-NEXT: packuswb %xmm0, %xmm0
> +; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 =
> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
> +; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 =
> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
> ; SSE2-NEXT: retq
> ;
> ; SSSE3-LABEL: vsel_i8:
> ; SSSE3: # BB#0: # %entry
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm1
> -; SSSE3-NEXT: andps {{.*}}(%rip), %xmm0
> -; SSSE3-NEXT: orps %xmm1, %xmm0
> +; SSSE3-NEXT: movdqa %xmm1, %xmm2
> +; SSSE3-NEXT: pshufb {{.*#+}} xmm2 =
> xmm2[2,6,10,14,u,u,u,u,u,u,u,u,u,u,u,u]
> +; SSSE3-NEXT: pshufb {{.*#+}} xmm0 =
> xmm0[0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u]
> +; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 =
> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
> +; SSSE3-NEXT: pshufb {{.*#+}} xmm1 =
> xmm1[1,3,5,7,9,11,13,15,u,u,u,u,u,u,u,u]
> +; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 =
> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
> ; SSSE3-NEXT: retq
> ;
> ; SSE41-LABEL: vsel_i8:
> @@ -419,8 +455,8 @@ define <8 x i64> @vsel_i648(<8 x i64> %v
> ;
> ; SSE41-LABEL: vsel_i648:
> ; SSE41: # BB#0: # %entry
> -; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm0[0],xmm4[1]
> -; SSE41-NEXT: blendpd {{.*#+}} xmm2 = xmm2[0],xmm6[1]
> +; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm4[4,5,6,7]
> +; SSE41-NEXT: pblendw {{.*#+}} xmm2 = xmm2[0,1,2,3],xmm6[4,5,6,7]
> ; SSE41-NEXT: movaps %xmm5, %xmm1
> ; SSE41-NEXT: movaps %xmm7, %xmm3
> ; SSE41-NEXT: retq
> @@ -586,26 +622,22 @@ entry:
> define <8 x float> @constant_blendvps_avx(<8 x float> %xyzw, <8 x float>
> %abcd) {
> ; SSE2-LABEL: constant_blendvps_avx:
> ; SSE2: # BB#0: # %entry
> -; SSE2-NEXT: movaps {{.*#+}} xmm4 =
> [4294967295,4294967295,4294967295,0]
> -; SSE2-NEXT: andps %xmm4, %xmm2
> -; SSE2-NEXT: movaps {{.*#+}} xmm5 = [0,0,0,4294967295]
> -; SSE2-NEXT: andps %xmm5, %xmm0
> -; SSE2-NEXT: orps %xmm2, %xmm0
> -; SSE2-NEXT: andps %xmm4, %xmm3
> -; SSE2-NEXT: andps %xmm5, %xmm1
> -; SSE2-NEXT: orps %xmm3, %xmm1
> +; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,0],xmm2[2,0]
> +; SSE2-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,1],xmm0[2,0]
> +; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,0],xmm3[2,0]
> +; SSE2-NEXT: shufps {{.*#+}} xmm3 = xmm3[0,1],xmm1[2,0]
> +; SSE2-NEXT: movaps %xmm2, %xmm0
> +; SSE2-NEXT: movaps %xmm3, %xmm1
> ; SSE2-NEXT: retq
> ;
> ; SSSE3-LABEL: constant_blendvps_avx:
> ; SSSE3: # BB#0: # %entry
> -; SSSE3-NEXT: movaps {{.*#+}} xmm4 =
> [4294967295,4294967295,4294967295,0]
> -; SSSE3-NEXT: andps %xmm4, %xmm2
> -; SSSE3-NEXT: movaps {{.*#+}} xmm5 = [0,0,0,4294967295]
> -; SSSE3-NEXT: andps %xmm5, %xmm0
> -; SSSE3-NEXT: orps %xmm2, %xmm0
> -; SSSE3-NEXT: andps %xmm4, %xmm3
> -; SSSE3-NEXT: andps %xmm5, %xmm1
> -; SSSE3-NEXT: orps %xmm3, %xmm1
> +; SSSE3-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,0],xmm2[2,0]
> +; SSSE3-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,1],xmm0[2,0]
> +; SSSE3-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,0],xmm3[2,0]
> +; SSSE3-NEXT: shufps {{.*#+}} xmm3 = xmm3[0,1],xmm1[2,0]
> +; SSSE3-NEXT: movaps %xmm2, %xmm0
> +; SSSE3-NEXT: movaps %xmm3, %xmm1
> ; SSSE3-NEXT: retq
> ;
> ; SSE41-LABEL: constant_blendvps_avx:
> @@ -626,26 +658,134 @@ entry:
> define <32 x i8> @constant_pblendvb_avx2(<32 x i8> %xyzw, <32 x i8>
> %abcd) {
> ; SSE2-LABEL: constant_pblendvb_avx2:
> ; SSE2: # BB#0: # %entry
> -; SSE2-NEXT: movaps {{.*#+}} xmm4 =
> [255,255,0,255,0,0,0,255,255,255,0,255,0,0,0,255]
> -; SSE2-NEXT: andps %xmm4, %xmm2
> -; SSE2-NEXT: movaps {{.*#+}} xmm5 =
> [0,0,255,0,255,255,255,0,0,0,255,0,255,255,255,0]
> -; SSE2-NEXT: andps %xmm5, %xmm0
> -; SSE2-NEXT: orps %xmm2, %xmm0
> -; SSE2-NEXT: andps %xmm4, %xmm3
> -; SSE2-NEXT: andps %xmm5, %xmm1
> -; SSE2-NEXT: orps %xmm3, %xmm1
> +; SSE2-NEXT: movdqa %xmm0, %xmm4
> +; SSE2-NEXT: pxor %xmm5, %xmm5
> +; SSE2-NEXT: # kill: XMM0<def> XMM4<kill>
> +; SSE2-NEXT: punpckhbw {{.*#+}} xmm0 =
> xmm0[8],xmm5[8],xmm0[9],xmm5[9],xmm0[10],xmm5[10],xmm0[11],xmm5[11],xmm0[12],xmm5[12],xmm0[13],xmm5[13],xmm0[14],xmm5[14],xmm0[15],xmm5[15]
> +; SSE2-NEXT: movdqa %xmm4, %xmm6
> +; SSE2-NEXT: punpcklbw {{.*#+}} xmm6 =
> xmm6[0],xmm5[0],xmm6[1],xmm5[1],xmm6[2],xmm5[2],xmm6[3],xmm5[3],xmm6[4],xmm5[4],xmm6[5],xmm5[5],xmm6[6],xmm5[6],xmm6[7],xmm5[7]
> +; SSE2-NEXT: punpckhwd {{.*#+}} xmm6 =
> xmm6[4],xmm0[4],xmm6[5],xmm0[5],xmm6[6],xmm0[6],xmm6[7],xmm0[7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm6[0,1,2,1]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5,7,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,3]
> +; SSE2-NEXT: movdqa %xmm2, %xmm6
> +; SSE2-NEXT: punpckhbw {{.*#+}} xmm6 =
> xmm6[8],xmm5[8],xmm6[9],xmm5[9],xmm6[10],xmm5[10],xmm6[11],xmm5[11],xmm6[12],xmm5[12],xmm6[13],xmm5[13],xmm6[14],xmm5[14],xmm6[15],xmm5[15]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm6 = xmm6[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm6 = xmm6[0,3,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm6 = xmm6[1,0,2,3,4,5,6,7]
> +; SSE2-NEXT: movdqa %xmm2, %xmm7
> +; SSE2-NEXT: punpcklbw {{.*#+}} xmm7 =
> xmm7[0],xmm5[0],xmm7[1],xmm5[1],xmm7[2],xmm5[2],xmm7[3],xmm5[3],xmm7[4],xmm5[4],xmm7[5],xmm5[5],xmm7[6],xmm5[6],xmm7[7],xmm5[7]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm7 = xmm7[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm7 = xmm7[0,3,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm7 = xmm7[1,0,2,3,4,5,6,7]
> +; SSE2-NEXT: punpcklqdq {{.*#+}} xmm7 = xmm7[0],xmm6[0]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm6 = xmm7[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm6 = xmm6[0,2,2,3,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm6 =
> xmm6[0],xmm0[0],xmm6[1],xmm0[1],xmm6[2],xmm0[2],xmm6[3],xmm0[3]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm6[0,3,2,1]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,6,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm6 = xmm0[0,3,2,1,4,5,6,7]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm7[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,7,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[1,0,3,2,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm6 =
> xmm6[0],xmm0[0],xmm6[1],xmm0[1],xmm6[2],xmm0[2],xmm6[3],xmm0[3]
> +; SSE2-NEXT: packuswb %xmm0, %xmm6
> +; SSE2-NEXT: movdqa {{.*#+}} xmm7 = [255,255,255,255,255,255,255,255]
> +; SSE2-NEXT: pand %xmm7, %xmm4
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm4[3,1,2,3]
> +; SSE2-NEXT: pand %xmm7, %xmm2
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm2 = xmm2[0,2,2,3,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 =
> xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm2[0,3,2,1]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,6,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,3,2,1,4,5,6,7]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm2 = xmm4[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,7,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm2 = xmm2[1,0,3,2,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 =
> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
> +; SSE2-NEXT: packuswb %xmm0, %xmm0
> +; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 =
> xmm0[0],xmm6[0],xmm0[1],xmm6[1],xmm0[2],xmm6[2],xmm0[3],xmm6[3],xmm0[4],xmm6[4],xmm0[5],xmm6[5],xmm0[6],xmm6[6],xmm0[7],xmm6[7]
> +; SSE2-NEXT: movdqa %xmm1, %xmm2
> +; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 =
> xmm2[8],xmm5[8],xmm2[9],xmm5[9],xmm2[10],xmm5[10],xmm2[11],xmm5[11],xmm2[12],xmm5[12],xmm2[13],xmm5[13],xmm2[14],xmm5[14],xmm2[15],xmm5[15]
> +; SSE2-NEXT: movdqa %xmm1, %xmm4
> +; SSE2-NEXT: punpcklbw {{.*#+}} xmm4 =
> xmm4[0],xmm5[0],xmm4[1],xmm5[1],xmm4[2],xmm5[2],xmm4[3],xmm5[3],xmm4[4],xmm5[4],xmm4[5],xmm5[5],xmm4[6],xmm5[6],xmm4[7],xmm5[7]
> +; SSE2-NEXT: punpckhwd {{.*#+}} xmm4 =
> xmm4[4],xmm2[4],xmm4[5],xmm2[5],xmm4[6],xmm2[6],xmm4[7],xmm2[7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm4[0,1,2,1]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,4,5,7,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[3,1,2,3]
> +; SSE2-NEXT: movdqa %xmm3, %xmm4
> +; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 =
> xmm4[8],xmm5[8],xmm4[9],xmm5[9],xmm4[10],xmm5[10],xmm4[11],xmm5[11],xmm4[12],xmm5[12],xmm4[13],xmm5[13],xmm4[14],xmm5[14],xmm4[15],xmm5[15]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm4 = xmm4[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm4 = xmm4[0,3,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm4 = xmm4[1,0,2,3,4,5,6,7]
> +; SSE2-NEXT: movdqa %xmm3, %xmm6
> +; SSE2-NEXT: punpcklbw {{.*#+}} xmm6 =
> xmm6[0],xmm5[0],xmm6[1],xmm5[1],xmm6[2],xmm5[2],xmm6[3],xmm5[3],xmm6[4],xmm5[4],xmm6[5],xmm5[5],xmm6[6],xmm5[6],xmm6[7],xmm5[7]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm5 = xmm6[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm5 = xmm5[0,3,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm5 = xmm5[1,0,2,3,4,5,6,7]
> +; SSE2-NEXT: punpcklqdq {{.*#+}} xmm5 = xmm5[0],xmm4[0]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm4 = xmm5[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm4 = xmm4[0,2,2,3,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm4 =
> xmm4[0],xmm2[0],xmm4[1],xmm2[1],xmm4[2],xmm2[2],xmm4[3],xmm2[3]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm4[0,3,2,1]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,6,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm4 = xmm2[0,3,2,1,4,5,6,7]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm2 = xmm5[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,7,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm2 = xmm2[1,0,3,2,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm4 =
> xmm4[0],xmm2[0],xmm4[1],xmm2[1],xmm4[2],xmm2[2],xmm4[3],xmm2[3]
> +; SSE2-NEXT: packuswb %xmm0, %xmm4
> +; SSE2-NEXT: pand %xmm7, %xmm1
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm1[3,1,2,3]
> +; SSE2-NEXT: pand %xmm7, %xmm3
> +; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm3[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm3 = xmm3[0,2,2,3,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm3 =
> xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm3[0,3,2,1]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,6,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm2 = xmm2[0,3,2,1,4,5,6,7]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[3,1,2,3,4,5,6,7]
> +; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,5,6,7]
> +; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
> +; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[1,0,3,2,4,5,6,7]
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 =
> xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
> +; SSE2-NEXT: packuswb %xmm0, %xmm2
> +; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 =
> xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3],xmm2[4],xmm4[4],xmm2[5],xmm4[5],xmm2[6],xmm4[6],xmm2[7],xmm4[7]
> +; SSE2-NEXT: movdqa %xmm2, %xmm1
> ; SSE2-NEXT: retq
> ;
> ; SSSE3-LABEL: constant_pblendvb_avx2:
> ; SSSE3: # BB#0: # %entry
> -; SSSE3-NEXT: movaps {{.*#+}} xmm4 =
> [255,255,0,255,0,0,0,255,255,255,0,255,0,0,0,255]
> -; SSSE3-NEXT: andps %xmm4, %xmm2
> -; SSSE3-NEXT: movaps {{.*#+}} xmm5 =
> [0,0,255,0,255,255,255,0,0,0,255,0,255,255,255,0]
> -; SSSE3-NEXT: andps %xmm5, %xmm0
> -; SSSE3-NEXT: orps %xmm2, %xmm0
> -; SSSE3-NEXT: andps %xmm4, %xmm3
> -; SSSE3-NEXT: andps %xmm5, %xmm1
> -; SSSE3-NEXT: orps %xmm3, %xmm1
> +; SSSE3-NEXT: movdqa {{.*#+}} xmm8 =
> <128,128,5,128,128,128,13,128,u,u,u,u,u,u,u,u>
> +; SSSE3-NEXT: movdqa %xmm0, %xmm5
> +; SSSE3-NEXT: pshufb %xmm8, %xmm5
> +; SSSE3-NEXT: movdqa {{.*#+}} xmm6 =
> <1,3,128,7,9,11,128,15,u,u,u,u,u,u,u,u>
> +; SSSE3-NEXT: movdqa %xmm2, %xmm7
> +; SSSE3-NEXT: pshufb %xmm6, %xmm7
> +; SSSE3-NEXT: por %xmm5, %xmm7
> +; SSSE3-NEXT: movdqa {{.*#+}} xmm5 =
> <0,128,128,128,8,128,128,128,u,u,u,u,u,u,u,u>
> +; SSSE3-NEXT: pshufb %xmm5, %xmm2
> +; SSSE3-NEXT: movdqa {{.*#+}} xmm4 =
> <128,2,4,6,128,10,12,14,u,u,u,u,u,u,u,u>
> +; SSSE3-NEXT: pshufb %xmm4, %xmm0
> +; SSSE3-NEXT: por %xmm2, %xmm0
> +; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 =
> xmm0[0],xmm7[0],xmm0[1],xmm7[1],xmm0[2],xmm7[2],xmm0[3],xmm7[3],xmm0[4],xmm7[4],xmm0[5],xmm7[5],xmm0[6],xmm7[6],xmm0[7],xmm7[7]
> +; SSSE3-NEXT: movdqa %xmm1, %xmm2
> +; SSSE3-NEXT: pshufb %xmm8, %xmm2
> +; SSSE3-NEXT: movdqa %xmm3, %xmm7
> +; SSSE3-NEXT: pshufb %xmm6, %xmm7
> +; SSSE3-NEXT: por %xmm2, %xmm7
> +; SSSE3-NEXT: pshufb %xmm5, %xmm3
> +; SSSE3-NEXT: pshufb %xmm4, %xmm1
> +; SSSE3-NEXT: por %xmm3, %xmm1
> +; SSSE3-NEXT: punpcklbw {{.*#+}} xmm1 =
> xmm1[0],xmm7[0],xmm1[1],xmm7[1],xmm1[2],xmm7[2],xmm1[3],xmm7[3],xmm1[4],xmm7[4],xmm1[5],xmm7[5],xmm1[6],xmm7[6],xmm1[7],xmm7[7]
> ; SSSE3-NEXT: retq
> ;
> ; SSE41-LABEL: constant_pblendvb_avx2:
> @@ -660,9 +800,27 @@ define <32 x i8> @constant_pblendvb_avx2
> ;
> ; AVX1-LABEL: constant_pblendvb_avx2:
> ; AVX1: # BB#0: # %entry
> -; AVX1-NEXT: vandps {{.*}}(%rip), %ymm1, %ymm1
> -; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
> -; AVX1-NEXT: vorps %ymm1, %ymm0, %ymm0
> +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
> +; AVX1-NEXT: vmovdqa {{.*#+}} xmm8 =
> <128,128,5,128,128,128,13,128,u,u,u,u,u,u,u,u>
> +; AVX1-NEXT: vpshufb %xmm8, %xmm2, %xmm4
> +; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm5
> +; AVX1-NEXT: vmovdqa {{.*#+}} xmm6 =
> <1,3,128,7,9,11,128,15,u,u,u,u,u,u,u,u>
> +; AVX1-NEXT: vpshufb %xmm6, %xmm5, %xmm7
> +; AVX1-NEXT: vpor %xmm4, %xmm7, %xmm4
> +; AVX1-NEXT: vmovdqa {{.*#+}} xmm7 =
> <0,128,128,128,8,128,128,128,u,u,u,u,u,u,u,u>
> +; AVX1-NEXT: vpshufb %xmm7, %xmm5, %xmm5
> +; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 =
> <128,2,4,6,128,10,12,14,u,u,u,u,u,u,u,u>
> +; AVX1-NEXT: vpshufb %xmm3, %xmm2, %xmm2
> +; AVX1-NEXT: vpor %xmm5, %xmm2, %xmm2
> +; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm2 =
> xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3],xmm2[4],xmm4[4],xmm2[5],xmm4[5],xmm2[6],xmm4[6],xmm2[7],xmm4[7]
> +; AVX1-NEXT: vpshufb %xmm8, %xmm0, %xmm4
> +; AVX1-NEXT: vpshufb %xmm6, %xmm1, %xmm5
> +; AVX1-NEXT: vpor %xmm4, %xmm5, %xmm4
> +; AVX1-NEXT: vpshufb %xmm7, %xmm1, %xmm1
> +; AVX1-NEXT: vpshufb %xmm3, %xmm0, %xmm0
> +; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
> +; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm0 =
> xmm0[0],xmm4[0],xmm0[1],xmm4[1],xmm0[2],xmm4[2],xmm0[3],xmm4[3],xmm0[4],xmm4[4],xmm0[5],xmm4[5],xmm0[6],xmm4[6],xmm0[7],xmm4[7]
> +; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
> ; AVX1-NEXT: retq
> ;
> ; AVX2-LABEL: constant_pblendvb_avx2:
>
> Modified: llvm/trunk/test/CodeGen/X86/vselect.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vselect.ll?rev=229835&r1=229834&r2=229835&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vselect.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vselect.ll Thu Feb 19 04:36:19 2015
> @@ -6,9 +6,8 @@
> define <4 x float> @test1(<4 x float> %a, <4 x float> %b) {
> ; CHECK-LABEL: test1:
> ; CHECK: # BB#0:
> -; CHECK-NEXT: andps {{.*}}(%rip), %xmm1
> -; CHECK-NEXT: andps {{.*}}(%rip), %xmm0
> -; CHECK-NEXT: orps %xmm1, %xmm0
> +; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[1,3]
> +; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]
> ; CHECK-NEXT: retq
> %1 = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>, <4 x
> float> %a, <4 x float> %b
> ret <4 x float> %1
> @@ -53,9 +52,6 @@ define <4 x float> @test5(<4 x float> %a
> define <8 x i16> @test6(<8 x i16> %a, <8 x i16> %b) {
> ; CHECK-LABEL: test6:
> ; CHECK: # BB#0:
> -; CHECK-NEXT: movaps {{.*#+}} xmm1 = [65535,0,65535,0,65535,0,65535,0]
> -; CHECK-NEXT: orps {{.*}}(%rip), %xmm1
> -; CHECK-NEXT: andps %xmm1, %xmm0
> ; CHECK-NEXT: retq
> %1 = select <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 true, i1
> false, i1 true, i1 false>, <8 x i16> %a, <8 x i16> %a
> ret <8 x i16> %1
> @@ -64,9 +60,8 @@ define <8 x i16> @test6(<8 x i16> %a, <8
> define <8 x i16> @test7(<8 x i16> %a, <8 x i16> %b) {
> ; CHECK-LABEL: test7:
> ; CHECK: # BB#0:
> -; CHECK-NEXT: andps {{.*}}(%rip), %xmm1
> -; CHECK-NEXT: andps {{.*}}(%rip), %xmm0
> -; CHECK-NEXT: orps %xmm1, %xmm0
> +; CHECK-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
> +; CHECK-NEXT: movapd %xmm1, %xmm0
> ; CHECK-NEXT: retq
> %1 = select <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 false, i1
> false, i1 false, i1 false>, <8 x i16> %a, <8 x i16> %b
> ret <8 x i16> %1
> @@ -75,9 +70,7 @@ define <8 x i16> @test7(<8 x i16> %a, <8
> define <8 x i16> @test8(<8 x i16> %a, <8 x i16> %b) {
> ; CHECK-LABEL: test8:
> ; CHECK: # BB#0:
> -; CHECK-NEXT: andps {{.*}}(%rip), %xmm1
> -; CHECK-NEXT: andps {{.*}}(%rip), %xmm0
> -; CHECK-NEXT: orps %xmm1, %xmm0
> +; CHECK-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
> ; CHECK-NEXT: retq
> %1 = select <8 x i1> <i1 false, i1 false, i1 false, i1 false, i1 true,
> i1 true, i1 true, i1 true>, <8 x i16> %a, <8 x i16> %b
> ret <8 x i16> %1
> @@ -103,10 +96,10 @@ define <8 x i16> @test10(<8 x i16> %a, <
> define <8 x i16> @test11(<8 x i16> %a, <8 x i16> %b) {
> ; CHECK-LABEL: test11:
> ; CHECK: # BB#0:
> -; CHECK-NEXT: movaps {{.*#+}} xmm2 = <0,65535,65535,0,u,65535,65535,u>
> -; CHECK-NEXT: andps %xmm2, %xmm0
> -; CHECK-NEXT: andnps %xmm1, %xmm2
> -; CHECK-NEXT: orps %xmm2, %xmm0
> +; CHECK-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]
> +; CHECK-NEXT: punpcklwd {{.*#+}} xmm0 =
> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
> +; CHECK-NEXT: pshufb {{.*#+}} xmm0 =
> xmm0[2,3,4,5,8,9,14,15,8,9,14,15,12,13,14,15]
> +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
> ; CHECK-NEXT: retq
> %1 = select <8 x i1> <i1 false, i1 true, i1 true, i1 false, i1 undef,
> i1 true, i1 true, i1 undef>, <8 x i16> %a, <8 x i16> %b
> ret <8 x i16> %1
>
> Modified:
> llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll?rev=229835&r1=229834&r2=229835&view=diff
>
> ==============================================================================
> ---
> llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll
> (original)
> +++
> llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll
> Thu Feb 19 04:36:19 2015
> @@ -50,8 +50,8 @@ define void @vectorselect(i1 %cond) {
> %7 = getelementptr inbounds [2048 x i32]* @a, i64 0, i64 %indvars.iv
> %8 = icmp ult i64 %indvars.iv, 8
>
> -; A vector select has a cost of 4 on core2
> -; CHECK: cost of 4 for VF 2 {{.*}} select i1 %8, i32 %6, i32 0
> +; A vector select has a cost of 1 on core2
> +; CHECK: cost of 1 for VF 2 {{.*}} select i1 %8, i32 %6, i32 0
>
> %sel = select i1 %8, i32 %6, i32 zeroinitializer
> store i32 %sel, i32* %7, align 4
>
>