[llvm] r229835 - [x86, sdag] Two interrelated changes to the x86 and sdag code.

Sergey Matveev earthdok at google.com
Tue Mar 3 12:10:41 PST 2015


On Tue, Mar 3, 2015 at 11:10 PM, Sergey Matveev <earthdok at google.com> wrote:

> This is causing a clang crash in Chromium:
> http://llvm.org/bugs/show_bug.cgi?id=22773
>
> We really can't delay the clang roll any longer, and would much rather
> revert this patch until it's fixed. However, I'm having trouble reverting
> it cleanly - check-llvm appears to hang forever. Can someone more familiar
> with the code please take a look?
>
> On Thu, Feb 19, 2015 at 1:36 PM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
>
>> Author: chandlerc
>> Date: Thu Feb 19 04:36:19 2015
>> New Revision: 229835
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=229835&view=rev
>> Log:
>> [x86,sdag] Two interrelated changes to the x86 and sdag code.
>>
>> First, don't combine bit masking into vector shuffles (even ones the
>> target can handle) once operation legalization has taken place. Custom
>> legalization of vector shuffles may exist for these patterns (making the
>> predicate return true) but that custom legalization may in some cases
>> produce the exact bit math this matches. We only really want to handle
>> this prior to operation legalization.
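>>
>> To make that concrete, the combine being restricted here recognizes an
>> AND with a constant whose lanes are all-ones or all-zeros and rewrites
>> it as a lane-clearing shuffle against a zero vector, roughly like this
>> (illustrative IR, not taken from the patch):
>>
>>   %t = and <4 x i32> %x, <i32 -1, i32 0, i32 -1, i32 0>
>>   ; becomes
>>   %t = shufflevector <4 x i32> %x, <4 x i32> zeroinitializer,
>>                      <4 x i32> <i32 0, i32 5, i32 2, i32 7>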
>>
>> However, the x86 backend, in a fit of awesome, relied on this. What it
>> would do is mark VSELECTs as expand, which would turn them into
>> arithmetic, which this would then match back into vector shuffles, which
>> we would then lower properly. Amazing.
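>>
>> Concretely, "expand" turns a constant-condition VSELECT into bit math
>> along these lines (illustrative IR, with the NOT of the mask already
>> constant folded):
>>
>>   %t0  = and <4 x i32> %a, <i32 -1, i32 0, i32 -1, i32 0>
>>   %t1  = and <4 x i32> %b, <i32 0, i32 -1, i32 0, i32 -1>
>>   %sel = or <4 x i32> %t0, %t1
>>
>> which is exactly the and/or pattern the combine above would then turn
>> back into a shuffle.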
>>
>> Instead, the second change is to teach the x86 backend to directly form
>> vector shuffles from VSELECT nodes with constant conditions, and to mark
>> VSELECT as Custom for all of the vector types where we support lowering
>> blends as shuffles. We still mark the forms which actually support
>> variable blends as *legal* so that the custom lowering is bypassed, and
>> the legal lowering can even be used by the vector shuffle legalization
>> (yes, I know, this is confusing, but that's how the patterns are
>> written).
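>>
>> In other words, a VSELECT whose condition is a build_vector of constants
>> is just a blend, and a blend is just a shuffle. A rough sketch of the
>> mapping (illustrative IR; the lane pattern is made up):
>>
>>   %sel = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>,
>>                 <4 x float> %a, <4 x float> %b
>>   ; lowers through the same path as the equivalent shuffle, where a
>>   ; true lane takes element i from %a and a false lane takes element
>>   ; i + 4 from %b:
>>   %sel = shufflevector <4 x float> %a, <4 x float> %b,
>>                        <4 x i32> <i32 0, i32 5, i32 2, i32 7>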
>>
>> This makes the VSELECT lowering much more sensible, and in fact should
>> fix a bunch of bugs with it. However, as you'll see in the test cases,
>> right now what it does is point out the *hilarious* deficiency of the
>> new vector shuffle lowering when it comes to blends. Fortunately, my
>> very next patch fixes that. I can't submit it yet, because that patch,
>> somewhat obviously, forms the exact and/or pattern that the DAG combine
>> is matching here! Without this patch, teaching the vector shuffle
>> lowering to produce the right code infloops in the DAG combiner. With
>> this patch alone, we produce terrible code but at least lower through
>> the right paths. With both patches, all the regressions here should be
>> fixed, and a bunch of the improvements (like using 2 shufps with no
>> memory loads instead of 2 andps with memory loads and an orps) will
>> stay. Win!
>>
>> There is one other change worth noting here. We had hilariously wrong
>> vectorization cost estimates for vselect because we fell through to the
>> code path that assumed all "expand" vector operations are scalarized.
>> However, the "expand" lowering of VSELECT is vector bit math, most
>> definitely not scalarized. So now we go back to the correct if horribly
>> naive cost of "1" for "not scalarized". If anyone wants to add actual
>> modeling of shuffle costs, that would be cool, but this seems an
>> improvement on its own. Note the removal of 16 and 32 "costs" for doing
>> a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
>> course, we don't right now because of OMG bad code, but I'm going to fix
>> that. Next patch. I promise.
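>>
>> For a sense of scale (the numbers can be read off the updated cost-model
>> test below): the old scalarization-based path charged roughly two per
>> lane, e.g. 16 lanes * 2 = 32 for a <16 x i8> select on SSE2, while the
>> actual "expand" lowering is only a handful of vector bit operations, so
>> a flat cost of 1 per legal vector is far closer to reality even before
>> any real modeling of shuffle costs.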
>>
>> Modified:
>>     llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>     llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll
>>     llvm/trunk/test/CodeGen/X86/vector-blend.ll
>>     llvm/trunk/test/CodeGen/X86/vselect.ll
>>     llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll
>>
>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=229835&r1=229834&r2=229835&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Thu Feb 19 04:36:19 2015
>> @@ -11973,9 +11973,11 @@ SDValue DAGCombiner::XformToShuffleWithZ
>>            return SDValue();
>>        }
>>
>> -      // Let's see if the target supports this vector_shuffle.
>> +      // Let's see if the target supports this vector_shuffle and make sure
>> +      // we're not running after operation legalization where it may have
>> +      // custom lowered the vector shuffles.
>>        EVT RVT = RHS.getValueType();
>> -      if (!TLI.isVectorClearMaskLegal(Indices, RVT))
>> +      if (LegalOperations || !TLI.isVectorClearMaskLegal(Indices, RVT))
>>          return SDValue();
>>
>>        // Return the new VECTOR_SHUFFLE node.
>>
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=229835&r1=229834&r2=229835&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Feb 19 04:36:19 2015
>> @@ -926,6 +926,7 @@ X86TargetLowering::X86TargetLowering(con
>>      setOperationAction(ISD::LOAD,               MVT::v4f32, Legal);
>>      setOperationAction(ISD::BUILD_VECTOR,       MVT::v4f32, Custom);
>>      setOperationAction(ISD::VECTOR_SHUFFLE,     MVT::v4f32, Custom);
>> +    setOperationAction(ISD::VSELECT,            MVT::v4f32, Custom);
>>      setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v4f32, Custom);
>>      setOperationAction(ISD::SELECT,             MVT::v4f32, Custom);
>>      setOperationAction(ISD::UINT_TO_FP,         MVT::v4i32, Custom);
>> @@ -994,6 +995,7 @@ X86TargetLowering::X86TargetLowering(con
>>          continue;
>>        setOperationAction(ISD::BUILD_VECTOR,       VT, Custom);
>>        setOperationAction(ISD::VECTOR_SHUFFLE,     VT, Custom);
>> +      setOperationAction(ISD::VSELECT,            VT, Custom);
>>        setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
>>      }
>>
>> @@ -1017,6 +1019,8 @@ X86TargetLowering::X86TargetLowering(con
>>      setOperationAction(ISD::BUILD_VECTOR,       MVT::v2i64, Custom);
>>      setOperationAction(ISD::VECTOR_SHUFFLE,     MVT::v2f64, Custom);
>>      setOperationAction(ISD::VECTOR_SHUFFLE,     MVT::v2i64, Custom);
>> +    setOperationAction(ISD::VSELECT,            MVT::v2f64, Custom);
>> +    setOperationAction(ISD::VSELECT,            MVT::v2i64, Custom);
>>      setOperationAction(ISD::INSERT_VECTOR_ELT,  MVT::v2f64, Custom);
>>      setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2f64, Custom);
>>
>> @@ -1098,13 +1102,8 @@ X86TargetLowering::X86TargetLowering(con
>>      // FIXME: Do we need to handle scalar-to-vector here?
>>      setOperationAction(ISD::MUL,                MVT::v4i32, Legal);
>>
>> -    setOperationAction(ISD::VSELECT,            MVT::v2f64, Custom);
>> -    setOperationAction(ISD::VSELECT,            MVT::v2i64, Custom);
>> -    setOperationAction(ISD::VSELECT,            MVT::v4i32, Custom);
>> -    setOperationAction(ISD::VSELECT,            MVT::v4f32, Custom);
>> -    setOperationAction(ISD::VSELECT,            MVT::v8i16, Custom);
>> -    // There is no BLENDI for byte vectors. We don't need to custom lower
>> -    // some vselects for now.
>> +    // We directly match byte blends in the backend as they match the VSELECT
>> +    // condition form.
>>      setOperationAction(ISD::VSELECT,            MVT::v16i8, Legal);
>>
>>      // SSE41 brings specific instructions for doing vector sign extend even in
>> @@ -1245,11 +1244,6 @@ X86TargetLowering::X86TargetLowering(con
>>      setOperationAction(ISD::SELECT,            MVT::v4i64, Custom);
>>      setOperationAction(ISD::SELECT,            MVT::v8f32, Custom);
>>
>> -    setOperationAction(ISD::VSELECT,           MVT::v4f64, Custom);
>> -    setOperationAction(ISD::VSELECT,           MVT::v4i64, Custom);
>> -    setOperationAction(ISD::VSELECT,           MVT::v8i32, Custom);
>> -    setOperationAction(ISD::VSELECT,           MVT::v8f32, Custom);
>> -
>>      setOperationAction(ISD::SIGN_EXTEND,       MVT::v4i64, Custom);
>>      setOperationAction(ISD::SIGN_EXTEND,       MVT::v8i32, Custom);
>>      setOperationAction(ISD::SIGN_EXTEND,       MVT::v16i16, Custom);
>> @@ -1293,9 +1287,6 @@ X86TargetLowering::X86TargetLowering(con
>>        setOperationAction(ISD::MULHU,           MVT::v16i16, Legal);
>>        setOperationAction(ISD::MULHS,           MVT::v16i16, Legal);
>>
>> -      setOperationAction(ISD::VSELECT,         MVT::v16i16, Custom);
>> -      setOperationAction(ISD::VSELECT,         MVT::v32i8, Legal);
>> -
>>        // The custom lowering for UINT_TO_FP for v8i32 becomes interesting
>>        // when we have a 256bit-wide blend with immediate.
>>        setOperationAction(ISD::UINT_TO_FP, MVT::v8i32, Custom);
>> @@ -1368,6 +1359,7 @@ X86TargetLowering::X86TargetLowering(con
>>
>>        setOperationAction(ISD::BUILD_VECTOR,       VT, Custom);
>>        setOperationAction(ISD::VECTOR_SHUFFLE,     VT, Custom);
>> +      setOperationAction(ISD::VSELECT,            VT, Custom);
>>        setOperationAction(ISD::INSERT_VECTOR_ELT,  VT, Custom);
>>        setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
>>        setOperationAction(ISD::SCALAR_TO_VECTOR,   VT, Custom);
>> @@ -1375,6 +1367,10 @@ X86TargetLowering::X86TargetLowering(con
>>        setOperationAction(ISD::CONCAT_VECTORS,     VT, Custom);
>>      }
>>
>> +    if (Subtarget->hasInt256())
>> +      setOperationAction(ISD::VSELECT,         MVT::v32i8, Legal);
>> +
>> +
>>      // Promote v32i8, v16i16, v8i32 select, and, or, xor to v4i64.
>>      for (int i = MVT::v32i8; i != MVT::v4i64; ++i) {
>>        MVT VT = (MVT::SimpleValueType)i;
>> @@ -13139,48 +13135,29 @@ static bool BUILD_VECTORtoBlendMask(Buil
>>    return true;
>>  }
>>
>> -/// \brief Try to lower a VSELECT instruction to an immediate-controlled blend
>> -/// instruction.
>> -static SDValue lowerVSELECTtoBLENDI(SDValue Op, const X86Subtarget *Subtarget,
>> -                                    SelectionDAG &DAG) {
>> +/// \brief Try to lower a VSELECT instruction to a vector shuffle.
>> +static SDValue lowerVSELECTtoVectorShuffle(SDValue Op,
>> +                                           const X86Subtarget *Subtarget,
>> +                                           SelectionDAG &DAG) {
>>    SDValue Cond = Op.getOperand(0);
>>    SDValue LHS = Op.getOperand(1);
>>    SDValue RHS = Op.getOperand(2);
>>    SDLoc dl(Op);
>>    MVT VT = Op.getSimpleValueType();
>> -  MVT EltVT = VT.getVectorElementType();
>> -  unsigned NumElems = VT.getVectorNumElements();
>> -
>> -  // There is no blend with immediate in AVX-512.
>> -  if (VT.is512BitVector())
>> -    return SDValue();
>> -
>> -  if (!Subtarget->hasSSE41() || EltVT == MVT::i8)
>> -    return SDValue();
>> -  if (!Subtarget->hasInt256() && VT == MVT::v16i16)
>> -    return SDValue();
>>
>>    if (!ISD::isBuildVectorOfConstantSDNodes(Cond.getNode()))
>>      return SDValue();
>> +  auto *CondBV = cast<BuildVectorSDNode>(Cond);
>>
>> -  // Check the mask for BLEND and build the value.
>> -  unsigned MaskValue = 0;
>> -  if (!BUILD_VECTORtoBlendMask(cast<BuildVectorSDNode>(Cond), MaskValue))
>> -    return SDValue();
>> -
>> -  // Convert i32 vectors to floating point if it is not AVX2.
>> -  // AVX2 introduced VPBLENDD instruction for 128 and 256-bit vectors.
>> -  MVT BlendVT = VT;
>> -  if (EltVT == MVT::i64 || (EltVT == MVT::i32 && !Subtarget->hasInt256())) {
>> -    BlendVT = MVT::getVectorVT(MVT::getFloatingPointVT(EltVT.getSizeInBits()),
>> -                               NumElems);
>> -    LHS = DAG.getNode(ISD::BITCAST, dl, VT, LHS);
>> -    RHS = DAG.getNode(ISD::BITCAST, dl, VT, RHS);
>> +  // Only non-legal VSELECTs reach this lowering, convert those into generic
>> +  // shuffles and re-use the shuffle lowering path for blends.
>> +  SmallVector<int, 32> Mask;
>> +  for (int i = 0, Size = VT.getVectorNumElements(); i < Size; ++i) {
>> +    SDValue CondElt = CondBV->getOperand(i);
>> +    Mask.push_back(
>> +        isa<ConstantSDNode>(CondElt) ? i + (isZero(CondElt) ? Size : 0) : -1);
>>    }
>> -
>> -  SDValue Ret = DAG.getNode(X86ISD::BLENDI, dl, BlendVT, LHS, RHS,
>> -                            DAG.getConstant(MaskValue, MVT::i32));
>> -  return DAG.getNode(ISD::BITCAST, dl, VT, Ret);
>> +  return DAG.getVectorShuffle(VT, dl, LHS, RHS, Mask);
>>  }
>>
>>  SDValue X86TargetLowering::LowerVSELECT(SDValue Op, SelectionDAG &DAG) const {
>> @@ -13191,10 +13168,16 @@ SDValue X86TargetLowering::LowerVSELECT(
>>        ISD::isBuildVectorOfConstantSDNodes(Op.getOperand(2).getNode()))
>>      return SDValue();
>>
>> -  SDValue BlendOp = lowerVSELECTtoBLENDI(Op, Subtarget, DAG);
>> +  // Try to lower this to a blend-style vector shuffle. This can handle all
>> +  // constant condition cases.
>> +  SDValue BlendOp = lowerVSELECTtoVectorShuffle(Op, Subtarget, DAG);
>>    if (BlendOp.getNode())
>>      return BlendOp;
>>
>> +  // Variable blends are only legal from SSE4.1 onward.
>> +  if (!Subtarget->hasSSE41())
>> +    return SDValue();
>> +
>>    // Some types for vselect were previously set to Expand, not Legal or
>>    // Custom. Return an empty SDValue so we fall-through to Expand, after
>>    // the Custom lowering phase.
>>
>> Modified: llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll?rev=229835&r1=229834&r2=229835&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll (original)
>> +++ llvm/trunk/test/Analysis/CostModel/X86/vselect-cost.ll Thu Feb 19 04:36:19 2015
>> @@ -11,7 +11,7 @@
>>
>>  define <2 x i64> @test_2i64(<2 x i64> %a, <2 x i64> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_2i64':
>> -; SSE2: Cost Model: {{.*}} 4 for instruction:   %sel = select <2 x i1>
>> +; SSE2: Cost Model: {{.*}} 1 for instruction:   %sel = select <2 x i1>
>>  ; SSE41: Cost Model: {{.*}} 1 for instruction:   %sel = select <2 x i1>
>>  ; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <2 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <2 x i1>
>> @@ -21,7 +21,7 @@ define <2 x i64> @test_2i64(<2 x i64> %a
>>
>>  define <2 x double> @test_2double(<2 x double> %a, <2 x double> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_2double':
>> -; SSE2: Cost Model: {{.*}} 3 for instruction:   %sel = select <2 x i1>
>> +; SSE2: Cost Model: {{.*}} 1 for instruction:   %sel = select <2 x i1>
>>  ; SSE41: Cost Model: {{.*}} 1 for instruction:   %sel = select <2 x i1>
>>  ; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <2 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <2 x i1>
>> @@ -31,7 +31,7 @@ define <2 x double> @test_2double(<2 x d
>>
>>  define <4 x i32> @test_4i32(<4 x i32> %a, <4 x i32> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_4i32':
>> -; SSE2: Cost Model: {{.*}} 8 for instruction:   %sel = select <4 x i1>
>> +; SSE2: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>>  ; SSE41: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>>  ; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>> @@ -41,7 +41,7 @@ define <4 x i32> @test_4i32(<4 x i32> %a
>>
>>  define <4 x float> @test_4float(<4 x float> %a, <4 x float> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_4float':
>> -; SSE2: Cost Model: {{.*}} 7 for instruction:   %sel = select <4 x i1>
>> +; SSE2: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>>  ; SSE41: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>>  ; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>> @@ -51,7 +51,7 @@ define <4 x float> @test_4float(<4 x flo
>>
>>  define <16 x i8> @test_16i8(<16 x i8> %a, <16 x i8> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_16i8':
>> -; SSE2: Cost Model: {{.*}} 32 for instruction:   %sel = select <16 x i1>
>> +; SSE2: Cost Model: {{.*}} 1 for instruction:   %sel = select <16 x i1>
>>  ; SSE41: Cost Model: {{.*}} 1 for instruction:   %sel = select <16 x i1>
>>  ; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <16 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <16 x i1>
>> @@ -63,7 +63,7 @@ define <16 x i8> @test_16i8(<16 x i8> %a
>>  ; <8 x float>. Integers of the same size should also use those
>> instructions.
>>  define <4 x i64> @test_4i64(<4 x i64> %a, <4 x i64> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_4i64':
>> -; SSE2: Cost Model: {{.*}} 8 for instruction:   %sel = select <4 x i1>
>> +; SSE2: Cost Model: {{.*}} 2 for instruction:   %sel = select <4 x i1>
>>  ; SSE41: Cost Model: {{.*}} 2 for instruction:   %sel = select <4 x i1>
>>  ; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>> @@ -73,7 +73,7 @@ define <4 x i64> @test_4i64(<4 x i64> %a
>>
>>  define <4 x double> @test_4double(<4 x double> %a, <4 x double> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_4double':
>> -; SSE2: Cost Model: {{.*}} 6 for instruction:   %sel = select <4 x i1>
>> +; SSE2: Cost Model: {{.*}} 2 for instruction:   %sel = select <4 x i1>
>>  ; SSE41: Cost Model: {{.*}} 2 for instruction:   %sel = select <4 x i1>
>>  ; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <4 x i1>
>> @@ -83,7 +83,7 @@ define <4 x double> @test_4double(<4 x d
>>
>>  define <8 x i32> @test_8i32(<8 x i32> %a, <8 x i32> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_8i32':
>> -; SSE2: Cost Model: {{.*}} 16 for instruction:   %sel = select <8 x i1>
>> +; SSE2: Cost Model: {{.*}} 2 for instruction:   %sel = select <8 x i1>
>>  ; SSE41: Cost Model: {{.*}} 2 for instruction:   %sel = select <8 x i1>
>>  ; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <8 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <8 x i1>
>> @@ -93,7 +93,7 @@ define <8 x i32> @test_8i32(<8 x i32> %a
>>
>>  define <8 x float> @test_8float(<8 x float> %a, <8 x float> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_8float':
>> -; SSE2: Cost Model: {{.*}} 14 for instruction:   %sel = select <8 x i1>
>> +; SSE2: Cost Model: {{.*}} 2 for instruction:   %sel = select <8 x i1>
>>  ; SSE41: Cost Model: {{.*}} 2 for instruction:   %sel = select <8 x i1>
>>  ; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <8 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <8 x i1>
>> @@ -104,10 +104,9 @@ define <8 x float> @test_8float(<8 x flo
>>  ; AVX2
>>  define <16 x i16> @test_16i16(<16 x i16> %a, <16 x i16> %b) {
>>  ; CHECK:Printing analysis 'Cost Model Analysis' for function
>> 'test_16i16':
>> -; SSE2: Cost Model: {{.*}} 32 for instruction:   %sel = select <16 x i1>
>> +; SSE2: Cost Model: {{.*}} 2 for instruction:   %sel = select <16 x i1>
>>  ; SSE41: Cost Model: {{.*}} 2 for instruction:   %sel = select <16 x i1>
>> -;;; FIXME: This AVX cost is obviously wrong. We shouldn't be scalarizing.
>> -; AVX: Cost Model: {{.*}} 32 for instruction:   %sel = select <16 x i1>
>> +; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <16 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <16 x i1>
>>    %sel = select <16 x i1> <i1 true, i1 false, i1 false, i1 false, i1
>> true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false,
>> i1 true, i1 false, i1 false, i1 false>, <16 x i16> %a, <16 x i16> %b
>>    ret <16 x i16> %sel
>> @@ -115,10 +114,9 @@ define <16 x i16> @test_16i16(<16 x i16>
>>
>>  define <32 x i8> @test_32i8(<32 x i8> %a, <32 x i8> %b) {
>>  ; CHECK: Printing analysis 'Cost Model Analysis' for function
>> 'test_32i8':
>> -; SSE2: Cost Model: {{.*}} 64 for instruction:   %sel = select <32 x i1>
>> +; SSE2: Cost Model: {{.*}} 2 for instruction:   %sel = select <32 x i1>
>>  ; SSE41: Cost Model: {{.*}} 2 for instruction:   %sel = select <32 x i1>
>> -;;; FIXME: This AVX cost is obviously wrong. We shouldn't be scalarizing.
>> -; AVX: Cost Model: {{.*}} 64 for instruction:   %sel = select <32 x i1>
>> +; AVX: Cost Model: {{.*}} 1 for instruction:   %sel = select <32 x i1>
>>  ; AVX2: Cost Model: {{.*}} 1 for instruction:   %sel = select <32 x i1>
>>    %sel = select <32 x i1> <i1 true, i1 false, i1 true, i1 true, i1 true,
>> i1 false, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true,
>> i1 false, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true,
>> i1 false, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true,
>> i1 false, i1 true, i1 true>, <32 x i8> %a, <32 x i8> %b
>>    ret <32 x i8> %sel
>>
>> Modified: llvm/trunk/test/CodeGen/X86/vector-blend.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-blend.ll?rev=229835&r1=229834&r2=229835&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/vector-blend.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/vector-blend.ll Thu Feb 19 04:36:19 2015
>> @@ -9,16 +9,14 @@
>>  define <4 x float> @vsel_float(<4 x float> %v1, <4 x float> %v2) {
>>  ; SSE2-LABEL: vsel_float:
>>  ; SSE2:       # BB#0: # %entry
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSE2-NEXT:    orps %xmm1, %xmm0
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[1,3]
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]
>>  ; SSE2-NEXT:    retq
>>  ;
>>  ; SSSE3-LABEL: vsel_float:
>>  ; SSSE3:       # BB#0: # %entry
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSSE3-NEXT:    orps %xmm1, %xmm0
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[1,3]
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]
>>  ; SSSE3-NEXT:    retq
>>  ;
>>  ; SSE41-LABEL: vsel_float:
>> @@ -65,16 +63,14 @@ entry:
>>  define <4 x i8> @vsel_4xi8(<4 x i8> %v1, <4 x i8> %v2) {
>>  ; SSE2-LABEL: vsel_4xi8:
>>  ; SSE2:       # BB#0: # %entry
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSE2-NEXT:    orps %xmm1, %xmm0
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm1 = xmm1[2,0],xmm0[3,0]
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,2]
>>  ; SSE2-NEXT:    retq
>>  ;
>>  ; SSSE3-LABEL: vsel_4xi8:
>>  ; SSSE3:       # BB#0: # %entry
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSSE3-NEXT:    orps %xmm1, %xmm0
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm1 = xmm1[2,0],xmm0[3,0]
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,2]
>>  ; SSSE3-NEXT:    retq
>>  ;
>>  ; SSE41-LABEL: vsel_4xi8:
>> @@ -99,16 +95,16 @@ entry:
>>  define <4 x i16> @vsel_4xi16(<4 x i16> %v1, <4 x i16> %v2) {
>>  ; SSE2-LABEL: vsel_4xi16:
>>  ; SSE2:       # BB#0: # %entry
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSE2-NEXT:    orps %xmm1, %xmm0
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm1 = xmm1[1,0],xmm0[0,0]
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm1 = xmm1[2,0],xmm0[2,3]
>> +; SSE2-NEXT:    movaps %xmm1, %xmm0
>>  ; SSE2-NEXT:    retq
>>  ;
>>  ; SSSE3-LABEL: vsel_4xi16:
>>  ; SSSE3:       # BB#0: # %entry
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSSE3-NEXT:    orps %xmm1, %xmm0
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm1 = xmm1[1,0],xmm0[0,0]
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm1 = xmm1[2,0],xmm0[2,3]
>> +; SSSE3-NEXT:    movaps %xmm1, %xmm0
>>  ; SSSE3-NEXT:    retq
>>  ;
>>  ; SSE41-LABEL: vsel_4xi16:
>> @@ -133,16 +129,16 @@ entry:
>>  define <4 x i32> @vsel_i32(<4 x i32> %v1, <4 x i32> %v2) {
>>  ; SSE2-LABEL: vsel_i32:
>>  ; SSE2:       # BB#0: # %entry
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSE2-NEXT:    orps %xmm1, %xmm0
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[1,3,2,3]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
>> +; SSE2-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
>>  ; SSE2-NEXT:    retq
>>  ;
>>  ; SSSE3-LABEL: vsel_i32:
>>  ; SSSE3:       # BB#0: # %entry
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSSE3-NEXT:    orps %xmm1, %xmm0
>> +; SSSE3-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[1,3,2,3]
>> +; SSSE3-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
>> +; SSSE3-NEXT:    punpckldq {{.*#+}} xmm0 =
>> xmm0[0],xmm1[0],xmm0[1],xmm1[1]
>>  ; SSSE3-NEXT:    retq
>>  ;
>>  ; SSE41-LABEL: vsel_i32:
>> @@ -226,16 +222,30 @@ entry:
>>  define <8 x i16> @vsel_8xi16(<8 x i16> %v1, <8 x i16> %v2) {
>>  ; SSE2-LABEL: vsel_8xi16:
>>  ; SSE2:       # BB#0: # %entry
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSE2-NEXT:    orps %xmm1, %xmm0
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm1[3,1,2,3]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm0 =
>> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,3,2,1]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,6,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,3,2,1,4,5,6,7]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm1 = xmm1[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm1 = xmm1[1,0,3,2,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm0 =
>> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
>>  ; SSE2-NEXT:    retq
>>  ;
>>  ; SSSE3-LABEL: vsel_8xi16:
>>  ; SSSE3:       # BB#0: # %entry
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSSE3-NEXT:    orps %xmm1, %xmm0
>> +; SSSE3-NEXT:    pshufd {{.*#+}} xmm2 = xmm1[3,1,2,3]
>> +; SSSE3-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
>> +; SSSE3-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]
>> +; SSSE3-NEXT:    punpcklwd {{.*#+}} xmm0 =
>> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
>> +; SSSE3-NEXT:    pshufb {{.*#+}} xmm0 =
>> xmm0[0,1,10,11,4,5,2,3,4,5,10,11,4,5,6,7]
>> +; SSSE3-NEXT:    pshufb {{.*#+}} xmm1 =
>> xmm1[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
>> +; SSSE3-NEXT:    punpcklwd {{.*#+}} xmm0 =
>> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
>>  ; SSSE3-NEXT:    retq
>>  ;
>>  ; SSE41-LABEL: vsel_8xi16:
>> @@ -255,16 +265,42 @@ entry:
>>  define <16 x i8> @vsel_i8(<16 x i8> %v1, <16 x i8> %v2) {
>>  ; SSE2-LABEL: vsel_i8:
>>  ; SSE2:       # BB#0: # %entry
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSE2-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSE2-NEXT:    orps %xmm1, %xmm0
>> +; SSE2-NEXT:    pxor %xmm2, %xmm2
>> +; SSE2-NEXT:    movdqa %xmm1, %xmm3
>> +; SSE2-NEXT:    punpckhbw {{.*#+}} xmm3 =
>> xmm3[8],xmm2[8],xmm3[9],xmm2[9],xmm3[10],xmm2[10],xmm3[11],xmm2[11],xmm3[12],xmm2[12],xmm3[13],xmm2[13],xmm3[14],xmm2[14],xmm3[15],xmm2[15]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm3 = xmm3[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm3 = xmm3[0,1,2,3,7,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm3 = xmm3[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm3 = xmm3[1,0,3,2,4,5,6,7]
>> +; SSE2-NEXT:    movdqa %xmm1, %xmm4
>> +; SSE2-NEXT:    punpcklbw {{.*#+}} xmm4 =
>> xmm4[0],xmm2[0],xmm4[1],xmm2[1],xmm4[2],xmm2[2],xmm4[3],xmm2[3],xmm4[4],xmm2[4],xmm4[5],xmm2[5],xmm4[6],xmm2[6],xmm4[7],xmm2[7]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm2 = xmm4[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,7,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm2 = xmm2[1,0,3,2,4,5,6,7]
>> +; SSE2-NEXT:    punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
>> +; SSE2-NEXT:    packuswb %xmm0, %xmm2
>> +; SSE2-NEXT:    pand {{.*}}(%rip), %xmm1
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm1 = xmm1[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm1 = xmm1[1,0,3,2,4,5,6,7]
>> +; SSE2-NEXT:    packuswb %xmm0, %xmm1
>> +; SSE2-NEXT:    pand {{.*}}(%rip), %xmm0
>> +; SSE2-NEXT:    packuswb %xmm0, %xmm0
>> +; SSE2-NEXT:    packuswb %xmm0, %xmm0
>> +; SSE2-NEXT:    punpcklbw {{.*#+}} xmm0 =
>> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
>> +; SSE2-NEXT:    punpcklbw {{.*#+}} xmm0 =
>> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
>>  ; SSE2-NEXT:    retq
>>  ;
>>  ; SSSE3-LABEL: vsel_i8:
>>  ; SSSE3:       # BB#0: # %entry
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; SSSE3-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; SSSE3-NEXT:    orps %xmm1, %xmm0
>> +; SSSE3-NEXT:    movdqa %xmm1, %xmm2
>> +; SSSE3-NEXT:    pshufb {{.*#+}} xmm2 =
>> xmm2[2,6,10,14,u,u,u,u,u,u,u,u,u,u,u,u]
>> +; SSSE3-NEXT:    pshufb {{.*#+}} xmm0 =
>> xmm0[0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u]
>> +; SSSE3-NEXT:    punpcklbw {{.*#+}} xmm0 =
>> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
>> +; SSSE3-NEXT:    pshufb {{.*#+}} xmm1 =
>> xmm1[1,3,5,7,9,11,13,15,u,u,u,u,u,u,u,u]
>> +; SSSE3-NEXT:    punpcklbw {{.*#+}} xmm0 =
>> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
>>  ; SSSE3-NEXT:    retq
>>  ;
>>  ; SSE41-LABEL: vsel_i8:
>> @@ -419,8 +455,8 @@ define <8 x i64> @vsel_i648(<8 x i64> %v
>>  ;
>>  ; SSE41-LABEL: vsel_i648:
>>  ; SSE41:       # BB#0: # %entry
>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm0 = xmm0[0],xmm4[1]
>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm2 = xmm2[0],xmm6[1]
>> +; SSE41-NEXT:    pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm4[4,5,6,7]
>> +; SSE41-NEXT:    pblendw {{.*#+}} xmm2 = xmm2[0,1,2,3],xmm6[4,5,6,7]
>>  ; SSE41-NEXT:    movaps %xmm5, %xmm1
>>  ; SSE41-NEXT:    movaps %xmm7, %xmm3
>>  ; SSE41-NEXT:    retq
>> @@ -586,26 +622,22 @@ entry:
>>  define <8 x float> @constant_blendvps_avx(<8 x float> %xyzw, <8 x float>
>> %abcd) {
>>  ; SSE2-LABEL: constant_blendvps_avx:
>>  ; SSE2:       # BB#0: # %entry
>> -; SSE2-NEXT:    movaps {{.*#+}} xmm4 =
>> [4294967295,4294967295,4294967295,0]
>> -; SSE2-NEXT:    andps %xmm4, %xmm2
>> -; SSE2-NEXT:    movaps {{.*#+}} xmm5 = [0,0,0,4294967295]
>> -; SSE2-NEXT:    andps %xmm5, %xmm0
>> -; SSE2-NEXT:    orps %xmm2, %xmm0
>> -; SSE2-NEXT:    andps %xmm4, %xmm3
>> -; SSE2-NEXT:    andps %xmm5, %xmm1
>> -; SSE2-NEXT:    orps %xmm3, %xmm1
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm0 = xmm0[3,0],xmm2[2,0]
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm2 = xmm2[0,1],xmm0[2,0]
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm1 = xmm1[3,0],xmm3[2,0]
>> +; SSE2-NEXT:    shufps {{.*#+}} xmm3 = xmm3[0,1],xmm1[2,0]
>> +; SSE2-NEXT:    movaps %xmm2, %xmm0
>> +; SSE2-NEXT:    movaps %xmm3, %xmm1
>>  ; SSE2-NEXT:    retq
>>  ;
>>  ; SSSE3-LABEL: constant_blendvps_avx:
>>  ; SSSE3:       # BB#0: # %entry
>> -; SSSE3-NEXT:    movaps {{.*#+}} xmm4 =
>> [4294967295,4294967295,4294967295,0]
>> -; SSSE3-NEXT:    andps %xmm4, %xmm2
>> -; SSSE3-NEXT:    movaps {{.*#+}} xmm5 = [0,0,0,4294967295]
>> -; SSSE3-NEXT:    andps %xmm5, %xmm0
>> -; SSSE3-NEXT:    orps %xmm2, %xmm0
>> -; SSSE3-NEXT:    andps %xmm4, %xmm3
>> -; SSSE3-NEXT:    andps %xmm5, %xmm1
>> -; SSSE3-NEXT:    orps %xmm3, %xmm1
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm0 = xmm0[3,0],xmm2[2,0]
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm2 = xmm2[0,1],xmm0[2,0]
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm1 = xmm1[3,0],xmm3[2,0]
>> +; SSSE3-NEXT:    shufps {{.*#+}} xmm3 = xmm3[0,1],xmm1[2,0]
>> +; SSSE3-NEXT:    movaps %xmm2, %xmm0
>> +; SSSE3-NEXT:    movaps %xmm3, %xmm1
>>  ; SSSE3-NEXT:    retq
>>  ;
>>  ; SSE41-LABEL: constant_blendvps_avx:
>> @@ -626,26 +658,134 @@ entry:
>>  define <32 x i8> @constant_pblendvb_avx2(<32 x i8> %xyzw, <32 x i8>
>> %abcd) {
>>  ; SSE2-LABEL: constant_pblendvb_avx2:
>>  ; SSE2:       # BB#0: # %entry
>> -; SSE2-NEXT:    movaps {{.*#+}} xmm4 =
>> [255,255,0,255,0,0,0,255,255,255,0,255,0,0,0,255]
>> -; SSE2-NEXT:    andps %xmm4, %xmm2
>> -; SSE2-NEXT:    movaps {{.*#+}} xmm5 =
>> [0,0,255,0,255,255,255,0,0,0,255,0,255,255,255,0]
>> -; SSE2-NEXT:    andps %xmm5, %xmm0
>> -; SSE2-NEXT:    orps %xmm2, %xmm0
>> -; SSE2-NEXT:    andps %xmm4, %xmm3
>> -; SSE2-NEXT:    andps %xmm5, %xmm1
>> -; SSE2-NEXT:    orps %xmm3, %xmm1
>> +; SSE2-NEXT:    movdqa %xmm0, %xmm4
>> +; SSE2-NEXT:    pxor %xmm5, %xmm5
>> +; SSE2-NEXT:    # kill: XMM0<def> XMM4<kill>
>> +; SSE2-NEXT:    punpckhbw {{.*#+}} xmm0 =
>> xmm0[8],xmm5[8],xmm0[9],xmm5[9],xmm0[10],xmm5[10],xmm0[11],xmm5[11],xmm0[12],xmm5[12],xmm0[13],xmm5[13],xmm0[14],xmm5[14],xmm0[15],xmm5[15]
>> +; SSE2-NEXT:    movdqa %xmm4, %xmm6
>> +; SSE2-NEXT:    punpcklbw {{.*#+}} xmm6 =
>> xmm6[0],xmm5[0],xmm6[1],xmm5[1],xmm6[2],xmm5[2],xmm6[3],xmm5[3],xmm6[4],xmm5[4],xmm6[5],xmm5[5],xmm6[6],xmm5[6],xmm6[7],xmm5[7]
>> +; SSE2-NEXT:    punpckhwd {{.*#+}} xmm6 =
>> xmm6[4],xmm0[4],xmm6[5],xmm0[5],xmm6[6],xmm0[6],xmm6[7],xmm0[7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm6[0,1,2,1]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5,7,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[3,1,2,3]
>> +; SSE2-NEXT:    movdqa %xmm2, %xmm6
>> +; SSE2-NEXT:    punpckhbw {{.*#+}} xmm6 =
>> xmm6[8],xmm5[8],xmm6[9],xmm5[9],xmm6[10],xmm5[10],xmm6[11],xmm5[11],xmm6[12],xmm5[12],xmm6[13],xmm5[13],xmm6[14],xmm5[14],xmm6[15],xmm5[15]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm6 = xmm6[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm6 = xmm6[0,3,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm6 = xmm6[1,0,2,3,4,5,6,7]
>> +; SSE2-NEXT:    movdqa %xmm2, %xmm7
>> +; SSE2-NEXT:    punpcklbw {{.*#+}} xmm7 =
>> xmm7[0],xmm5[0],xmm7[1],xmm5[1],xmm7[2],xmm5[2],xmm7[3],xmm5[3],xmm7[4],xmm5[4],xmm7[5],xmm5[5],xmm7[6],xmm5[6],xmm7[7],xmm5[7]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm7 = xmm7[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm7 = xmm7[0,3,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm7 = xmm7[1,0,2,3,4,5,6,7]
>> +; SSE2-NEXT:    punpcklqdq {{.*#+}} xmm7 = xmm7[0],xmm6[0]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm6 = xmm7[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm6 = xmm6[0,2,2,3,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm6 =
>> xmm6[0],xmm0[0],xmm6[1],xmm0[1],xmm6[2],xmm0[2],xmm6[3],xmm0[3]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm6[0,3,2,1]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,6,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm6 = xmm0[0,3,2,1,4,5,6,7]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm0 = xmm7[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,7,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[1,0,3,2,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm6 =
>> xmm6[0],xmm0[0],xmm6[1],xmm0[1],xmm6[2],xmm0[2],xmm6[3],xmm0[3]
>> +; SSE2-NEXT:    packuswb %xmm0, %xmm6
>> +; SSE2-NEXT:    movdqa {{.*#+}} xmm7 = [255,255,255,255,255,255,255,255]
>> +; SSE2-NEXT:    pand %xmm7, %xmm4
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm4[3,1,2,3]
>> +; SSE2-NEXT:    pand %xmm7, %xmm2
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm2 = xmm2[0,2,2,3,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm2 =
>> xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm2[0,3,2,1]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,6,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,3,2,1,4,5,6,7]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm2 = xmm4[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,7,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm2 = xmm2[1,0,3,2,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm0 =
>> xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
>> +; SSE2-NEXT:    packuswb %xmm0, %xmm0
>> +; SSE2-NEXT:    punpcklbw {{.*#+}} xmm0 =
>> xmm0[0],xmm6[0],xmm0[1],xmm6[1],xmm0[2],xmm6[2],xmm0[3],xmm6[3],xmm0[4],xmm6[4],xmm0[5],xmm6[5],xmm0[6],xmm6[6],xmm0[7],xmm6[7]
>> +; SSE2-NEXT:    movdqa %xmm1, %xmm2
>> +; SSE2-NEXT:    punpckhbw {{.*#+}} xmm2 =
>> xmm2[8],xmm5[8],xmm2[9],xmm5[9],xmm2[10],xmm5[10],xmm2[11],xmm5[11],xmm2[12],xmm5[12],xmm2[13],xmm5[13],xmm2[14],xmm5[14],xmm2[15],xmm5[15]
>> +; SSE2-NEXT:    movdqa %xmm1, %xmm4
>> +; SSE2-NEXT:    punpcklbw {{.*#+}} xmm4 =
>> xmm4[0],xmm5[0],xmm4[1],xmm5[1],xmm4[2],xmm5[2],xmm4[3],xmm5[3],xmm4[4],xmm5[4],xmm4[5],xmm5[5],xmm4[6],xmm5[6],xmm4[7],xmm5[7]
>> +; SSE2-NEXT:    punpckhwd {{.*#+}} xmm4 =
>> xmm4[4],xmm2[4],xmm4[5],xmm2[5],xmm4[6],xmm2[6],xmm4[7],xmm2[7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm4[0,1,2,1]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,4,5,7,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[3,1,2,3]
>> +; SSE2-NEXT:    movdqa %xmm3, %xmm4
>> +; SSE2-NEXT:    punpckhbw {{.*#+}} xmm4 =
>> xmm4[8],xmm5[8],xmm4[9],xmm5[9],xmm4[10],xmm5[10],xmm4[11],xmm5[11],xmm4[12],xmm5[12],xmm4[13],xmm5[13],xmm4[14],xmm5[14],xmm4[15],xmm5[15]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm4 = xmm4[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm4 = xmm4[0,3,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm4 = xmm4[1,0,2,3,4,5,6,7]
>> +; SSE2-NEXT:    movdqa %xmm3, %xmm6
>> +; SSE2-NEXT:    punpcklbw {{.*#+}} xmm6 =
>> xmm6[0],xmm5[0],xmm6[1],xmm5[1],xmm6[2],xmm5[2],xmm6[3],xmm5[3],xmm6[4],xmm5[4],xmm6[5],xmm5[5],xmm6[6],xmm5[6],xmm6[7],xmm5[7]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm5 = xmm6[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm5 = xmm5[0,3,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm5 = xmm5[1,0,2,3,4,5,6,7]
>> +; SSE2-NEXT:    punpcklqdq {{.*#+}} xmm5 = xmm5[0],xmm4[0]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm4 = xmm5[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm4 = xmm4[0,2,2,3,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm4 =
>> xmm4[0],xmm2[0],xmm4[1],xmm2[1],xmm4[2],xmm2[2],xmm4[3],xmm2[3]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm4[0,3,2,1]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,6,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm4 = xmm2[0,3,2,1,4,5,6,7]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm2 = xmm5[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,7,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm2 = xmm2[1,0,3,2,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm4 =
>> xmm4[0],xmm2[0],xmm4[1],xmm2[1],xmm4[2],xmm2[2],xmm4[3],xmm2[3]
>> +; SSE2-NEXT:    packuswb %xmm0, %xmm4
>> +; SSE2-NEXT:    pand %xmm7, %xmm1
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm1[3,1,2,3]
>> +; SSE2-NEXT:    pand %xmm7, %xmm3
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm3 = xmm3[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm3 = xmm3[0,2,2,3,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm3 =
>> xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm3[0,3,2,1]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm2 = xmm2[0,1,2,3,6,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm2 = xmm2[0,3,2,1,4,5,6,7]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm1 = xmm1[3,1,2,3,4,5,6,7]
>> +; SSE2-NEXT:    pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,5,6,7]
>> +; SSE2-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
>> +; SSE2-NEXT:    pshuflw {{.*#+}} xmm1 = xmm1[1,0,3,2,4,5,6,7]
>> +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm2 =
>> xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
>> +; SSE2-NEXT:    packuswb %xmm0, %xmm2
>> +; SSE2-NEXT:    punpcklbw {{.*#+}} xmm2 =
>> xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3],xmm2[4],xmm4[4],xmm2[5],xmm4[5],xmm2[6],xmm4[6],xmm2[7],xmm4[7]
>> +; SSE2-NEXT:    movdqa %xmm2, %xmm1
>>  ; SSE2-NEXT:    retq
>>  ;
>>  ; SSSE3-LABEL: constant_pblendvb_avx2:
>>  ; SSSE3:       # BB#0: # %entry
>> -; SSSE3-NEXT:    movaps {{.*#+}} xmm4 =
>> [255,255,0,255,0,0,0,255,255,255,0,255,0,0,0,255]
>> -; SSSE3-NEXT:    andps %xmm4, %xmm2
>> -; SSSE3-NEXT:    movaps {{.*#+}} xmm5 =
>> [0,0,255,0,255,255,255,0,0,0,255,0,255,255,255,0]
>> -; SSSE3-NEXT:    andps %xmm5, %xmm0
>> -; SSSE3-NEXT:    orps %xmm2, %xmm0
>> -; SSSE3-NEXT:    andps %xmm4, %xmm3
>> -; SSSE3-NEXT:    andps %xmm5, %xmm1
>> -; SSSE3-NEXT:    orps %xmm3, %xmm1
>> +; SSSE3-NEXT:    movdqa {{.*#+}} xmm8 =
>> <128,128,5,128,128,128,13,128,u,u,u,u,u,u,u,u>
>> +; SSSE3-NEXT:    movdqa %xmm0, %xmm5
>> +; SSSE3-NEXT:    pshufb %xmm8, %xmm5
>> +; SSSE3-NEXT:    movdqa {{.*#+}} xmm6 =
>> <1,3,128,7,9,11,128,15,u,u,u,u,u,u,u,u>
>> +; SSSE3-NEXT:    movdqa %xmm2, %xmm7
>> +; SSSE3-NEXT:    pshufb %xmm6, %xmm7
>> +; SSSE3-NEXT:    por %xmm5, %xmm7
>> +; SSSE3-NEXT:    movdqa {{.*#+}} xmm5 =
>> <0,128,128,128,8,128,128,128,u,u,u,u,u,u,u,u>
>> +; SSSE3-NEXT:    pshufb %xmm5, %xmm2
>> +; SSSE3-NEXT:    movdqa {{.*#+}} xmm4 =
>> <128,2,4,6,128,10,12,14,u,u,u,u,u,u,u,u>
>> +; SSSE3-NEXT:    pshufb %xmm4, %xmm0
>> +; SSSE3-NEXT:    por %xmm2, %xmm0
>> +; SSSE3-NEXT:    punpcklbw {{.*#+}} xmm0 =
>> xmm0[0],xmm7[0],xmm0[1],xmm7[1],xmm0[2],xmm7[2],xmm0[3],xmm7[3],xmm0[4],xmm7[4],xmm0[5],xmm7[5],xmm0[6],xmm7[6],xmm0[7],xmm7[7]
>> +; SSSE3-NEXT:    movdqa %xmm1, %xmm2
>> +; SSSE3-NEXT:    pshufb %xmm8, %xmm2
>> +; SSSE3-NEXT:    movdqa %xmm3, %xmm7
>> +; SSSE3-NEXT:    pshufb %xmm6, %xmm7
>> +; SSSE3-NEXT:    por %xmm2, %xmm7
>> +; SSSE3-NEXT:    pshufb %xmm5, %xmm3
>> +; SSSE3-NEXT:    pshufb %xmm4, %xmm1
>> +; SSSE3-NEXT:    por %xmm3, %xmm1
>> +; SSSE3-NEXT:    punpcklbw {{.*#+}} xmm1 =
>> xmm1[0],xmm7[0],xmm1[1],xmm7[1],xmm1[2],xmm7[2],xmm1[3],xmm7[3],xmm1[4],xmm7[4],xmm1[5],xmm7[5],xmm1[6],xmm7[6],xmm1[7],xmm7[7]
>>  ; SSSE3-NEXT:    retq
>>  ;
>>  ; SSE41-LABEL: constant_pblendvb_avx2:
>> @@ -660,9 +800,27 @@ define <32 x i8> @constant_pblendvb_avx2
>>  ;
>>  ; AVX1-LABEL: constant_pblendvb_avx2:
>>  ; AVX1:       # BB#0: # %entry
>> -; AVX1-NEXT:    vandps {{.*}}(%rip), %ymm1, %ymm1
>> -; AVX1-NEXT:    vandps {{.*}}(%rip), %ymm0, %ymm0
>> -; AVX1-NEXT:    vorps %ymm1, %ymm0, %ymm0
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>> +; AVX1-NEXT:    vmovdqa {{.*#+}} xmm8 =
>> <128,128,5,128,128,128,13,128,u,u,u,u,u,u,u,u>
>> +; AVX1-NEXT:    vpshufb %xmm8, %xmm2, %xmm4
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm5
>> +; AVX1-NEXT:    vmovdqa {{.*#+}} xmm6 =
>> <1,3,128,7,9,11,128,15,u,u,u,u,u,u,u,u>
>> +; AVX1-NEXT:    vpshufb %xmm6, %xmm5, %xmm7
>> +; AVX1-NEXT:    vpor %xmm4, %xmm7, %xmm4
>> +; AVX1-NEXT:    vmovdqa {{.*#+}} xmm7 =
>> <0,128,128,128,8,128,128,128,u,u,u,u,u,u,u,u>
>> +; AVX1-NEXT:    vpshufb %xmm7, %xmm5, %xmm5
>> +; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> <128,2,4,6,128,10,12,14,u,u,u,u,u,u,u,u>
>> +; AVX1-NEXT:    vpshufb %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpor %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpunpcklbw {{.*#+}} xmm2 =
>> xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3],xmm2[4],xmm4[4],xmm2[5],xmm4[5],xmm2[6],xmm4[6],xmm2[7],xmm4[7]
>> +; AVX1-NEXT:    vpshufb %xmm8, %xmm0, %xmm4
>> +; AVX1-NEXT:    vpshufb %xmm6, %xmm1, %xmm5
>> +; AVX1-NEXT:    vpor %xmm4, %xmm5, %xmm4
>> +; AVX1-NEXT:    vpshufb %xmm7, %xmm1, %xmm1
>> +; AVX1-NEXT:    vpshufb %xmm3, %xmm0, %xmm0
>> +; AVX1-NEXT:    vpor %xmm1, %xmm0, %xmm0
>> +; AVX1-NEXT:    vpunpcklbw {{.*#+}} xmm0 =
>> xmm0[0],xmm4[0],xmm0[1],xmm4[1],xmm0[2],xmm4[2],xmm0[3],xmm4[3],xmm0[4],xmm4[4],xmm0[5],xmm4[5],xmm0[6],xmm4[6],xmm0[7],xmm4[7]
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: constant_pblendvb_avx2:
>>
>> Modified: llvm/trunk/test/CodeGen/X86/vselect.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vselect.ll?rev=229835&r1=229834&r2=229835&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/vselect.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/vselect.ll Thu Feb 19 04:36:19 2015
>> @@ -6,9 +6,8 @@
>>  define <4 x float> @test1(<4 x float> %a, <4 x float> %b) {
>>  ; CHECK-LABEL: test1:
>>  ; CHECK:       # BB#0:
>> -; CHECK-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; CHECK-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; CHECK-NEXT:    orps %xmm1, %xmm0
>> +; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[1,3]
>> +; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]
>>  ; CHECK-NEXT:    retq
>>    %1 = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>, <4 x
>> float> %a, <4 x float> %b
>>    ret <4 x float> %1
>> @@ -53,9 +52,6 @@ define <4 x float> @test5(<4 x float> %a
>>  define <8 x i16> @test6(<8 x i16> %a, <8 x i16> %b) {
>>  ; CHECK-LABEL: test6:
>>  ; CHECK:       # BB#0:
>> -; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [65535,0,65535,0,65535,0,65535,0]
>> -; CHECK-NEXT:    orps {{.*}}(%rip), %xmm1
>> -; CHECK-NEXT:    andps %xmm1, %xmm0
>>  ; CHECK-NEXT:    retq
>>    %1 = select <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 true,
>> i1 false, i1 true, i1 false>, <8 x i16> %a, <8 x i16> %a
>>    ret <8 x i16> %1
>> @@ -64,9 +60,8 @@ define <8 x i16> @test6(<8 x i16> %a, <8
>>  define <8 x i16> @test7(<8 x i16> %a, <8 x i16> %b) {
>>  ; CHECK-LABEL: test7:
>>  ; CHECK:       # BB#0:
>> -; CHECK-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; CHECK-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; CHECK-NEXT:    orps %xmm1, %xmm0
>> +; CHECK-NEXT:    movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
>> +; CHECK-NEXT:    movapd %xmm1, %xmm0
>>  ; CHECK-NEXT:    retq
>>    %1 = select <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 false, i1
>> false, i1 false, i1 false>, <8 x i16> %a, <8 x i16> %b
>>    ret <8 x i16> %1
>> @@ -75,9 +70,7 @@ define <8 x i16> @test7(<8 x i16> %a, <8
>>  define <8 x i16> @test8(<8 x i16> %a, <8 x i16> %b) {
>>  ; CHECK-LABEL: test8:
>>  ; CHECK:       # BB#0:
>> -; CHECK-NEXT:    andps {{.*}}(%rip), %xmm1
>> -; CHECK-NEXT:    andps {{.*}}(%rip), %xmm0
>> -; CHECK-NEXT:    orps %xmm1, %xmm0
>> +; CHECK-NEXT:    movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
>>  ; CHECK-NEXT:    retq
>>    %1 = select <8 x i1> <i1 false, i1 false, i1 false, i1 false, i1 true,
>> i1 true, i1 true, i1 true>, <8 x i16> %a, <8 x i16> %b
>>    ret <8 x i16> %1
>> @@ -103,10 +96,10 @@ define <8 x i16> @test10(<8 x i16> %a, <
>>  define <8 x i16> @test11(<8 x i16> %a, <8 x i16> %b) {
>>  ; CHECK-LABEL: test11:
>>  ; CHECK:       # BB#0:
>> -; CHECK-NEXT:    movaps {{.*#+}} xmm2 = <0,65535,65535,0,u,65535,65535,u>
>> -; CHECK-NEXT:    andps %xmm2, %xmm0
>> -; CHECK-NEXT:    andnps %xmm1, %xmm2
>> -; CHECK-NEXT:    orps %xmm2, %xmm0
>> +; CHECK-NEXT:    pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]
>> +; CHECK-NEXT:    punpcklwd {{.*#+}} xmm0 =
>> xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
>> +; CHECK-NEXT:    pshufb {{.*#+}} xmm0 =
>> xmm0[2,3,4,5,8,9,14,15,8,9,14,15,12,13,14,15]
>> +; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
>>  ; CHECK-NEXT:    retq
>>    %1 = select <8 x i1> <i1 false, i1 true, i1 true, i1 false, i1 undef,
>> i1 true, i1 true, i1 undef>, <8 x i16> %a, <8 x i16> %b
>>    ret <8 x i16> %1
>>
>> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll?rev=229835&r1=229834&r2=229835&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll (original)
>> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll Thu Feb 19 04:36:19 2015
>> @@ -50,8 +50,8 @@ define void @vectorselect(i1 %cond) {
>>    %7 = getelementptr inbounds [2048 x i32]* @a, i64 0, i64 %indvars.iv
>>    %8 = icmp ult i64 %indvars.iv, 8
>>
>> -; A vector select has a cost of 4 on core2
>> -; CHECK: cost of 4 for VF 2 {{.*}}  select i1 %8, i32 %6, i32 0
>> +; A vector select has a cost of 1 on core2
>> +; CHECK: cost of 1 for VF 2 {{.*}}  select i1 %8, i32 %6, i32 0
>>
>>    %sel = select i1 %8, i32 %6, i32 zeroinitializer
>>    store i32 %sel, i32* %7, align 4
>>
>>
>
>

