[llvm] r192866 - x86: Move bitcasts outside concat_vector.

Thu Oct 17 10:50:17 PDT 2013

Possibly? If we did that, it would probably be done directly as part of legalization rather than as a combine. I briefly considered jumping straight to that, but the ‘if’ checking all of the conditions required for it to be a win quickly got a bit crazy, so I backed off and started with a target-specific version. I’m also a bit nervous about playing around with bitcasts of vectors in target-independent code, as those are often used to steer the backend into allocating values to the desired register classes, and we don’t want to break that. Global isel would make that concern a non-issue, however.

Filed http://llvm.org/bugs/show_bug.cgi?id=17607.

-Jim

On Oct 17, 2013, at 2:34 AM, Owen Anderson <resistor at mac.com> wrote:

> Could this be generalized into a target-independent transform?
> 
> -Owen
> 
>> On Oct 16, 2013, at 7:58 PM, Jim Grosbach <grosbach at apple.com> wrote:
>> 
>> Author: grosbach
>> Date: Wed Oct 16 21:58:06 2013
>> New Revision: 192866
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=192866&view=rev
>> Log:
>> x86: Move bitcasts outside concat_vector.
>> 
>> Consider the following:
>> 
>> typedef unsigned short ushort4U __attribute__((ext_vector_type(4), aligned(2)));
>> typedef unsigned short ushort4 __attribute__((ext_vector_type(4)));
>> typedef unsigned short ushort8 __attribute__((ext_vector_type(8)));
>> typedef int int4 __attribute__((ext_vector_type(4)));
>> 
>> int4 __bbase_cvt_int(ushort4 v) {
>>   ushort8 a;
>>   a.lo = v;
>>   return _mm_cvtepu16_epi32(a);
>> }
>> 
>> This generates the not unreasonable IR:
>> define <4 x i32> @foo0(double %v.coerce) nounwind ssp {
>>   %tmp = bitcast double %v.coerce to <4 x i16>
>>   %tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
>>   %tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1)
>>   ret <4 x i32> %tmp2
>> }
>> 
>> The problem arises when type legalization gets hold of the v4i16. It
>> legalizes that by spilling to the stack, then doing a zero-extending
>> load. Things get even sillier from there, ending up with something
>> like:
>> _foo0:
>> movsd %xmm0, -8(%rsp)       <== Spill to the stack.
>> movq  -8(%rsp), %xmm0       <== Reload it right back out.
>> pmovzxwd  %xmm0, %xmm1      <== Here's what we actually asked for.
>> pblendw $1, %xmm1, %xmm0    <== We don't need this at all
>> pmovzxwd  %xmm0, %xmm0      <== We already did this
>> ret
>> 
>> The v8i8 to v8i16 zext intrinsic gives even worse results, with two
>> table lookups via pshufb instructions(!!).
>> 
>> To avoid all that, we can defer the bitcast until after we've formed
>> the wider (legal) vector type. Then our normal codegen flows along
>> nicely and we get the expected:
>> _foo0:
>> pmovzxwd  %xmm0, %xmm0
>> ret
>> 
>> rdar://15245794
>> 
>> Modified:
>>   llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>   llvm/trunk/test/CodeGen/X86/pmovext.ll
>> 
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=192866&r1=192865&r2=192866&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Oct 16 21:58:06 2013
>> @@ -1498,6 +1498,7 @@ void X86TargetLowering::resetOperationAc
>>  }
>> 
>>  // We have target-specific dag combine patterns for the following nodes:
>> +  setTargetDAGCombine(ISD::CONCAT_VECTORS);
>>  setTargetDAGCombine(ISD::VECTOR_SHUFFLE);
>>  setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);
>>  setTargetDAGCombine(ISD::VSELECT);
>> @@ -16151,6 +16152,44 @@ static SDValue PerformShuffleCombine256(
>>  return SDValue();
>> }
>> 
>> +static SDValue PerformConcatCombine(SDNode *N, SelectionDAG &DAG,
>> +                                    TargetLowering::DAGCombinerInfo &DCI,
>> +                                    const X86Subtarget *Subtarget) {
>> +  // Creating a v8i16 from a v4i16 argument and an undef runs into trouble in
>> +  // type legalization and ends up spilling to the stack. Avoid that by
>> +  // creating a vector first and bitcasting the result rather than
>> +  // bitcasting the source then creating the vector. Similar problems with
>> +  // v8i8.
>> +
>> +  // No point in doing this after legalize, so early exit for that.
>> +  if (!DCI.isBeforeLegalize())
>> +    return SDValue();
>> +
>> +  EVT VT = N->getValueType(0);
>> +  SDValue Op0 = N->getOperand(0);
>> +  SDValue Op1 = N->getOperand(1);
>> +  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
>> +  if (VT.getSizeInBits() == 128 && N->getNumOperands() == 2 &&
>> +      Op1->getOpcode() == ISD::UNDEF &&
>> +      Op0->getOpcode() == ISD::BITCAST &&
>> +      !TLI.isTypeLegal(Op0->getValueType(0)) &&
>> +      TLI.isTypeLegal(Op0->getOperand(0)->getValueType(0))) {
>> +    SDValue Scalar = Op0->getOperand(0);
>> +    // Any legal type here will be a simple value type.
>> +    MVT SVT = Scalar->getValueType(0).getSimpleVT();
>> +    // As a special case, bail out on MMX values.
>> +    if (SVT == MVT::x86mmx)
>> +      return SDValue();
>> +    EVT NVT = MVT::getVectorVT(SVT, 2);
>> +    SDLoc dl = SDLoc(N);
>> +    SDValue Res = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NVT, Scalar);
>> +    Res = DAG.getNode(ISD::BITCAST, dl, VT, Res);
>> +    return Res;
>> +  }
>> +
>> +  return SDValue();
>> +}
>> +
>> /// PerformShuffleCombine - Performs several different shuffle combines.
>> static SDValue PerformShuffleCombine(SDNode *N, SelectionDAG &DAG,
>>                                     TargetLowering::DAGCombinerInfo &DCI,
>> @@ -19029,6 +19068,7 @@ SDValue X86TargetLowering::PerformDAGCom
>>  case X86ISD::VPERMILP:
>>  case X86ISD::VPERM2X128:
>>  case ISD::VECTOR_SHUFFLE: return PerformShuffleCombine(N, DAG, DCI,Subtarget);
>> +  case ISD::CONCAT_VECTORS: return PerformConcatCombine(N, DAG, DCI, Subtarget);
>>  case ISD::FMA:            return PerformFMACombine(N, DAG, Subtarget);
>>  }
>> 
>> 
>> Modified: llvm/trunk/test/CodeGen/X86/pmovext.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pmovext.ll?rev=192866&r1=192865&r2=192866&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/pmovext.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/pmovext.ll Wed Oct 16 21:58:06 2013
>> @@ -18,5 +18,28 @@ define void @intrin_pmov(i16* noalias %d
>> }
>> 
>> declare <8 x i16> @llvm.x86.sse41.pmovzxbw(<16 x i8>) nounwind readnone
>> -
>> declare void @llvm.x86.sse2.storeu.dq(i8*, <16 x i8>) nounwind
>> +
>> +; rdar://15245794
>> +
>> +define <4 x i32> @foo0(double %v.coerce) nounwind ssp {
>> +; CHECK-LABEL: foo0
>> +; CHECK: pmovzxwd %xmm0, %xmm0
>> +; CHECK-NEXT: ret
>> +  %tmp = bitcast double %v.coerce to <4 x i16>
>> +  %tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
>> +  %tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1) nounwind
>> +  ret <4 x i32> %tmp2
>> +}
>> +
>> +define <8 x i16> @foo1(double %v.coerce) nounwind ssp {
>> +; CHECK-LABEL: foo1
>> +; CHECK: pmovzxbw %xmm0, %xmm0
>> +; CHECK-NEXT: ret
>> +  %tmp = bitcast double %v.coerce to <8 x i8>
>> +  %tmp1 = shufflevector <8 x i8> %tmp, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>> +  %tmp2 = tail call <8 x i16> @llvm.x86.sse41.pmovzxbw(<16 x i8> %tmp1)
>> +  ret <8 x i16> %tmp2
>> +}
>> +
>> +declare <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16>) nounwind readnone
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
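
The rewrite in the commit above relies on both orderings producing the same low 64 bits. The following is a minimal sketch of that equivalence in plain C++, not LLVM code: it models "bitcast then concat with undef" and "scalar_to_vector then bitcast" with memcpy. The function names are hypothetical, and the undef lanes are modelled as zero here purely for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "sketch assumes a 64-bit double");

using V4i16 = std::array<std::uint16_t, 4>;
using V8i16 = std::array<std::uint16_t, 8>;

// Original order: bitcast the 64-bit scalar to v4i16, then widen to v8i16.
// The upper four lanes are undef in the IR; this model zero-fills them.
V8i16 bitcast_then_concat(double v) {
  V4i16 narrow;
  std::memcpy(narrow.data(), &v, sizeof v);
  V8i16 wide{};
  std::copy(narrow.begin(), narrow.end(), wide.begin());
  return wide;
}

// Rewritten order: put the scalar in lane 0 of a two-element double vector,
// then reinterpret the whole 128-bit value as v8i16.
V8i16 vector_then_bitcast(double v) {
  std::array<double, 2> pair{v, 0.0};
  V8i16 wide;
  static_assert(sizeof pair == sizeof wide, "both must be 128 bits");
  std::memcpy(wide.data(), pair.data(), sizeof wide);
  return wide;
}

// Only the low four lanes are defined by the IR, so only those are compared.
bool low_lanes_match(double v) {
  const V8i16 a = bitcast_then_concat(v);
  const V8i16 b = vector_then_bitcast(v);
  return std::equal(a.begin(), a.begin() + 4, b.begin());
}

// Checks a handful of sample values; asserts on any mismatch.
void self_check() {
  const double samples[] = {0.0, 1.5, -2.25, 65535.0};
  for (double v : samples)
    assert(low_lanes_match(v));
}
```

Calling self_check() from a main() exercises both paths. The point of the sketch is only that the defined lanes carry identical bits either way, which is why the combine is free to pick the ordering that avoids the illegal v4i16 type.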
