[llvm] r192866 - x86: Move bitcasts outside concat_vector.

Tue Oct 22 14:54:41 PDT 2013

On Oct 17, 2013, at 10:07 PM, Nadav Rotem <nrotem at apple.com> wrote:

> 
>>>> +  EVT VT = N->getValueType(0);
>>>> +  SDValue Op0 = N->getOperand(0);
>>>> +  SDValue Op1 = N->getOperand(1);
>>>> +  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
>>>> +  if (VT.getSizeInBits() == 128 && N->getNumOperands() == 2 &&
>>>> +      Op1->getOpcode() == ISD::UNDEF &&
>>>> +      Op0->getOpcode() == ISD::BITCAST &&
> 
> Why are you limiting this optimization to SSE ? You can easily support AVX and AVX512. 

>>>> +      !TLI.isTypeLegal(Op0->getValueType(0)) &&
>>>> +      TLI.isTypeLegal(Op0->getOperand(0)->getValueType(0))) {
> 
> You also want to check that the vector type <scalar x width> is legal.  For example KNF does not support vectors of i8s. 
> 

Good point.

>>>> +    SDValue Scalar = Op0->getOperand(0);
>>>> +    // Any legal type here will be a simple value type.
>>>> +    MVT SVT = Scalar->getValueType(0).getSimpleVT();
> 
> Bug.   What happens if you bit cast a vector to vector ?  
> 
Excellent catch. We should handle that case, too. There’s some really interesting permutations related to this, actually. I need to think a bit more about the right thing to do. I’ll fix the immediate bug by falling back to the old path in that case for the moment (so we don’t have bogus codegen) and then follow up after with code to handle things more generally.

>>>> +    // As a special case, bail out on MMX values.
>>>> +    if (SVT == MVT::x86mmx)
>>>> +      return SDValue();
>>>> +    EVT NVT = MVT::getVectorVT(SVT, 2);
>>>> +    SDLoc dl = SDLoc(N);
>>>> +    SDValue Res = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NVT, Scalar);
>>>> +    Res = DAG.getNode(ISD::BITCAST, dl, VT, Res);
>>>> +    return Res;
>>>> +  }
>>>> +
>>>> +  return SDValue();
>>>> +}
>>>> +
> 
> 
> 
> On Oct 17, 2013, at 10:50 AM, Jim Grosbach <grosbach at apple.com> wrote:
> 
>> Possibly? If we were to do that, it would probably be something done as part of legalization directly rather than a combine. I considered briefly jumping directly to that, but the ‘if’ checking all of the conditions required for it to be a good thing quickly got a bit crazy, so I backed off and started with it being target-specific.
> 
> I am not sure I understand why this can’t be target independent. 
> 
>> I’m also a bit nervous playing around with bit casts of vectors in the target independent code, as that’s often used to manipulate the backend into allocating values into the desired register classes, and we don’t want to break that. Global isel would make that concern a non-issue, however.
> 
> Can you give an example ?
> 
It’s things like the ARM64/AArch64 int64x1_t, et. al., that play these sorts of games. Those go to NEON regs, but int64_t goes to GPR64 regs. Some neon intrinsics use that to their advantage to avoid cross-class copies.