[llvm-commits] [PATCH] extend fix to PR12312 to support 256-bit vector

Mon Sep 10 03:39:51 PDT 2012

Hi Michael,

> Fix to PR12312 only supports 128-bit vector. This patch extends the
> support to 256-bit vector as well. The patch includes 2 major changes:
>
> + one change to, when bitcasting 256-bit vector (or wider) to 256-bit
> integer, allow type legalization to extract elements instead of storing
> whole vector followed by loading each element.
>    Suppose a N-bit vector into a N-bit integer. Type legalization will
> try cast that N-bit vector into <2 x (N/2)> vector first and extract the
> lo/hi part. When N = 128, <2 x (N/2>) is legal. But, when N =256, <2 x
> i128> is illegal (on x86) and type legalization falls back to store/load
> approach.
>    When <2 x (N/2)> is found illegal, this patch tries to check <M x
> ExtVT>, where ExtVT is the type which N-bit integer is extended onto,
> e.g. on x86, 256-bit integer will be extended to 64-bit integers and we
> will try <4 x i64>. If <M x ExtVT> is legal, we will extract elements
> from it and build N/2-bit lo/hi parts.

...

> +
> +    // If <NOutVT x 2> is not a legal type, try <ExtOutVT x (2*f)>, where
> +    // f = bits(NOutVT)/bits(ExtOutVT).

How about unifying the following code and the code just above (which does
the f == 1 case)?  For example, by looping over the two values for NVT,
and if one is legal then executing the code you added, which can handle
both cases (maybe with a little tweaking), right?  In fact it can be made
even more general:

Observe that it isn't very relevant that OutVT is illegal with operand Expand.
What matters is that the vector type NVT built from it (<2 x i128>, <4 x i64>
etc) is legal.  So rather than applying getTypeToExpandTo/getTypeToTransform to
to OutVT, how about ignoring OutVT and starting from the original vector value
type (<1 x i256>) and systematically double the number of vector elements while
halving the element size, looking for a legal vector type: <2 x i128>,
<4 x i64>, <8 x i32>, <16 x i16>, <32 x i8>.  When (if) you find that one of
these is legal, then apply the transform to it.

> +    EVT ExtOutVT = TLI.getTypeToExpandTo(*DAG.getContext(), OutVT);
> +    unsigned NumElems = NOutVT.getSizeInBits() / ExtOutVT.getSizeInBits();
> +
> +    assert(isPowerOf2_32(NumElems) &&
> +           "Type expanded to is larger than type transformed to.");
> +
> +    NumElems <<= 1;
> +    NVT = EVT::getVectorVT(*DAG.getContext(), ExtOutVT, NumElems);
> +
> +    if (isTypeLegal(NVT)) {
> +      SDValue CastInOp = DAG.getNode(ISD::BITCAST, dl, NVT, InOp);
> +
> +      SmallVector<SDValue, 64> Vals;
> +      for (unsigned i = 0; i < NumElems; ++i)
> +        Vals.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, ExtOutVT,
> +                                   CastInOp, DAG.getIntPtrConstant(i)));
> +
> +      // Build Lo, Hi pair by pairing extracted elements.
> +      unsigned Slot = 0;
> +      for (unsigned e = Vals.size(); e - Slot > 2; Slot += 2, e += 1) {
> +        // Each iteration will BUILD_PAIR two nodes and append the result until
> +        // there are only two nodes left, i.e. Lo and Hi.
> +        SDValue LHS = Vals[Slot];
> +        SDValue RHS = Vals[Slot + 1];
> +        Vals.push_back(DAG.getNode(ISD::BUILD_PAIR, dl,
> +                                   EVT::getIntegerVT(
> +                                     *DAG.getContext(),
> +                                     LHS.getValueType().getSizeInBits() << 1),
> +                                   LHS, RHS));
> +      }
> +      Lo = Vals[Slot++];
> +      Hi = Vals[Slot++];
> +
> +      if (TLI.isBigEndian())
> +        std::swap(Lo, Hi);

On a big endian machine don't you need to reverse everything you built, not just
the last two?

Ciao, Duncan.

> +
> +      return;
> +    }
>    }