[llvm-commits] [PATCH] extend fix to PR12312 to support 256-bit vector
Duncan Sands
baldrick at free.fr
Mon Sep 10 03:39:51 PDT 2012
Hi Michael,
> Fix to PR12312 only supports 128-bit vector. This patch extends the
> support to 256-bit vector as well. The patch includes 2 major changes:
>
> + one change to, when bitcasting 256-bit vector (or wider) to 256-bit
> integer, allow type legalization to extract elements instead of storing
> whole vector followed by loading each element.
> Suppose a N-bit vector into a N-bit integer. Type legalization will
> try cast that N-bit vector into <2 x (N/2)> vector first and extract the
> lo/hi part. When N = 128, <2 x (N/2>) is legal. But, when N =256, <2 x
> i128> is illegal (on x86) and type legalization falls back to store/load
> approach.
> When <2 x (N/2)> is found illegal, this patch tries to check <M x
> ExtVT>, where ExtVT is the type which N-bit integer is extended onto,
> e.g. on x86, 256-bit integer will be extended to 64-bit integers and we
> will try <4 x i64>. If <M x ExtVT> is legal, we will extract elements
> from it and build N/2-bit lo/hi parts.
...
> +
> + // If <NOutVT x 2> is not a legal type, try <ExtOutVT x (2*f)>, where
> + // f = bits(NOutVT)/bits(ExtOutVT).
How about unifying the following code and the code just above (which does
the f == 1 case)? For example, by looping over the two values for NVT,
and if one is legal then executing the code you added, which can handle
both cases (maybe with a little tweaking), right? In fact it can be made
even more general:
Observe that it isn't very relevant that OutVT is illegal with operand Expand.
What matters is that the vector type NVT built from it (<2 x i128>, <4 x i64>
etc) is legal. So rather than applying getTypeToExpandTo/getTypeToTransform to
to OutVT, how about ignoring OutVT and starting from the original vector value
type (<1 x i256>) and systematically double the number of vector elements while
halving the element size, looking for a legal vector type: <2 x i128>,
<4 x i64>, <8 x i32>, <16 x i16>, <32 x i8>. When (if) you find that one of
these is legal, then apply the transform to it.
> + EVT ExtOutVT = TLI.getTypeToExpandTo(*DAG.getContext(), OutVT);
> + unsigned NumElems = NOutVT.getSizeInBits() / ExtOutVT.getSizeInBits();
> +
> + assert(isPowerOf2_32(NumElems) &&
> + "Type expanded to is larger than type transformed to.");
> +
> + NumElems <<= 1;
> + NVT = EVT::getVectorVT(*DAG.getContext(), ExtOutVT, NumElems);
> +
> + if (isTypeLegal(NVT)) {
> + SDValue CastInOp = DAG.getNode(ISD::BITCAST, dl, NVT, InOp);
> +
> + SmallVector<SDValue, 64> Vals;
> + for (unsigned i = 0; i < NumElems; ++i)
> + Vals.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, ExtOutVT,
> + CastInOp, DAG.getIntPtrConstant(i)));
> +
> + // Build Lo, Hi pair by pairing extracted elements.
> + unsigned Slot = 0;
> + for (unsigned e = Vals.size(); e - Slot > 2; Slot += 2, e += 1) {
> + // Each iteration will BUILD_PAIR two nodes and append the result until
> + // there are only two nodes left, i.e. Lo and Hi.
> + SDValue LHS = Vals[Slot];
> + SDValue RHS = Vals[Slot + 1];
> + Vals.push_back(DAG.getNode(ISD::BUILD_PAIR, dl,
> + EVT::getIntegerVT(
> + *DAG.getContext(),
> + LHS.getValueType().getSizeInBits() << 1),
> + LHS, RHS));
> + }
> + Lo = Vals[Slot++];
> + Hi = Vals[Slot++];
> +
> + if (TLI.isBigEndian())
> + std::swap(Lo, Hi);
On a big endian machine don't you need to reverse everything you built, not just
the last two?
Ciao, Duncan.
> +
> + return;
> + }
> }
More information about the llvm-commits
mailing list