[PATCH] [AArch64] Fix a pattern match failure caused by creating improper CONCAT_VECTOR.
Tim Northover
t.p.northover at gmail.com
Tue Jun 17 00:19:01 PDT 2014
Hi Kevin,
>> Isn't the key type here VT.getVectorElementType()? I'm not sure
>> I see the logic of caring about any AssertSext/AssertZext input:
>> we're just going to be discarding the bits anyway in your example.
>>
>> In the reverse case (say extracting from v4i16 and inserting into
>> v4i32), as far as I can see EXTRACT_VECTOR_ELT will basically
>> be doing an anyext, so we don't have to care there either.
>>
>> Or do you have examples where this kind of check is needed?
>
> If the input cannot pass the element type check, we can simply return
> from the function, and the incorrect concat_vector will not be generated.
Fair enough, *something* clearly has to be done; I'm not disputing that.
> But
> from the result, I saw that lots of MOV and INS instructions were
> generated for this build_vector. To get a better result, I added extra
> code here specifically for AssertSext/AssertZext, hoping to generate a
> single UZIP1 instead. For AssertSext/AssertZext, only the low bits of
> each element are valid, so a v2i32 AssertSext node asserting that it
> holds i16 values can be bitcast to v4i16, in which lanes 0 and 2 match
> the prior lanes and lanes 1 and 3 are undefined.
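(For reference, the UZP1 instruction concatenates the even-numbered lanes
of its two source vectors, so an even-lane shuffle mask like <0, 2, 4, 6>
over a pair of v4i16 inputs maps onto a single uzp1.)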
I think that works regardless of whether the AssertSext is present or
not. All the AssertZext and AssertSext nodes tell us is that the lanes
we *don't* care about are going to be either 0 or -1 (and that's
assuming they match up with the vector we're building; they're even
less useful otherwise).
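To spell out the lane picture, something like this (a rough sketch with
made-up names, assuming little-endian) works with or without the Asserts:

  // SrcA, SrcB: v2i32 values whose i32 lanes each hold an i16.
  SDValue CastA = DAG.getNode(ISD::BITCAST, dl, MVT::v4i16, SrcA);
  SDValue CastB = DAG.getNode(ISD::BITCAST, dl, MVT::v4i16, SrcB);
  // Lane 2*i of CastA is the low 16 bits of lane i of SrcA; lanes 1 and 3
  // hold the extension bits and are never selected by the mask below.
  int Mask[4] = {0, 2, 4, 6}; // even lanes of both sources
  SDValue Res = DAG.getVectorShuffle(MVT::v4i16, dl, CastA, CastB, Mask);
  // ...which the backend can then match as a single uzp1.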
>> Instead of mangling the actual BUILD_VECTOR node, perhaps we should
>> create a similar OffsetMultipliers (say) variable and just record what we've
>> done for later in the function.
>
> You mean rename Pow to OffsetMultipliers? I will do that in next version.
I mean something a bit bigger: completely remove the
"DAG.getNode(ISD::BUILD_VECTOR, ...)" code and instead save the
information needed to extract the correct lanes later, in an "int
OffsetMultipliers[2] = { 1, 1 };" variable.
Purely for discussion purposes, I've attached a quick hatchet-job I've
done on the patch along the lines I'm suggesting (clearly unpolished).
It gets the test you wrote correct, without so many special cases. I
assume you had something similar before you settled on using
AssertZext and AssertSext; do you have an example of why it's not the
right solution?
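For concreteness, the bookkeeping in the attached patch works out like
this for the v2i32 case above (my arithmetic): a v2i32 source bitcast to
v4i16 records OffsetMultipliers[i] = 4/2 = 2 and doubles MinElts/MaxElts,
then each EXTRACT_VECTOR_ELT index contributes
ExtractElt * OffsetMultipliers[i] - VEXTOffsets[i] (plus NumElts for the
second source) to the shuffle mask, so extracting elements 0 and 1 gives
mask entries 0 and 2: the even lanes a uzp1 wants.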
Cheers.
Tim.
-------------- next part --------------
commit 36a4a05514bd35a787a50939420d0e9c74206cd1
Author: Tim Northover <T.P.Northover at gmail.com>
Date: Tue Jun 17 07:57:48 2014 +0100
TMP
diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7a2c9c9..3d9cf46 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -4110,6 +4110,7 @@ static SDValue NarrowVector(SDValue V128Reg, SelectionDAG &DAG) {
// shuffle in combination with VEXTs.
SDValue AArch64TargetLowering::ReconstructShuffle(SDValue Op,
SelectionDAG &DAG) const {
+ assert(Op.getOpcode() == ISD::BUILD_VECTOR && "Unknown opcode!");
SDLoc dl(Op);
EVT VT = Op.getValueType();
unsigned NumElts = VT.getVectorNumElements();
@@ -4158,35 +4159,43 @@ SDValue AArch64TargetLowering::ReconstructShuffle(SDValue Op,
SDValue ShuffleSrcs[2] = { DAG.getUNDEF(VT), DAG.getUNDEF(VT) };
int VEXTOffsets[2] = { 0, 0 };
+ int OffsetMultipliers[2] = { 1, 1 };
// This loop extracts the usage patterns of the source vectors
// and prepares appropriate SDValues for a shuffle if possible.
for (unsigned i = 0; i < SourceVecs.size(); ++i) {
- if (SourceVecs[i].getValueType() == VT) {
+ unsigned NumSrcElts = SourceVecs[i].getValueType().getVectorNumElements();
+ SDValue CurSource = SourceVecs[i];
+ if (SourceVecs[i].getValueType().getVectorElementType() !=
+ VT.getVectorElementType()) {
+ EVT CastVT =
+ EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
+ SourceVecs[i].getValueSizeInBits() /
+ VT.getVectorElementType().getSizeInBits());
+
+ CurSource = DAG.getNode(ISD::BITCAST, dl, CastVT, SourceVecs[i]);
+ OffsetMultipliers[i] = CastVT.getVectorNumElements() / NumSrcElts;
+ NumSrcElts *= OffsetMultipliers[i];
+ MaxElts[i] *= OffsetMultipliers[i];
+ MinElts[i] *= OffsetMultipliers[i];
+ }
+
+ if (CurSource.getValueType() == VT) {
// No VEXT necessary
- ShuffleSrcs[i] = SourceVecs[i];
+ ShuffleSrcs[i] = CurSource;
VEXTOffsets[i] = 0;
continue;
- } else if (SourceVecs[i].getValueType().getVectorNumElements() < NumElts) {
+ } else if (NumSrcElts < NumElts) {
// We can pad out the smaller vector for free, so if it's part of a
// shuffle...
- ShuffleSrcs[i] = DAG.getNode(ISD::CONCAT_VECTORS, dl, VT, SourceVecs[i],
- DAG.getUNDEF(SourceVecs[i].getValueType()));
+ ShuffleSrcs[i] = DAG.getNode(ISD::CONCAT_VECTORS, dl, VT, CurSource,
+ DAG.getUNDEF(CurSource.getValueType()));
continue;
}
- // Don't attempt to extract subvectors from BUILD_VECTOR sources
- // that expand or trunc the original value.
- // TODO: We can try to bitcast and ANY_EXTEND the result but
- // we need to consider the cost of vector ANY_EXTEND, and the
- // legality of all the types.
- if (SourceVecs[i].getValueType().getVectorElementType() !=
- VT.getVectorElementType())
- return SDValue();
-
// Since only 64-bit and 128-bit vectors are legal on ARM and
// we've eliminated the other cases...
- assert(SourceVecs[i].getValueType().getVectorNumElements() == 2 * NumElts &&
+ assert(NumSrcElts == 2 * NumElts &&
"unexpected vector sizes in ReconstructShuffle");
if (MaxElts[i] - MinElts[i] >= NumElts) {
@@ -4197,21 +4206,20 @@ SDValue AArch64TargetLowering::ReconstructShuffle(SDValue Op,
if (MinElts[i] >= NumElts) {
// The extraction can just take the second half
VEXTOffsets[i] = NumElts;
- ShuffleSrcs[i] =
- DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, SourceVecs[i],
- DAG.getIntPtrConstant(NumElts));
+ ShuffleSrcs[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, CurSource,
+ DAG.getIntPtrConstant(NumElts));
} else if (MaxElts[i] < NumElts) {
// The extraction can just take the first half
VEXTOffsets[i] = 0;
ShuffleSrcs[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT,
- SourceVecs[i], DAG.getIntPtrConstant(0));
+ CurSource, DAG.getIntPtrConstant(0));
} else {
// An actual VEXT is needed
VEXTOffsets[i] = MinElts[i];
SDValue VEXTSrc1 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT,
- SourceVecs[i], DAG.getIntPtrConstant(0));
+ CurSource, DAG.getIntPtrConstant(0));
SDValue VEXTSrc2 =
- DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, SourceVecs[i],
+ DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, CurSource,
DAG.getIntPtrConstant(NumElts));
unsigned Imm = VEXTOffsets[i] * getExtFactor(VEXTSrc1);
ShuffleSrcs[i] = DAG.getNode(AArch64ISD::EXT, dl, VT, VEXTSrc1, VEXTSrc2,
@@ -4232,9 +4240,9 @@ SDValue AArch64TargetLowering::ReconstructShuffle(SDValue Op,
int ExtractElt =
cast<ConstantSDNode>(Op.getOperand(i).getOperand(1))->getSExtValue();
if (ExtractVec == SourceVecs[0]) {
- Mask.push_back(ExtractElt - VEXTOffsets[0]);
+ Mask.push_back(ExtractElt * OffsetMultipliers[0] - VEXTOffsets[0]);
} else {
- Mask.push_back(ExtractElt + NumElts - VEXTOffsets[1]);
+ Mask.push_back(ExtractElt * OffsetMultipliers[1] + NumElts - VEXTOffsets[1]);
}
}