[PATCH] [AArch64] Fix a bug generating incorrect instruction when building small vector.
Tim Northover
t.p.northover at gmail.com
Mon Jul 21 03:34:42 PDT 2014
Hi Kevin,
Sorry this took so long to review. I kept putting it off because the code looked nasty for what it was trying to achieve. It actually turned out much better than I was expecting though (I have comments, but they're superficial rather than horrified).
Cheers.
Tim.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4167-4172
@@ +4166,8 @@
+ if (SrcEltTy.getSizeInBits() < SmallestEltTy.getSizeInBits()) {
+ // It may hit here if trying to build a small vector which is less
+ // than 64 bit. For example, extracting low part from v8i8 to build v4i8.
+ // Because v4i8 is illegal, it will be promoted to v4i16.
+ // For this example, we need to create a v8i8 shuffle_vector which only
+ // lane 0, 2, 4, 6 are valid. Also, it should be bitcasted back to
+ // original v4i8 at last.
+ ResMultiplier = SmallestEltTy.getSizeInBits() / SrcEltTy.getSizeInBits();
----------------
I think this is putting too much emphasis on the v4i8 origin of the DAG rather than what's actually in front of us: a v4i16 BUILD_VECTOR made up of a set of v8i8 extractions (and even that's too much of a special case).
This loop also looked more intimidating than it needed to because of this big block comment inside. If the purpose is explained completely outside, the entire thing becomes a fairly trivial 3-4 line loop.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4173
@@ +4172,3 @@
+ // original v4i8 at last.
+ ResMultiplier = SmallestEltTy.getSizeInBits() / SrcEltTy.getSizeInBits();
+ SmallestEltTy = SrcEltTy;
----------------
I don't think this is correct. Imagine an output vector of v4i32, produced from SourceVecs with type v8i16 and v16i8. This situation should give ResMultiplier = 4, but it would end up as 2.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4189-4192
@@ -4169,4 +4188,6 @@
+ ShuffleVT.getVectorElementType()) {
// It may hit this case if SourceVecs[i] is AssertSext/AssertZext.
// Then bitcast it to the vector which holds asserted element type,
// and record the multiplier of element width between SourceVecs and
// Build_vector which is needed to extract the correct lanes later.
+ EVT CastVT = EVT::getVectorVT(
----------------
This comment seems stale. While you're in the area could you rephrase it to something more accurate?
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4269-4270
@@ -4248,2 +4268,4 @@
}
+ for (int j = 0; j != ResMultiplier - 1; ++j)
+ Mask.push_back(-1);
}
----------------
I don't think this is generic enough. Again in the situation with two mismatched sources (so take a v4i32 VT being produced from a v4i16 and a v8i8 for example).
Anything coming from the v4i16 needs to set *two* elements of Mask properly (and then 2 undefs).
I'm not sure, but a more thorough restructuring of this loop might make the result clearer too. Don't be afraid to move things about if you see anything.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4277-4279
@@ +4276,5 @@
+ ShuffleSrcs[1], &Mask[0]);
+ if (ShuffleVT != VT)
+ Shuffle = DAG.getNode(ISD::BITCAST, dl, VT, Shuffle);
+ return Shuffle;
+ }
----------------
LLVM will omit the bitcast if it's not needed. It's simpler to just "return DAG.getNode(ISD::BITCAST, ...);".
http://reviews.llvm.org/D4385
More information about the llvm-commits
mailing list