[PATCH] [AArch64] Fix a bug generating incorrect instruction when building small vector.

Mon Jul 21 03:34:42 PDT 2014

Hi Kevin,

Sorry this took so long to review. I kept putting it off because the code looked nasty for what it was trying to achieve. It actually turned out much better than I was expecting though (I have comments, but they're superficial rather than horrified).

Cheers.

Tim.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4167-4172
@@ +4166,8 @@
+    if (SrcEltTy.getSizeInBits() < SmallestEltTy.getSizeInBits()) {
+      // It may hit here if trying to build a small vector which is less
+      // than 64 bit. For example, extracting low part from v8i8 to build v4i8.
+      // Because v4i8 is illegal, it will be promoted to v4i16.
+      // For this example, we need to create a v8i8 shuffle_vector which only
+      // lane 0, 2, 4, 6 are valid. Also, it should be bitcasted back to
+      // original v4i8 at last.
+      ResMultiplier = SmallestEltTy.getSizeInBits() / SrcEltTy.getSizeInBits();
----------------
I think this is putting too much emphasis on the v4i8 origin of the DAG rather than what's actually in front of us: a v4i16 BUILD_VECTOR made up of a set of v8i8 extractions (and even that's too much of a special case).

This loop also looked more intimidating than it needed to because of this big block comment inside. If the purpose is explained completely outside, the entire thing becomes a fairly trivial 3-4 line loop.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4173
@@ +4172,3 @@
+      // original v4i8 at last.
+      ResMultiplier = SmallestEltTy.getSizeInBits() / SrcEltTy.getSizeInBits();
+      SmallestEltTy = SrcEltTy;
----------------
I don't think this is correct. Imagine an output vector of v4i32, produced from SourceVecs with type v8i16 and v16i8. This situation should give ResMultiplier = 4, but it would end up as 2.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4189-4192
@@ -4169,4 +4188,6 @@
+        ShuffleVT.getVectorElementType()) {
       // It may hit this case if SourceVecs[i] is AssertSext/AssertZext.
       // Then bitcast it to the vector which holds asserted element type,
       // and record the multiplier of element width between SourceVecs and
       // Build_vector which is needed to extract the correct lanes later.
+      EVT CastVT = EVT::getVectorVT(
----------------
This comment seems stale. While you're in the area could you rephrase it to something more accurate?

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4269-4270
@@ -4248,2 +4268,4 @@
     }
+    for (int j = 0; j != ResMultiplier - 1; ++j)
+      Mask.push_back(-1);
   }
----------------
I don't think this is generic enough. Again in the situation with two mismatched sources (so take a v4i32 VT being produced from a v4i16 and a v8i8 for example).

Anything coming from the v4i16 needs to set *two* elements of Mask properly (and then 2 undefs).

I'm not sure, but a more thorough restructuring of this loop might make the result clearer too. Don't be afraid to move things about if you see anything.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4277-4279
@@ +4276,5 @@
+                                           ShuffleSrcs[1], &Mask[0]);
+    if (ShuffleVT != VT)
+      Shuffle = DAG.getNode(ISD::BITCAST, dl, VT, Shuffle);
+    return Shuffle;
+  }
----------------
LLVM will omit the bitcast if it's not needed. It's simpler to just "return DAG.getNode(ISD::BITCAST, ...);".

http://reviews.llvm.org/D4385