[PATCH] Implement aarch64 neon instruction class SIMD copy - LLVM
James Molloy
james.molloy at arm.com
Tue Oct 8 06:05:33 PDT 2013
Hi Kevin,
Overall looks good, but I have quite a few comments on individual areas.
Cheers,
James
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:75
@@ -74,2 +74,3 @@
addRegisterClass(MVT::v2i64, &AArch64::FPR128RegClass);
+ addRegisterClass(MVT::v8f16, &AArch64::FPR128RegClass);
addRegisterClass(MVT::v4f32, &AArch64::FPR128RegClass);
----------------
This patch doesn't implement half-float support, so this line seems out of place.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:309
@@ -304,1 +308,3 @@
+ setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2i64, Custom);
+ setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v8f16, Custom);
setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f32, Custom);
----------------
Again, as the patch avoids full half-float support, this line seems out of place.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:322
@@ -312,1 +321,3 @@
+ setOperationAction(ISD::CONCAT_VECTORS, MVT::v2i64, Legal);
+ setOperationAction(ISD::CONCAT_VECTORS, MVT::v8f16, Legal);
setOperationAction(ISD::CONCAT_VECTORS, MVT::v4f32, Legal);
----------------
Half-float, out of place.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3505
@@ +3504,3 @@
+
+ unsigned NumElts = VT.getVectorNumElements();
+ bool isOnlyLowElement = true;
----------------
I believe this code is copied from the ARM (32-bit) backend. It's quite substantial, and it would be nice if it could be factored away somewhere in a helper function and reused by both backends.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3640
@@ -3517,2 +3639,3 @@
if (Lane == -1) Lane = 0;
-
+ // Test if V1 is a BUILD_VECTOR
+ if (Lane == 0 && V1.getOpcode() == ISD::BUILD_VECTOR &&
----------------
This comment doesn't help in understanding the code below. What the code actually checks is that this is a BUILD_VECTOR with a non-constant operand at index 0, and that the splat lane is 0.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3642
@@ +3641,3 @@
+ if (Lane == 0 && V1.getOpcode() == ISD::BUILD_VECTOR &&
+ !isa<ConstantSDNode>(V1.getOperand(0))) {
+ bool IsScalarToVector = true;
----------------
This condition seems wrong to me. The code *looks* like it produces a VDUP of a constant to all lanes (as opposed to the VDUPLANE below).
Why does this only work when the operand is *not* a constant? I would have thought that the condition should be negated: && isa<ConstantSDNode>(V1.getOperand(0)).
Also, why does it only work when the splat lane is 0? Could it not be isa<ConstantSDNode>(V1.getOperand(Lane))?
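For illustration, a toy model of the splat semantics in question (a plain std::vector stands in for the BUILD_VECTOR operands; dupLane is a hypothetical name, not an LLVM API):

```cpp
#include <cassert>
#include <vector>

// Toy model (not the LLVM code) of the broadcast being questioned:
// a DUP from lane `Lane` replicates that single element across every
// lane of the result. Nothing here is specific to Lane == 0, which is
// the point of the question above.
std::vector<int> dupLane(const std::vector<int> &Ops, int Lane) {
  return std::vector<int>(Ops.size(), Ops[Lane]);
}
```

Broadcasting from lane 2 works no differently from broadcasting from lane 0, so restricting the check to Lane == 0 looks unnecessarily conservative.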
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3664
@@ +3663,3 @@
+ // output.
+ //Collect elements need to be inserted and their index.
+ SmallVector<int, 8> NV1Elt;
----------------
Pedantic: space between "//" and "Collect".
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3686
@@ +3685,3 @@
+ bool IsV2Inserted = true;
+ if (Length - NV1Elt.size() < 1)
+ IsV1Inserted = false;
----------------
This is a weird-looking condition - the canonical condition would be "NV1Elt.size() == Length", would it not?
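For what it's worth, the two spellings agree for unsigned sizes; a minimal standalone check (std::size_t and std::vector stand in for the actual types here, and both function names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for the two spellings discussed above. With unsigned
// arithmetic, "Length - size < 1" holds exactly when the difference
// is zero, i.e. when size == Length - the canonical form just says
// so directly.
bool awkwardForm(std::size_t Length, const std::vector<int> &V) {
  return Length - V.size() < 1;
}

bool canonicalForm(std::size_t Length, const std::vector<int> &V) {
  return V.size() == Length;
}
```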
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3691
@@ +3690,3 @@
+
+ //Decide which to be inserted.
+ SDValue InsV = V1;
----------------
Pedantic: space between "//" and "Decide".
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3695
@@ +3694,3 @@
+ SmallVector<int, 8> InsIndex = N1Index;
+ if (IsV1Inserted || IsV2Inserted) {
+ if (NV1Elt.size() > NV2Elt.size()) {
----------------
As these variables are only used once, I suggest removing them and just using .size() in the if condition:
if (NV1Elt.size() != Length || NV2Elt.size() != Length)
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3702
@@ +3701,3 @@
+ }
+ else
+ InsV = DAG.getNode(ISD::UNDEF, dl, VT);
----------------
Coding style: if you're using braces around the if (), then you should also use braces around the else.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3707
@@ +3706,3 @@
+ int V1EltNum = V1.getValueType().getVectorNumElements();
+ for (int InsertNum = 0, Index = (NV1Elt.size() > NV2Elt.size()) ?
+ NV2Elt.size() : NV1Elt.size();
----------------
Several changes here:
* I'd use "I" instead of "InsertNum" (nitpick, feel free to ignore)
* I'd use "E" instead of "Index" (or at least something less generic than "Index" - "Index" doesn't feel like the name for an "end" variable)
* Index = InsArray.size()
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3713
@@ +3712,3 @@
+ ExtV = V2;
+ InsArray[InsertNum] -= V1EltNum;
+ }
----------------
Instead of mutating InsArray in-place, I'd create a local variable and change that. Something like:
int Mask = InsArray[InsertNum];
if (Mask > V1EltNum) {
  ExtV = V2;
  Mask -= V1EltNum;
}
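As a standalone sketch of the indexing convention involved (plain std::vector in place of the SDValues; selectLane is a hypothetical helper, and note this toy uses >=, since index V1EltNum already refers to V2's first element):

```cpp
#include <cassert>
#include <vector>

// Toy model of the lane selection: a mask index addresses the
// concatenation of V1 and V2, so indices at or beyond V1's element
// count come from V2 after rebasing. The rebased index lives in a
// local, leaving the mask array itself untouched.
int selectLane(const std::vector<int> &V1, const std::vector<int> &V2,
               int MaskIn) {
  const int V1EltNum = static_cast<int>(V1.size());
  const std::vector<int> *ExtV = &V1;
  int Mask = MaskIn; // local copy instead of writing back into the array
  if (Mask >= V1EltNum) {
    ExtV = &V2;
    Mask -= V1EltNum;
  }
  return (*ExtV)[Mask];
}
```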
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3698
@@ +3697,3 @@
+ InsV = V2;
+ InsArray = NV2Elt;
+ InsIndex = N2Index;
----------------
Would it be better to call this "InsMasks" or something instead of "InsArray"? That way we know that one array contains the masks, the other contains the indices.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3715
@@ +3714,3 @@
+ }
+ EVT EltVT = MVT::i32;
+ if(EltSize == 64)
----------------
Please add a comment as to why this is necessary! Won't this get legalized anyway?
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:2488
@@ -2488,3 +2487,3 @@
(v8i8 (trunc (v8i16 (srl (v8i16 node:$Rn),
- (v8i16 (Neon_dupImm 8))))))>;
+ (v8i16 (Neon_vdup (i32 8)))))))>;
def _4s : PatFrag<(ops node:$Rn),
----------------
Can you say why these (i32 X) are needed?
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3350
@@ -3350,2 +3349,3 @@
def : Pat<(v8i16 (bitconvert (v16i8 VPR128:$src))), (v8i16 VPR128:$src)>;
+def : Pat<(v8f16 (bitconvert (v16i8 VPR128:$src))), (v8f16 VPR128:$src)>;
----------------
I thought this patch wasn't meant to add half-float support?
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4512
@@ -4456,7 +4511,3 @@
-defm SMOVxb_pattern : Neon_SMOVx_pattern<v16i8, v8i8, i8, neon_uimm4_bare,
- neon_uimm3_bare, SMOVxb>;
-defm SMOVxh_pattern : Neon_SMOVx_pattern<v8i16, v4i16, i16, neon_uimm3_bare,
- neon_uimm2_bare, SMOVxh>;
-defm SMOVxs_pattern : Neon_SMOVx_pattern<v4i32, v2i32, i32, neon_uimm2_bare,
- neon_uimm1_bare, SMOVxs>;
+defm : Neon_SMOVx_pattern<v16i8, v8i8, i8, neon_uimm4_bare,
+ neon_uimm3_bare, SMOVxb>;
----------------
These patterns match SMOV nodes - would it not be better to pattern match (sext (COPY)) instead? That way the copy coalescer could be more effective. Same for UMOV nodes: (zext (COPY)).
http://llvm-reviews.chandlerc.com/D1854