[PATCH] Implement aarch64 neon instruction class SIMD copy - LLVM
James Molloy
james.molloy at arm.com
Tue Oct 8 06:05:33 PDT 2013
Hi Kevin,
Overall looks good, but I have quite a few comments on individual areas.
Cheers,
James
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:75
@@ -74,2 +74,3 @@
addRegisterClass(MVT::v2i64, &AArch64::FPR128RegClass);
+ addRegisterClass(MVT::v8f16, &AArch64::FPR128RegClass);
addRegisterClass(MVT::v4f32, &AArch64::FPR128RegClass);
----------------
This patch doesn't implement half-float support, so this line seems out of place.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:309
@@ -304,1 +308,3 @@
+ setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2i64, Custom);
+ setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v8f16, Custom);
setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f32, Custom);
----------------
Again, as the patch avoids full half-float support, this line seems out of place.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:322
@@ -312,1 +321,3 @@
+ setOperationAction(ISD::CONCAT_VECTORS, MVT::v2i64, Legal);
+ setOperationAction(ISD::CONCAT_VECTORS, MVT::v8f16, Legal);
setOperationAction(ISD::CONCAT_VECTORS, MVT::v4f32, Legal);
----------------
Half-float, out of place.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3505
@@ +3504,3 @@
+
+ unsigned NumElts = VT.getVectorNumElements();
+ bool isOnlyLowElement = true;
----------------
I believe this code is copied from the ARM (32-bit) backend. It's quite substantial, and it would be nice if it could be factored away somewhere in a helper function and reused by both backends.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3640
@@ -3517,2 +3639,3 @@
if (Lane == -1) Lane = 0;
-
+ // Test if V1 is a BUILD_VECTOR
+ if (Lane == 0 && V1.getOpcode() == ISD::BUILD_VECTOR &&
----------------
This comment doesn't help understand the code below. The code actually checks if this is a BUILD_VECTOR with a non-constant operand in index 0, and the splat lane is 0.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3642
@@ +3641,3 @@
+ if (Lane == 0 && V1.getOpcode() == ISD::BUILD_VECTOR &&
+ !isa<ConstantSDNode>(V1.getOperand(0))) {
+ bool IsScalarToVector = true;
----------------
This condition seems wrong to me. The code *looks* like it produces a VDUP of a constant to all lanes (as opposed to the VDUPLANE below).
Why does this only work when the operand is *not* a constant? I would have thought that the condition should be negated - && isa<ConstantSDNode>(V1.getOperand(0));
Also, why does it only work when the splat lane is 0? can it not be: isa<ConstantSDNode>(V1.getOperand(Lane))?
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3664
@@ +3663,3 @@
+ // output.
+ //Collect elements need to be inserted and their index.
+ SmallVector<int, 8> NV1Elt;
----------------
Pedantic: Space between // and C
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3686
@@ +3685,3 @@
+ bool IsV2Inserted = true;
+ if (Length - NV1Elt.size() < 1)
+ IsV1Inserted = false;
----------------
This is a weird-looking condition - the canonical condition would be "NV1Elt.size() == Length", would it not?
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3691
@@ +3690,3 @@
+
+ //Decide which to be inserted.
+ SDValue InsV = V1;
----------------
Pedantic: // space D.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3695
@@ +3694,3 @@
+ SmallVector<int, 8> InsIndex = N1Index;
+ if (IsV1Inserted || IsV2Inserted) {
+ if (NV1Elt.size() > NV2Elt.size()) {
----------------
As these variables are only used once, I suggest removing them and just using .size() in the if condition:
if (NV1Elt.size() != Length || NV2Elt.size() != Length)
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3702
@@ +3701,3 @@
+ }
+ else
+ InsV = DAG.getNode(ISD::UNDEF, dl, VT);
----------------
Coding style: if you're using braces around the if(), then you should use braces around the else().
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3707
@@ +3706,3 @@
+ int V1EltNum = V1.getValueType().getVectorNumElements();
+ for (int InsertNum = 0, Index = (NV1Elt.size() > NV2Elt.size()) ?
+ NV2Elt.size() : NV1Elt.size();
----------------
Several changes here:
* I'd use "I" instead of "InsertNum" (nitpick, feel free to ignore)
* I'd use "E" instead of "Index" (or at least something that is less generic than Index - Index doesn't feel like the name for an "end" variable)
* Index = InsArray.size()
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3713
@@ +3712,3 @@
+ ExtV = V2;
+ InsArray[InsertNum] -= V1EltNum;
+ }
----------------
Instead of mutating InsArray in-place, I'd create a variable and change that. Something like:
int Mask = InsArray[InsertNum];
if (Mask > V1EltNum) {
ExtV = V2;
Mask -= V1EltNum);
}
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3698
@@ +3697,3 @@
+ InsV = V2;
+ InsArray = NV2Elt;
+ InsIndex = N2Index;
----------------
Would it be better to call this "InsMasks" or something instead of "InsArray"? That way we know that one array contains the masks, the other contains the indices.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:3715
@@ +3714,3 @@
+ }
+ EVT EltVT = MVT::i32;
+ if(EltSize == 64)
----------------
Please add a comment as to why this is necessary! Won't this get legalized anyway?
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:2488
@@ -2488,3 +2487,3 @@
(v8i8 (trunc (v8i16 (srl (v8i16 node:$Rn),
- (v8i16 (Neon_dupImm 8))))))>;
+ (v8i16 (Neon_vdup (i32 8)))))))>;
def _4s : PatFrag<(ops node:$Rn),
----------------
Can you say why these (i32 X) are needed?
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3350
@@ -3350,2 +3349,3 @@
def : Pat<(v8i16 (bitconvert (v16i8 VPR128:$src))), (v8i16 VPR128:$src)>;
+def : Pat<(v8f16 (bitconvert (v16i8 VPR128:$src))), (v8f16 VPR128:$src)>;
----------------
I thought this patch wasn't meant to add half-float support?
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4512
@@ -4456,7 +4511,3 @@
-defm SMOVxb_pattern : Neon_SMOVx_pattern<v16i8, v8i8, i8, neon_uimm4_bare,
- neon_uimm3_bare, SMOVxb>;
-defm SMOVxh_pattern : Neon_SMOVx_pattern<v8i16, v4i16, i16, neon_uimm3_bare,
- neon_uimm2_bare, SMOVxh>;
-defm SMOVxs_pattern : Neon_SMOVx_pattern<v4i32, v2i32, i32, neon_uimm2_bare,
- neon_uimm1_bare, SMOVxs>;
+defm : Neon_SMOVx_pattern<v16i8, v8i8, i8, neon_uimm4_bare,
+ neon_uimm3_bare, SMOVxb>;
----------------
These pattern match SMOV nodes - would it not be better to pattern match (sext (COPY ) ) instead? That way the copy coalescer could be more effective. Same for UMOV nodes (zext (COPY ) )
http://llvm-reviews.chandlerc.com/D1854
More information about the llvm-commits
mailing list