RE: [PATCH] Implement aarch64 neon instruction class SIMD lsone and lsone-post - LLVM

Sun Nov 17 06:58:25 PST 2013

Hi Tim,

I'm a little confused about this comment:
================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:883
@@ +882,3 @@
+                               N->op_begin() + Vec0Idx + NumVecs);  if 
+ (is64BitVector)
+    for (unsigned i = 0; i < Regs.size(); i++)
----------------
I tend to think this complexity ought to be handled in the front-end (like the legacy TBL/TBX you did recently): a shufflvector extension before calling  bona fide 128-bit intrinsic.

I have implemented in the front-end for vld2_lane. But I'm not sure whether the solution is correct. There are many shufflevectors, because we need to transfer the input from 64bit vector to 128bit, and also transfer the output from 128bit to 64bit.  For vld2_lane, there will be 4 more shufflevectors. And 8 more shufflevectors for vld4_lane.
I haven't completed all the work yet. I just wonder if I implement the solution in a correct way? 
If it is correct, I can keep on completing this tomorrow.

+  case AArch64::BI__builtin_neon_vld2q_lane_v:
+    return EmitARMBuiltinExpr(ARM::BI__builtin_neon_vld2q_lane_v, E);
+  case AArch64::BI__builtin_neon_vld2_lane_v: {
+    Ops[2] = Builder.CreateBitCast(Ops[2], VTy);
+    Ops[3] = Builder.CreateBitCast(Ops[3], VTy);
+    Value *SV, *SV2;
+    // Build a vector containing sequential number like (0, 1, 2, ..., 15)
+    SmallVector<Constant*, 16> Indices, Indices2;
+    for (unsigned i = 0, e = VTy->getNumElements(); i != e; ++i) {
+      Indices.push_back(ConstantInt::get(Int32Ty, 2*i));
+      Indices.push_back(ConstantInt::get(Int32Ty, 2*i+1));
+      Indices2.push_back(ConstantInt::get(Int32Ty, i));
+    }
+    SV = llvm::ConstantVector::get(Indices);
+    SV2 = llvm::ConstantVector::get(Indices2);
+    Value *undef64 = UndefValue::get(VTy);
+    Ops[2] = Builder.CreateShuffleVector(Ops[2], undef64, SV, "lane");
+    Ops[3] = Builder.CreateShuffleVector(Ops[3], undef64, SV, "lane");
+    llvm::VectorType *VTy64 = VTy;
+    VTy = llvm::VectorType::get(VTy->getElementType(), VTy->getNumElements() * 2);
+    Function *F = CGM.getIntrinsic(Intrinsic::arm_neon_vld2lane, VTy);
+    Ops.push_back(Align);
+    Ops[1] = Builder.CreateCall(F, makeArrayRef(Ops).slice(1), "vld2_lane");
+    Value *Val0 = Builder.CreateExtractValue(Ops[1], 0);
+    Value *Val1 = Builder.CreateExtractValue(Ops[1], 1);
+    Value *undef128 = UndefValue::get(VTy);
+    Val0 = Builder.CreateShuffleVector(Val0, undef128, SV2, "lane");
+    Val1 = Builder.CreateShuffleVector(Val1, undef128, SV2, "lane");
+    llvm::StructType *STy = llvm::StructType::get(VTy64, VTy64, NULL);
+    Value *InsVal = Builder.CreateInsertValue(UndefValue::get(STy), Val0, 0);
+    InsVal = Builder.CreateInsertValue(InsVal, Val1, 1);
+    Ty = llvm::PointerType::getUnqual(InsVal->getType());
+    Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
+    return Builder.CreateStore(InsVal, Ops[0]);
+  }

Thanks,
-Hao

-----Original Message-----
From: cfe-commits-bounces at cs.uiuc.edu [mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Tim Northover
Sent: Friday, November 15, 2013 3:56 AM
To: t.p.northover at gmail.com; liujiangning1 at gmail.com
Cc: llvm-commits at cs.uiuc.edu; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH] Implement aarch64 neon instruction class SIMD lsone and lsone-post - LLVM

================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:763
@@ +762,3 @@
+                        Operand,
+                        CurDAG->getTargetConstant(AArch64::sub_64, 
+ MVT::i32));  return SDValue(Reg, 0);
----------------
Should this be SRIdx? If not, that variable is unused. It's a little worrying that this passes its tests actually.

================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:777
@@ +776,3 @@
+  default: llvm_unreachable("unhandled vld-dup type");  case MVT::v8i8:  
+ OpcodeIndex = 0; break;  case MVT::v4i16: OpcodeIndex = 1; break;
----------------
This OpcodeIndex mapping is horribly non-obvious and hard-coded. It also seems to omit any f16 vectors, which seems odd.

I'd suggest separate 64-bit and 128-bit tables, and basing the index off of Log2_32(VT.getScalarType().getSizeInBits()) or some other way to avoid enumerating all types.

================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:803
@@ +802,3 @@
+
+  std::vector<EVT> ResTys;
+  ResTys.push_back(MVT::Untyped); // Type of result super register
----------------
SmallVector would be better. Or perhaps a ?: with a call to SelectionDAG::getSDVTList.

================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:856
@@ +855,3 @@
+  default: llvm_unreachable("unhandled vld/vst lane type");  case 
+ MVT::v16i8: OpcodeIndex = 0; break;  case MVT::v8i16: OpcodeIndex = 1; 
+ break;
----------------
Same odd OpcodeIndex.

================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:885
@@ +884,3 @@
+    for (unsigned i = 0; i < Regs.size(); i++)
+      Regs[i] = getTargetSubregToReg(AArch64::sub_64, dl, VT, VT64, 
+ Regs[i]);  SDValue SuperReg = createQTuple(Regs);
----------------
Ah, I see why that function works now.

================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:833
@@ +832,3 @@
+SDNode *AArch64DAGToDAGISel::SelectVLDSTLane(SDNode *N, bool IsLoad,
+                                             unsigned NumVecs, bool isUpdating,
+                                             const uint16_t *Opcodes) {
----------------
"IsUpdating" according to coding standards. Especially as it's right next to "IsLoad" with the capital

================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:883
@@ +882,3 @@
+                               N->op_begin() + Vec0Idx + NumVecs);  if 
+ (is64BitVector)
+    for (unsigned i = 0; i < Regs.size(); i++)
----------------
I tend to think this complexity ought to be handled in the front-end (like the legacy TBL/TBX you did recently): a shufflvector extension before calling  bona fide 128-bit intrinsic.

================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:838
@@ +837,3 @@
+  unsigned AddrOpIdx = isUpdating ? 1 : 2;  unsigned Vec0Idx = 3; // 
+ AddrOpIdx + (isUpdating ? 2 : 1)
+
----------------
Commented code.

http://llvm-reviews.chandlerc.com/D2146
_______________________________________________
cfe-commits mailing list
cfe-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff.llvm
Type: application/octet-stream
Size: 105080 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131117/47e6b591/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff.clang
Type: application/octet-stream
Size: 11208 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131117/47e6b591/attachment-0001.obj>

RE: [PATCH] Implement aarch64 neon instructio​n class SIMD lsone and lsone-post - LLVM

RE: [PATCH] Implement aarch64 neon instruction class SIMD lsone and lsone-post - LLVM