[PATCH] Implement aarch64 neon instruction class SIMD lsone and lsone-post - LLVM

Hao Liu Hao.Liu at arm.com
Mon Nov 18 01:14:33 PST 2013


Hi Tim,

 

We still don't have a conclusion about the solution for that comment, but as we are close to the deadline, I've created a new patch addressing all the other comments. The handling of 64-bit vectors is still the same as in the old version. If it is better to implement it another way, I can keep refactoring.

 

The patches are attached and also available at http://llvm-reviews.chandlerc.com/D2211.

Please review.

 

Thanks,

-Hao

 

From: Jiangning Liu [mailto:liujiangning1 at gmail.com] 
Sent: Monday, November 18, 2013 10:57 AM
To: Hao Liu
Cc: Tim Northover; llvm-commits at cs.uiuc.edu for LLVM; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH] Implement aarch64 neon instruction class SIMD lsone and lsone-post - LLVM

 

Hi Tim,

 

I think this scenario is different from table lookup. 

 

For table lookup, the intrinsic functions for a 64-bit vector list define a very special semantic that is quite different from the 128-bit vector list, which is why we'd prefer to lower it in the front-end and implement it with a 128-bit vector list by packing.

 

For this ldst_one instruction class, the 64-bit vector list maps very naturally to a 128-bit vector list. On one hand, using SUBREG_TO_REG and EXTRACT_SUBREG in TableGen doesn't really differ from generating them directly with C++ code. On the other hand, we still provide a very low-level representation and enough flexibility in IR and SelectionDAG, because an intrinsic like int_arm_neon_vld3lane is very close to the instructions, and using the value types attached to its operands we can easily determine the valid range of the lane for an instruction such as ld3 {vt.h - vt3.h}[lane], [xn] (0 <= lane <= 3 or 0 <= lane <= 7).

 

Also, I think we already have precedents using the same solution, for example vset_lane_u8 and vsetq_lane_u8. If you have any more insights, let's discuss it further.

 

Thanks,

-Jiangning

 

2013/11/17 Hao Liu <Hao.Liu at arm.com>

Hi Tim,

I'm a little confused about this comment:

================
Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:883
@@ +882,3 @@

+                               N->op_begin() + Vec0Idx + NumVecs);
+  if (is64BitVector)
+    for (unsigned i = 0; i < Regs.size(); i++)
----------------
I tend to think this complexity ought to be handled in the front-end (like the legacy TBL/TBX you did recently): a shufflevector extension before calling the bona fide 128-bit intrinsic.



I have implemented the front-end approach for vld2_lane, but I'm not sure whether the solution is correct. It needs many shufflevectors, because we have to widen the input from 64-bit vectors to 128-bit, and also narrow the output from 128-bit back to 64-bit. That means 4 extra shufflevectors for vld2_lane, and 8 for vld4_lane.
I haven't completed all the work yet. I just wonder whether I've implemented the solution correctly.
If it is correct, I can keep completing this tomorrow.

+  case AArch64::BI__builtin_neon_vld2q_lane_v:
+    return EmitARMBuiltinExpr(ARM::BI__builtin_neon_vld2q_lane_v, E);
+  case AArch64::BI__builtin_neon_vld2_lane_v: {
+    Ops[2] = Builder.CreateBitCast(Ops[2], VTy);
+    Ops[3] = Builder.CreateBitCast(Ops[3], VTy);
+    Value *SV, *SV2;
+    // Build index vectors: Indices is sequential (0, 1, ..., 2*e-1) for
+    // widening; Indices2 is (0, ..., e-1) for narrowing back to 64-bit.
+    SmallVector<Constant*, 16> Indices, Indices2;
+    for (unsigned i = 0, e = VTy->getNumElements(); i != e; ++i) {
+      Indices.push_back(ConstantInt::get(Int32Ty, 2*i));
+      Indices.push_back(ConstantInt::get(Int32Ty, 2*i+1));
+      Indices2.push_back(ConstantInt::get(Int32Ty, i));
+    }
+    SV = llvm::ConstantVector::get(Indices);
+    SV2 = llvm::ConstantVector::get(Indices2);
+    Value *undef64 = UndefValue::get(VTy);
+    Ops[2] = Builder.CreateShuffleVector(Ops[2], undef64, SV, "lane");
+    Ops[3] = Builder.CreateShuffleVector(Ops[3], undef64, SV, "lane");
+    llvm::VectorType *VTy64 = VTy;
+    VTy = llvm::VectorType::get(VTy->getElementType(), VTy->getNumElements() * 2);
+    Function *F = CGM.getIntrinsic(Intrinsic::arm_neon_vld2lane, VTy);
+    Ops.push_back(Align);
+    Ops[1] = Builder.CreateCall(F, makeArrayRef(Ops).slice(1), "vld2_lane");
+    Value *Val0 = Builder.CreateExtractValue(Ops[1], 0);
+    Value *Val1 = Builder.CreateExtractValue(Ops[1], 1);
+    Value *undef128 = UndefValue::get(VTy);
+    Val0 = Builder.CreateShuffleVector(Val0, undef128, SV2, "lane");
+    Val1 = Builder.CreateShuffleVector(Val1, undef128, SV2, "lane");
+    llvm::StructType *STy = llvm::StructType::get(VTy64, VTy64, NULL);
+    Value *InsVal = Builder.CreateInsertValue(UndefValue::get(STy), Val0, 0);
+    InsVal = Builder.CreateInsertValue(InsVal, Val1, 1);
+    Ty = llvm::PointerType::getUnqual(InsVal->getType());
+    Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
+    return Builder.CreateStore(InsVal, Ops[0]);
+  }

-----Original Message-----

From: cfe-commits-bounces at cs.uiuc.edu [mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Tim Northover
Sent: Friday, November 15, 2013 3:56 AM
To: t.p.northover at gmail.com; liujiangning1 at gmail.com
Cc: llvm-commits at cs.uiuc.edu; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH] Implement aarch64 neon instruction class SIMD lsone and lsone-post - LLVM

 

 

 

================

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:763

@@ +762,3 @@

+                        Operand,
+                        CurDAG->getTargetConstant(AArch64::sub_64, MVT::i32));
+  return SDValue(Reg, 0);

----------------

Should this be SRIdx? If not, that variable is unused. It's a little worrying that this passes its tests actually.

 

================

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:777

@@ +776,3 @@

+  default: llvm_unreachable("unhandled vld-dup type");
+  case MVT::v8i8:  OpcodeIndex = 0; break;
+  case MVT::v4i16: OpcodeIndex = 1; break;

----------------

This OpcodeIndex mapping is horribly non-obvious and hard-coded. It also seems to omit any f16 vectors, which seems odd.

 

I'd suggest separate 64-bit and 128-bit tables, and basing the index off of Log2_32(VT.getScalarType().getSizeInBits()) or some other way to avoid enumerating all types.

 

================

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:803

@@ +802,3 @@

+
+  std::vector<EVT> ResTys;
+  ResTys.push_back(MVT::Untyped); // Type of result super register

----------------

SmallVector would be better. Or perhaps a ?: with a call to SelectionDAG::getSDVTList.

 

================

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:856

@@ +855,3 @@

+  default: llvm_unreachable("unhandled vld/vst lane type");
+  case MVT::v16i8: OpcodeIndex = 0; break;
+  case MVT::v8i16: OpcodeIndex = 1; break;

----------------

Same odd OpcodeIndex.

 

================

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:885

@@ +884,3 @@

+    for (unsigned i = 0; i < Regs.size(); i++)
+      Regs[i] = getTargetSubregToReg(AArch64::sub_64, dl, VT, VT64, Regs[i]);
+  SDValue SuperReg = createQTuple(Regs);

----------------

Ah, I see why that function works now.

 

================

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:833

@@ +832,3 @@

+SDNode *AArch64DAGToDAGISel::SelectVLDSTLane(SDNode *N, bool IsLoad,
+                                             unsigned NumVecs, bool isUpdating,
+                                             const uint16_t *Opcodes) {

----------------

"IsUpdating" according to the coding standards, especially as it's right next to "IsLoad" with the capital I.

 

================

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:883

@@ +882,3 @@

+                               N->op_begin() + Vec0Idx + NumVecs);
+  if (is64BitVector)
+    for (unsigned i = 0; i < Regs.size(); i++)

----------------

I tend to think this complexity ought to be handled in the front-end (like the legacy TBL/TBX you did recently): a shufflevector extension before calling the bona fide 128-bit intrinsic.

 

================

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:838

@@ +837,3 @@

+  unsigned AddrOpIdx = isUpdating ? 1 : 2;
+  unsigned Vec0Idx = 3; // AddrOpIdx + (isUpdating ? 2 : 1)

+

----------------

Commented code.

 

 

http://llvm-reviews.chandlerc.com/D2146

_______________________________________________

cfe-commits mailing list

cfe-commits at cs.uiuc.edu

http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lsone.llvm-v1.patch
Type: application/octet-stream
Size: 277246 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131118/26280b92/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lsone.clang-v1.patch
Type: application/octet-stream
Size: 79718 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131118/26280b92/attachment-0001.obj>

