[PATCH] Implement aarch64 neon instruction class SIMD lsone and lsone-post - LLVM

Sun Nov 17 18:57:24 PST 2013

Hi Tim,

I think this scenario is different from table lookup.

For table lookup, the intrinsic functions for 64-bit vector list defines a
very special semantic, and it is quite different from 128-bit vecor list,
so this is why we'd prefer to lower it in front-end to make it implemented
with 128-bit vector list by doing packing action.

For this ldst_one instruction class, the 64-bit vector list can be very
naturally mapped to 128-bit vector list. On one hand, using SUBREG_TO_REG
and EXTRACT_SUBREG in table gen don't really make differences from directly
generating them with CPP code. On the other hand, we still provide very
lower level representation and enough flexibility in IR and SelectionDAG,
because the intrinsic like int_arm_neon_vld3lane is very close to
instructions, and using the value types attached to it's operands we can
easily determine the scope requirement of the lane for instruction ld3
{vt.h - vt3.h}[lane], [xn] (0<=lane<=3 or 0<=lane<=7).

Also, I think we already have precedents using the same solution, for
example, vset_lane_u8 and vsetq_lane_u8. If you have any more insights,
let's discuss more about it.

Thanks,
-Jiangning

2013/11/17 Hao Liu <Hao.Liu at arm.com>

> Hi Tim,
>
> I'm a little confused about this comment:
> ================
> Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:883
> @@ +882,3 @@
> +                               N->op_begin() + Vec0Idx + NumVecs);  if
> + (is64BitVector)
> +    for (unsigned i = 0; i < Regs.size(); i++)
> ----------------
> I tend to think this complexity ought to be handled in the front-end (like
> the legacy TBL/TBX you did recently): a shufflvector extension before
> calling  bona fide 128-bit intrinsic.
>
>
> I have implemented in the front-end for vld2_lane. But I'm not sure
> whether the solution is correct. There are many shufflevectors, because we
> need to transfer the input from 64bit vector to 128bit, and also transfer
> the output from 128bit to 64bit.  For vld2_lane, there will be 4 more
> shufflevectors. And 8 more shufflevectors for vld4_lane.
> I haven't completed all the work yet. I just wonder if I implement the
> solution in a correct way?
> If it is correct, I can keep on completing this tomorrow.
>
> +  case AArch64::BI__builtin_neon_vld2q_lane_v:
> +    return EmitARMBuiltinExpr(ARM::BI__builtin_neon_vld2q_lane_v, E);
> +  case AArch64::BI__builtin_neon_vld2_lane_v: {
> +    Ops[2] = Builder.CreateBitCast(Ops[2], VTy);
> +    Ops[3] = Builder.CreateBitCast(Ops[3], VTy);
> +    Value *SV, *SV2;
> +    // Build a vector containing sequential number like (0, 1, 2, ..., 15)
> +    SmallVector<Constant*, 16> Indices, Indices2;
> +    for (unsigned i = 0, e = VTy->getNumElements(); i != e; ++i) {
> +      Indices.push_back(ConstantInt::get(Int32Ty, 2*i));
> +      Indices.push_back(ConstantInt::get(Int32Ty, 2*i+1));
> +      Indices2.push_back(ConstantInt::get(Int32Ty, i));
> +    }
> +    SV = llvm::ConstantVector::get(Indices);
> +    SV2 = llvm::ConstantVector::get(Indices2);
> +    Value *undef64 = UndefValue::get(VTy);
> +    Ops[2] = Builder.CreateShuffleVector(Ops[2], undef64, SV, "lane");
> +    Ops[3] = Builder.CreateShuffleVector(Ops[3], undef64, SV, "lane");
> +    llvm::VectorType *VTy64 = VTy;
> +    VTy = llvm::VectorType::get(VTy->getElementType(),
> VTy->getNumElements() * 2);
> +    Function *F = CGM.getIntrinsic(Intrinsic::arm_neon_vld2lane, VTy);
> +    Ops.push_back(Align);
> +    Ops[1] = Builder.CreateCall(F, makeArrayRef(Ops).slice(1),
> "vld2_lane");
> +    Value *Val0 = Builder.CreateExtractValue(Ops[1], 0);
> +    Value *Val1 = Builder.CreateExtractValue(Ops[1], 1);
> +    Value *undef128 = UndefValue::get(VTy);
> +    Val0 = Builder.CreateShuffleVector(Val0, undef128, SV2, "lane");
> +    Val1 = Builder.CreateShuffleVector(Val1, undef128, SV2, "lane");
> +    llvm::StructType *STy = llvm::StructType::get(VTy64, VTy64, NULL);
> +    Value *InsVal = Builder.CreateInsertValue(UndefValue::get(STy), Val0,
> 0);
> +    InsVal = Builder.CreateInsertValue(InsVal, Val1, 1);
> +    Ty = llvm::PointerType::getUnqual(InsVal->getType());
> +    Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
> +    return Builder.CreateStore(InsVal, Ops[0]);
> +  }
>
> 
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131118/f71e3300/attachment.html>