[PATCH][AArch64] implement 3 aarch64 neon instrunctions (umov smov ins) in llvm
Tim Northover
t.p.northover at gmail.com
Mon Sep 2 08:26:13 PDT 2013
Hi Kevin,
Some of these choices may have good reasons, but looking through the
code I've got some questions:
> To avoid legalizing signed-extend and zero-extend to 'and' and
> 'sign_extend_inreg', we combined them with extractelement in early combine
> stage, presenting as 'Neon_vector_extract' node.
I'm not sure I understand this bit. Why do we need to combine these
instead of directly writing patterns like:
def : Pat<(sext_inreg (extract_elt v8i8), i8), (SMOV
+ if(AArch64::FPR64RegClass.contains(SReg)) {
+ return SReg + AArch64::V0 - AArch64::D0;
+ }
We shouldn't be relying on these enum orders unless we have to. In
this case MCRegisterInfo::getMatchingSuperReg is probably what you
want.
+ if(getSubTarget().hasNEON())
+ [...]
+ else
What's the advantage of emitting a different instruction if the core
has NEON here? The normal FP one should work regardless (and may
actually be faster since the lane doesn't have to be decoded)
+def Neon_vector_extract : SDNode<"ISD::EXTRACT_VECTOR_ELT", SDT_Neon_mov_lane>;
+def Neon_vector_insert : SDNode<"ISD::INSERT_VECTOR_ELT", SDTypeProfile<1, 3,
+ [SDTCisVec<0>, SDTCisSameAs<0, 1>,
+ SDTCisInt<2>, SDTCisVT<3, i32>]>>;
What's wrong with the existing extractelt and insertelt?
+//def neon_uimm3_asmoperand : AsmOperandClass
Commented out code.
+def neon_uimm0_chomp_hash : Operand<i32>
Not convinced on the name here. "Chomp" implies an active removal,
which isn't what's happening here: it should never be there in the
first place.
I think I used the adjective "bare" for the single place I came across
it before ("fmov xD, vN.d[1]" isn't actually a NEON instruction). But
I'm not wedded to that one either.
+ let Inst{14-11} = {Immn{3}, Immn{2}, Immn{1}, Immn{0}};
That's just Immn isn't it? Though I do quite like the representation
in the Immd case; nice idea!
+ let Inst{14-12} = {Immn{2}, Immn{1}, Immn{0}};
What's bit 11? Same for later ones. If it's set elsewhere, a comment
would be useful. If it's not then setting it is probably essential.
+ def _16B16 : NeonI_insert<0b1, 0b1,
+ [...]
+ def _16B8 : NeonI_insert<0b1, 0b1,
Why are these separate instructions? It seems to be some attempt to
model the VPR64/VPR128 distinction, but I don't think it actually
gains us anything. It's not like the high part of VPR128 can be used
independently really.
+ (outs VPR128:$Rd), (ins VPR128:$Rn, VPR128:$src,
+ neon_uimm4_chomp_hash:$Immd, neon_uimm4_chomp_hash:$Immn),
I think this would be more natural if the operands followed the order
in the assembly syntax.
+ [(set (v16i8 VPR128:$Rd), (v16i8 (Neon_vector_insert
+ (v16i8 VPR128:$src), (i32 (Neon_vector_extract
+ (v16i8 VPR128:$Rn), (neon_uimm4_chomp_hash:$Immn))),
The formatting makes that rather difficult to parse.
Cheers.
Tim.
More information about the llvm-commits
mailing list