[PATCH][AArch64] implement aarch64 neon load/store instructions class AdvSIMD (lselem)
Hao Liu
Hao.Liu at arm.com
Fri Oct 4 08:27:31 PDT 2013
Hi Tim,
Thanks, that's helpful.
But I have a question about triple consecutive registers.
If I define a triple register as:
Def Tuples3D : RegisterTuples<[d0, d1, d2], [(rotl FPR64, 0), ... ,(rotl
FPR64, 2)];
Def DTriple : RegisterClass<"AArch64", [untyped], 64, add Tuples3D> {
Let Size = 192;
}
The second argument of DTriple should be "v3i64". As there is no v3i64, I
have to use untyped. But then I can't get the DTripleRegClass in
getRegClassFor(MVT VT) of AArch64ISelLowering.cpp.
So now I use DQuad(contains 4 consecutive registers) to represent 3
consecutive registers.
I wonder if the last register of DQuad is a waste? If not, I can keep using
DQuad.
Or do you know how to implement DTriple and get DTripleRegClass in
getRegClassFor(MVT VT) ?
Thanks,
-Hao
-----Original Message-----
From: Tim Northover [mailto:t.p.northover at gmail.com]
Sent: Friday, October 04, 2013 4:08 PM
To: Hao Liu
Cc: llvm-commits; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH][AArch64] implement aarch64 neon load/store instructions
class AdvSIMD (lselem)
Hi Hao,
> At first, I tried to used them directly in the final instructions. But
> I failed to print out the instructions correctly. I can't print out
> the information about the layout (16b, 8h, 4s, ...)
I think that problem can be solved. You can create a set of RegisterOperands
VPair2s, VPair4h, ... with PrintMethods that put the correct ".2s", ".4h"
and so on in place.
> At last, I changed the solution. I used the same way ARM backend to
> match the ld/st instructions in 2 steps:
I still don't think that's the right solution. These multiple-register
MCInsts are redundant and fragile, and don't really reflect what's possible
in the instruction-set. ARM appears to be trying to move away from them (see
the comment in the definition of the NEONLdStTableEntry stuct in
ARMExpandPseudoInsts.cpp).
I'm also skeptical about the kill/dead/undef flags being attached in the
Expand phase. I know you just copied the functions, but they seem to assume
that if Q0_Q1 is (say) undef then both Q0 and Q1 are. I can't seem to
provoke a failure because of this in ARM, but it looks dodgy.
> And I think it may be easy to read or do optimization to we show the
> registers separately, if we can ensure the registers are consecutive.
Show them separately where? I don't quite follow this comment. In
MachineInstr dumps, the readability of "Q0_Q1_Q2_Q3<undef>" vs "Q0<undef>,
Q1<undef>, Q2<undef>, Q3<undef>" is moot. We don't optimise for that and
certainly don't use it to decide on an implementation.
> (2) About the chain.
> I think last time I didn't add the Chain to the operand. So I fix this
> by adding the Chain in AArch64ISelDAGToDAG.cpp. I don't know how to
> test it, but I think this may be fixed.
As far as I can see, yes, though I think this line could do with a comment
that it's updating users of the chain:
+ ReplaceUses(SDValue(N, NumVecs), SDValue(VLd, 1));
> You are right, it's hard to understand. The instructions are
> load/store about multiple N-element structure. I change the name of some
definition.
> And also add comment to explain the load/store instructions:
Thanks Hao, comments are always useful.
Cheers.
Tim.
More information about the cfe-commits
mailing list