[PATCH][AArch64] implement aarch64 neon load/store instructions class AdvSIMD (lselem)

Fri Oct 4 08:27:31 PDT 2013

Hi Tim,

Thanks, that's helpful.

But I have a question about triple consecutive registers.

If I define a triple register as:
Def Tuples3D : RegisterTuples<[d0, d1, d2], [(rotl FPR64, 0), ... ,(rotl
FPR64, 2)];
Def DTriple : RegisterClass<"AArch64", [untyped], 64, add Tuples3D> {
    Let Size = 192;
}
The second argument of DTriple should be "v3i64". As there is no v3i64, I
have to use untyped. But then I can't get the DTripleRegClass in
getRegClassFor(MVT VT) of AArch64ISelLowering.cpp.

So now I use DQuad(contains 4 consecutive registers) to represent 3
consecutive registers. 
I wonder if the last register of DQuad is a waste? If not, I can keep using
DQuad.
Or do you know how to implement DTriple and get DTripleRegClass in
getRegClassFor(MVT VT) ?

Thanks,
-Hao

-----Original Message-----
From: Tim Northover [mailto:t.p.northover at gmail.com] 
Sent: Friday, October 04, 2013 4:08 PM
To: Hao Liu
Cc: llvm-commits; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH][AArch64] implement aarch64 neon load/store instructions
class AdvSIMD (lselem)

Hi Hao,

> At first, I tried to used them directly in the final instructions. But 
> I failed to print out the instructions correctly. I can't print out 
> the information about the layout (16b, 8h, 4s, ...)

I think that problem can be solved. You can create a set of RegisterOperands
VPair2s, VPair4h, ... with PrintMethods that put the correct ".2s", ".4h"
and so on in place.

> At last, I changed the solution. I used the same way ARM backend to 
> match the ld/st instructions in 2 steps:

I still don't think that's the right solution. These multiple-register
MCInsts are redundant and fragile, and don't really reflect what's possible
in the instruction-set. ARM appears to be trying to move away from them (see
the comment in the definition of the NEONLdStTableEntry stuct in
ARMExpandPseudoInsts.cpp).

I'm also skeptical about the kill/dead/undef flags being attached in the
Expand phase. I know you just copied the functions, but they seem to assume
that if Q0_Q1 is (say) undef then both Q0 and Q1 are. I can't seem to
provoke a failure because of this in ARM, but it looks dodgy.

> And I think it may be easy to read or do optimization to we show the 
> registers separately, if we can ensure the registers are consecutive.

Show them separately where? I don't quite follow this comment. In
MachineInstr dumps, the readability of "Q0_Q1_Q2_Q3<undef>" vs "Q0<undef>,
Q1<undef>, Q2<undef>, Q3<undef>" is moot. We don't optimise for that and
certainly don't use it to decide on an implementation.

> (2) About the chain.
> I think last time I didn't add the Chain to the operand. So I fix this 
> by adding the Chain in AArch64ISelDAGToDAG.cpp. I don't know how to 
> test it, but I think this may be fixed.

As far as I can see, yes, though I think this line could do with a comment
that it's updating users of the chain:

+  ReplaceUses(SDValue(N, NumVecs), SDValue(VLd, 1));

> You are right, it's hard to understand. The instructions are 
> load/store about multiple N-element structure. I change the name of some
definition.
> And also add comment to explain the load/store instructions:

Thanks Hao, comments are always useful.

Cheers.

Tim.