[PATCH][AArch64] implement aarch64 neon load/store instructions class AdvSIMD (lselem)

Fri Oct 4 06:36:22 PDT 2013

Hi Tim,

I've refactored the patch to address the 3 problems you commented on.

(1) About the consecutive registers.
I created DPair/DQuad/QPair/QQuad to allocate consecutive registers.
At first, I tried to use them directly in the final instructions, but I
could not print the instructions correctly, because I can't print the
layout information (16b, 8h, 4s, ...).
    E.g. Say we use $SuReg as a super register of QPair.
            If we use the asm pattern "ld2 {$SuReg}, [$Rn]", we only have the
super register number, so the output is like ld2 {v0, v1}, [x0].
            But if we use the asm pattern "ld2 {$SuReg.16b}, [$Rn]", the
output is like ld2 {v0, v1.16b}, [x0], which is still wrong.

In the end, I changed the solution. I followed the ARM backend's approach
and match the ld/st instructions in 2 steps:
      1st step: use a Pseudo instruction with QPair to allocate 2
consecutive registers.
      2nd step: transform the Pseudo instruction into the real instruction
with the 2 registers shown separately. The asm pattern is like
"ld2 {$Rt.16b, $Rt2.16b}, [$Rn]". Then we get the correct output:
ld2 {v0.16b, v1.16b}, [x0].
I also think showing the registers separately may make the instructions
easier to read and to optimize, as long as we can ensure the registers are
consecutive.
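
To make the printing problem and the 2-step approach concrete, here is a
small Python model. It is only a sketch and not code from the patch: the
function names are made up for illustration, a register pair is modelled as
just the number of its first register, and wrap-around from v31 to v0 is
deliberately not handled.

```python
# Hypothetical model of the two printing strategies; not LLVM code.

def expand_pair(first_reg):
    """Return the two consecutive vector register numbers of a pair."""
    if not 0 <= first_reg <= 30:
        raise ValueError("first register must be in v0..v30 in this sketch")
    return [first_reg, first_reg + 1]

def print_ld2_super(first_reg, layout, base):
    """Model of "ld2 {$SuReg.16b}, [$Rn]": the super register prints as one
    string, so the layout suffix can only be appended once, at the end."""
    su_reg = ", ".join(f"v{r}" for r in expand_pair(first_reg))
    return f"ld2 {{{su_reg}.{layout}}}, [{base}]"

def print_ld2_expanded(first_reg, layout, base):
    """Model of "ld2 {$Rt.16b, $Rt2.16b}, [$Rn]": after the pseudo is
    expanded, each register is its own operand and gets its own suffix."""
    rt, rt2 = expand_pair(first_reg)
    return f"ld2 {{v{rt}.{layout}, v{rt2}.{layout}}}, [{base}]"

if __name__ == "__main__":
    print(print_ld2_super(0, "16b", "x0"))     # ld2 {v0, v1.16b}, [x0]
    print(print_ld2_expanded(0, "16b", "x0"))  # ld2 {v0.16b, v1.16b}, [x0]
```

The consecutiveness is decided once, when the pair is chosen; the expanded
form only changes how the already-allocated registers are printed.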

(2) About the chain.
I think I didn't add the Chain to the operands last time. I fixed this by
adding the Chain in AArch64ISelDAGToDAG.cpp. I don't know how to test it,
but I think it should be fixed now.

(3) About the name "Elem".
You are right, it's hard to understand. These instructions load/store
multiple N-element structures. I changed the names of some definitions and
also added a comment to explain the load/store instructions:
+// The following are for the structure load/store instruction class (multiple)
+//
+// ld1:         load multiple 1-element structure to 1/2/3/4 registers.
+// ld2/ld3/ld4: load multiple N-element structure to N registers (N = 2, 3, 4).
+//              The structure consists of a sequence of sets of N values.
+//              The first element of the structure is placed in the first lane
+//              of the first vector, the second element in the first lane of
+//              the second vector, and so on.
+// E.g. LD1_3V_2S will load 32-bit elements {A, B, C, D, E, F} sequentially
+// into the three 64-bit vectors list {BA, DC, FE}.
+// E.g. LD3_2S will load 32-bit elements {A, B, C, D, E, F} into the three
+// 64-bit vectors list {DA, EB, FC}.
+// Store is similar to load.
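
The difference between the two layouts in the comment can be checked with a
short Python model. This is only an illustrative sketch, not part of the
patch: the helper names are hypothetical, each vector is a list with lane 0
first, and show() prints the highest lane first to match the {BA, DC, FE}
notation used above.

```python
# Hypothetical model of the "multiple structure" load layouts; not LLVM code.

def ld1_multi(elems, num_vecs, lanes):
    """ld1: fill each vector with consecutive elements, one vector at a time."""
    return [elems[v * lanes:(v + 1) * lanes] for v in range(num_vecs)]

def ldn_multi(elems, n, lanes):
    """ld2/ld3/ld4: de-interleave, so element i lands in vector i % n,
    lane i // n."""
    return [elems[v::n][:lanes] for v in range(n)]

def show(vec):
    """Render a vector with the highest lane first, e.g. ['A', 'B'] -> 'BA'."""
    return "".join(reversed(vec))

if __name__ == "__main__":
    mem = list("ABCDEF")  # six 32-bit elements in memory order
    print([show(v) for v in ld1_multi(mem, 3, 2)])  # LD1_3V_2S layout
    print([show(v) for v in ldn_multi(mem, 3, 2)])  # LD3_2S layout
```

With the six elements A..F, the first call models LD1_3V_2S and the second
models LD3_2S.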

Thanks,
-Hao

-----Original Message-----
From: Tim Northover [mailto:t.p.northover at gmail.com] 
Sent: Tuesday, September 24, 2013 8:33 AM
To: Hao Liu
Cc: llvm-commits; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH][AArch64] implement aarch64 neon load/store instructions
class AdvSIMD (lselem)

Hi Hao,

I think there are a few issues here. The biggest is that using separate
registers in the ins/outs list means that the register-allocator won't know
they must be consecutive and might decide to create an instruction like "ld2
{v12.4s, v2.4s}, [x0]".

I think the ARM solution of using pairs, triplets and so on is probably the
best option.

Also, the ld1/st1 instructions should probably have normal patterns for a
single vector load and store as well. When writing some kind of test I
encountered "could not select" errors on simple vector stores.

Finally, I think the SelectVLD (& possibly SelectVST) code doesn't transfer
the chains properly and leaves the @llvm.arm.neon.vldN in the DAG after
selection. The attached file (after adding a store v16i8
pattern) should demonstrate this (it was my attempt to trick regalloc into
doing the wrong thing, so may be useful for your own tests there, though I
didn't get that far).

Oh, and one minor naming detail: NeonI_LdStElem. I thought the "Elem"
instructions were different ones, like "ld1 { v0.4s[2] }, [x0]". I'd have
called these "Multiple" as in the comment, or something similar.

Cheers.

Tim.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-simd-ldst-multi-v2.patch
Type: application/octet-stream
Size: 214046 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131004/d6de2ee0/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang-simd-ldst-multi-v2.patch
Type: application/octet-stream
Size: 42841 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131004/d6de2ee0/attachment-0001.obj>