[PATCH] AARCH64_BE load/store rules fix for ARM ABI

Thu Mar 6 02:21:53 PST 2014

  Hi Jiangning,

  I'm not sure I understand your comments. Do you mean ARM is intending to add C level types to ACLE & AAPCS that *will* behave as if loaded and stored with ld1/st1 soon?

  Cheers.

  Tim.


================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3424
@@ +3423,3 @@
+// Multiple elements would be reversed in BE.
+let Predicates = [IsLE] in {
+  defm LD1 : LDVList_BHSD<0b0111, "VOne", "ld1">;
----------------
Jiangning Liu wrote:
> Is this to disable LD1/LD2/LD3/LD4 for big-endian? If yes, why the test cases using those instructions can pass with big-endian configuration?
> 
> This piece of code is to define encodings, and LE/BE should always cover them. If we don't want to generate any instruction, we should control them with pattern match.
The "IsBE" predicate is codegen-level rather than an AssemblerPredicate so MC tests won't be affected anyway. And there's only one CodeGen test mentioning them that's not based on intrinsics (which gets more substantial changes), so I think that part's OK.

Your comment about only applying IsBE to patterns is a good one though.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3483
@@ -3438,2 +3482,3 @@
 
-defm ST2 : STVList_BHSD<0b1000, "VPair", "st2">;
+// Multiple elements would be reversed in BE.
+let Predicates = [IsLE] in {
----------------
Jiangning Liu wrote:
> ST1/ST2/ST3/ST4 essentially use aggregate short vector type like,
> 
> typedef struct int16x4x3_t {
>   int16x4_t val[3];
> } int16x4x3_t;
> 
> which is defined in arm_neon.h.
> 
> With this data type, LE/BE should only make difference for the layout inside element int16. The data layout among different elements should be always the same.
I believe this is incorrect for the simple instructions. "ld1 {v0.4h, v1.4h}, [x0]" is equivalent to "ld1 {v0.4h}, [x0]; ld1 {v1.4h}, [x0, #8]" and different from "ldr d0, [x0]; ldr d1, [x0, #8]" on big-endian systems.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3362
@@ +3361,3 @@
+//
+// Obviously the two layouts differ by reversing the elements so they can't be 
+// mixed without explicit element-swap operations in BE.
----------------
Jiangning Liu wrote:
> How do we come across a case mixing the uses of LDR and LD1? If it's type casting, end-user should guarantee the correctness by program logic itself rather than by compiler.
The compiler was mixing them at will previously (e.g. storeRegToStackSlot uses str, but this address could escape and be used in a normal load which we'd use ld1 for). I believe Albrecht's comment is designed to warn against this, and I support it.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3417
@@ +3416,3 @@
+// will be inconsistent.
+// The only allowed use of LD1 is in initializations using explicit intrinsics to do 
+// the element-swaps.
----------------
Jiangning Liu wrote:
> This is not the only case. Auto-vectorizer could generate element alignment short vector ld/st. For example, middle-end could generate
> 
> store <4 x i16> %val, <4 x i16>* %ptr, align 2
> 
> We should generate instruction like st1 v0.4h, [x0].
> 
> Unfortunately, we can't generate this instruction yet with trunk. We will get it fixed as soon as possible.
I don't believe we're forced to generate either and there are arguments in favour of both, but being consistent is *very* important. As Albrecht said, we can't mix the two kinds of load/store.

I agree that using ld1/st1 exclusively would make LLVM's semantics easier to get right, but it would make getting the AAPCS right harder (bitcasts would become non-trivial operations and be needed at all potentially ABI-visible boundaries).

I suspect (but don't know) that the ldr/str route is capable of producing better code on average.


http://llvm-reviews.chandlerc.com/D2884