[PATCH] AARCH64_BE load/store rules fix for ARM ABI

Thu Mar 6 01:17:10 PST 2014

  Hi Albrecht,

  AAPCS64 requires to use LDR/STR only for short vectors defined in AAPCS64. The definition of short vector in AAPCS64 requires the monolithic alignment of the whole short vector rather than element alignment.

  For some reason, LLVM compiler could generate element alignment short vector for storing array purpose. This type should be different from the short vector defined in AAPCS64. All of the instruction using this data type should fall into LD1/ST1. LD1/ST1 should not make difference for element ordering between LE/BE, and the only difference is the type ordering inside the element. We will be supporting this element alignment short vector access soon. Refer to the example inlined.

  There are some other comments inlined.

  Thanks,
  -Jiangning


================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3424
@@ +3423,3 @@
+// Multiple elements would be reversed in BE.
+let Predicates = [IsLE] in {
+  defm LD1 : LDVList_BHSD<0b0111, "VOne", "ld1">;
----------------
Is this to disable LD1/LD2/LD3/LD4 for big-endian? If yes, why the test cases using those instructions can pass with big-endian configuration?

This piece of code is to define encodings, and LE/BE should always cover them. If we don't want to generate any instruction, we should control them with pattern match.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:107
@@ -106,1 +106,3 @@
 
+// LDR is only valid for little endian. 
+// In BE LDR needs correctly byte-swapped 128bit literals, so simple array 
----------------
This comment is misleading. Every instruction should be valid for big-endian, although the same instruction can have different behaviors for LE/BE.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3362
@@ +3361,3 @@
+//
+// Obviously the two layouts differ by reversing the elements so they can't be 
+// mixed without explicit element-swap operations in BE.
----------------
How do we come across a case mixing the uses of LDR and LD1? If it's type casting, end-user should guarantee the correctness by program logic itself rather than by compiler.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3417
@@ +3416,3 @@
+// will be inconsistent.
+// The only allowed use of LD1 is in initializations using explicit intrinsics to do 
+// the element-swaps.
----------------
This is not the only case. Auto-vectorizer could generate element alignment short vector ld/st. For example, middle-end could generate

store <4 x i16> %val, <4 x i16>* %ptr, align 2

We should generate instruction like st1 v0.4h, [x0].

Unfortunately, we can't generate this instruction yet with trunk. We will get it fixed as soon as possible.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3483
@@ -3438,2 +3482,3 @@
 
-defm ST2 : STVList_BHSD<0b1000, "VPair", "st2">;
+// Multiple elements would be reversed in BE.
+let Predicates = [IsLE] in {
----------------
ST1/ST2/ST3/ST4 essentially use aggregate short vector type like,

typedef struct int16x4x3_t {
  int16x4_t val[3];
} int16x4x3_t;

which is defined in arm_neon.h.

With this data type, LE/BE should only make difference for the layout inside element int16. The data layout among different elements should be always the same.


http://llvm-reviews.chandlerc.com/D2884