[PATCH] AARCH64_BE load/store rules fix for ARM ABI (Part 2)

Fri Apr 11 01:10:45 PDT 2014

  Hi Christian,

  I've basically repeated all the comments I made on the original thread (the ones that I remember, anyway), since it got rather confusing.

  Cheers..

  Tim.


================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4029-4030
@@ -3969,4 +4028,4 @@
 // registers (N = 2,3,4)
-defm LD2R : LDN_Dup_BHSD<0b1, 0b110, "VPair", "ld2r">;
-defm LD3R : LDN_Dup_BHSD<0b0, 0b111, "VTriple", "ld3r">;
-defm LD4R : LDN_Dup_BHSD<0b1, 0b111, "VQuad", "ld4r">;
+// Multi-element loads suffer from element reversal in BE.
+let Predicates = [IsLE] in {
+  defm LD2R : LDN_Dup_BHSD<0b1, 0b110, "VPair", "ld2r">;
----------------
I don't believe they do. These replicating loads should be fine for both endians.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4132-4133
@@ -4066,3 +4131,4 @@
 // Load single 1-element structure to one lane of 1 register.
+// No dangerous element swaps in BE. :-)
 defm LD1LN : LDN_Lane_BHSD<0b0, 0b0, "VOne", "ld1">;
 
----------------
These are currently broken (pending the global lane-reversal I suggested in D3306), but no more so than *any* instruction that refers to a specific lane, of which there are hundreds. I don't think they should be predicated.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4235-4236
@@ -4151,3 +4234,4 @@
 // Store single 1-element structure from one lane of 1 register.
+// single element should be fine in BE - no swapping of elements.
 defm ST1LN : STN_Lane_BHSD<0b0, 0b0, "VOne", "st1">;
 
----------------
Same comment as the LD1LN instructions.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4356-4357
@@ +4355,4 @@
+// Multiple elements would be reversed in BE.
+let Predicates = [IsLE] in {
+  defm LD2R_WB : LDWB_Dup_BHSD<0b1, 0b110, "VPair", "ld2r", uimm_exact2,
+                               uimm_exact4, uimm_exact8, uimm_exact16>;
----------------
Again, LD2R, LD3R and LD4R should be fine in both.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.td:5079
@@ +5078,3 @@
+    defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1i32>;
+//    defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1f32>; does not exist - v1f64 DOES --  WHY ?
+  }
----------------
Commented out code.

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:107-115
@@ -106,5 +106,11 @@
 
-defm : uimm12_neon_pats<(A64WrapperSmall
-                          tconstpool:$Hi, tconstpool:$Lo12, ALIGN),
-                        (ADRPxi tconstpool:$Hi), (i64 tconstpool:$Lo12)>;
-
+// LDR is only valid for little endian. 
+// In BE LDR needs correctly byte-swapped 128bit literals, so simple array 
+// initializers won't work right now.
+// Big-endian must - for now - do the element swaps using vector intrinsics.
+// That's an additional "add offset12" instruction, there.
+// According to ARM, BE & LE should use intrinsics for initialization.
+// That's also the only portable code.
+// FIXME: BE could use vector-literal-swapping before emit pass.
+let Predicates = [IsLE] in { // TODO: this will eventually be removed
+  defm : uimm12_neon_pats<(A64WrapperSmall
----------------
I think this is incorrect, and your issues with big-endian constpools are probably the result of D3306.


http://reviews.llvm.org/D3345