[PATCH] AARCH64_BE load/store rules fix for ARM ABI (Part 2)
Tim Northover
t.p.northover at gmail.com
Fri Apr 11 01:10:45 PDT 2014
Hi Christian,
I've basically repeated all the comments I made on the original thread (the ones that I remember, anyway), since it got rather confusing.
Cheers..
Tim.
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4029-4030
@@ -3969,4 +4028,4 @@
// registers (N = 2,3,4)
-defm LD2R : LDN_Dup_BHSD<0b1, 0b110, "VPair", "ld2r">;
-defm LD3R : LDN_Dup_BHSD<0b0, 0b111, "VTriple", "ld3r">;
-defm LD4R : LDN_Dup_BHSD<0b1, 0b111, "VQuad", "ld4r">;
+// Multi-element loads suffer from element reversal in BE.
+let Predicates = [IsLE] in {
+ defm LD2R : LDN_Dup_BHSD<0b1, 0b110, "VPair", "ld2r">;
----------------
I don't believe they do. These replicating loads should be fine for both endians.
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4132-4133
@@ -4066,3 +4131,4 @@
// Load single 1-element structure to one lane of 1 register.
+// No dangerous element swaps in BE. :-)
defm LD1LN : LDN_Lane_BHSD<0b0, 0b0, "VOne", "ld1">;
----------------
These are currently broken (pending the global lane-reversal I suggested in D3306), but no more so than *any* instruction that refers to a specific lane, of which there are hundreds. I don't think they should be predicated.
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4235-4236
@@ -4151,3 +4234,4 @@
// Store single 1-element structure from one lane of 1 register.
+// single element should be fine in BE - no swapping of elements.
defm ST1LN : STN_Lane_BHSD<0b0, 0b0, "VOne", "st1">;
----------------
Same comment as the LD1LN instructions.
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:4356-4357
@@ +4355,4 @@
+// Multiple elements would be reversed in BE.
+let Predicates = [IsLE] in {
+ defm LD2R_WB : LDWB_Dup_BHSD<0b1, 0b110, "VPair", "ld2r", uimm_exact2,
+ uimm_exact4, uimm_exact8, uimm_exact16>;
----------------
Again, LD2R, LD3R and LD4R should be fine in both.
================
Comment at: lib/Target/AArch64/AArch64InstrInfo.td:5079
@@ +5078,3 @@
+ defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1i32>;
+// defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1f32>; does not exist - v1f64 DOES -- WHY ?
+ }
----------------
Commented out code.
================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:107-115
@@ -106,5 +106,11 @@
-defm : uimm12_neon_pats<(A64WrapperSmall
- tconstpool:$Hi, tconstpool:$Lo12, ALIGN),
- (ADRPxi tconstpool:$Hi), (i64 tconstpool:$Lo12)>;
-
+// LDR is only valid for little endian.
+// In BE LDR needs correctly byte-swapped 128bit literals, so simple array
+// initializers won't work right now.
+// Big-endian must - for now - do the element swaps using vector intrinsics.
+// That's an additional "add offset12" instruction, there.
+// According to ARM, BE & LE should use intrinsics for initialization.
+// That's also the only portable code.
+// FIXME: BE could use vector-literal-swapping before emit pass.
+let Predicates = [IsLE] in { // TODO: this will eventually be removed
+ defm : uimm12_neon_pats<(A64WrapperSmall
----------------
I think this is incorrect, and your issues with big-endian constpools are probably the result of D3306.
http://reviews.llvm.org/D3345
More information about the llvm-commits
mailing list