[PATCH] AARCH64_BE load/store rules fix for ARM ABI

Wed Feb 26 05:45:01 PST 2014

  As they say for complex reviews: start early & iterate :-)

  I forgot to add the bigger roadmap towards BE support
  1) disable LDx/STx
  2) regain matcher coverage by adding LDR/STR
  3) fix BE calling conventions to gain code correctness in BE (almost there internally)
  4) optionally enable LDR/STR for LE
  5) re-enable some LDn rules with extensive testing for nice interaction with STRed data structures

  This patch covers 1-2, 3 is a prerequisite for 5 (inlining printf sucks)
  Ideally these would be separate patches each - but matcher fails don't go down well.

  YES, I've been liberal with the disabling - anything that looked dangerous had to go for now - for BE only.
  I figured that single element duplicating loads may be fine.
  But whether v8i8 or just 128bit elements (how about 64 bits??) are fine is still work in progress - needs testing.
  Documentation is plentiful, and I haven't found what I'd really need, yet.

  Then I added the new patterns conservatively for BE only - don't want to change LE code just now (we're comparing LE to BE, for example). Also not going to ruffle feathers of any LE guys (one pandora's box at the same time)

  Right now, the focus is on getting BE to actually work (correctness first).
  That needs another upcoming patch to the calling conventions - step 3) above.

  Then we have correct code coverage, and we'll extend from there - within project limitations.


================
Comment at: lib/Target/AArch64/AArch64InstrInfo.td:4856
@@ +4855,3 @@
+
+// wrappers to instantiate all allowed same-size fp/vector loads
+
----------------
Tim Northover wrote:
> Could we have capital letters at the start of sentences?
Done

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.td:4858-4859
@@ +4857,4 @@
+
+// NEON-BE: allow all neon vectors as well, since ld1/st1 must be disabled
+// LD1 & ST1 are not ABI conforming in big endian: wrong arg memory layout
+// TODO: eventually also enable for LE 
----------------
Tim Northover wrote:
> It'd probably be a good idea to refer people to the AAPCS here for more details.
Done. the URL just so fits the 80cols limit :-)

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.td:4860
@@ +4859,3 @@
+// LD1 & ST1 are not ABI conforming in big endian: wrong arg memory layout
+// TODO: eventually also enable for LE 
+// (desired by ARM - smaller code due to more powerful adressing modes)
----------------
Tim Northover wrote:
> Why is this only eventually? Couldn't it be enabled now if it's got better addressing-mode properties?
1) we're using LE as reference - so trying not to change that, yet.
2) people working on LE might oppose the code changes.
the comment was intended to start discussion that eventually leads to enabling.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.td:4893
@@ +4892,3 @@
+    defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1i32>;
+//    defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1f32>; does not exist - v1f64 DOES --  WHY ?
+  }
----------------
Tim Northover wrote:
> Commented code.
Yeah - I'm still wondering why there's a v1f64 non terminal, but no v1i32.
Any idea?
Symmetry would suggest, both should exist.


================
Comment at: lib/Target/AArch64/AArch64InstrInfo.td:4901
@@ +4900,3 @@
+  // float is always there
+  // shouldn't this be guarded by HasFPARMv8 ???
+  defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, f64>;
----------------
Tim Northover wrote:
> If the CPU doesn't have FPARMv8 then f64 won't be a legal type and the DAG shouldn't contain any instances of it by this time.
ARM32 ended up having a few options for hard float units.

wasn't sure since there're other uses above - but not at all consistently. 
E.g:
let Predicates = [HasFPARMv8] in {
def : Pat<(i32 (fp_to_sint f32:$Rn)), (FCVTZSws $Rn)>; ...

What do we do NOW ?
Cleanup (which way?) / leave as is (-> add guards for new code or not?) ?

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.td:4904
@@ +4903,3 @@
+
+  // TODO: eventually also enable for LE 
+  let Predicates = [HasNEON, IsBE] in {
----------------
Tim Northover wrote:
> LLVM normally uses "FIXME" rather than "TODO". A consistent choice makes grepping a bit easier.
DONE

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3391
@@ -3383,1 +3390,3 @@
+// this should work in BE - single operand: no element swaps
+let Predicates = [IsLE] in {
 def LD1_1D : NeonI_LDVList<0, 0b0111, 0b11, VOne1D_operand, "ld1">;
----------------
Tim Northover wrote:
> There aren't any patterns in this multiclass so this is superfluous. If patterns *are* added, it's not clear that they'll be wrong for BE either: they could be the int_arm_neon_vldN version which you do want.
True - It's more a reminder for the guy who adds patterns, that LDn/STn make trouble in BE, while it's fine in LE.
LE implementation is farther ahead - adding patterns without considering BE will be troublesome.
-> Shall I convert that to a comment ?

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3888
@@ -3866,2 +3887,3 @@
 
+let Predicates = [IsLE] in {
 // Load single 1-element structure to all lanes of 1 register
----------------
Tim Northover wrote:
> No patterns and you probably *would* want any that existed since the layout issue doesn't exist for the duplicating loads.
True for the single-element replications below - first candidate for re-enabling.
Probably not true for the vector replicating loads here (if a single vector is loaded in reverse-element order, the duplication won't fix that ?)

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3907
@@ +3906,3 @@
+
+let Predicates = [IsLE] in {
+// element swap on bytes == byte swap
----------------
Tim Northover wrote:
> Ah, here they are. I think these patterns should be endian-independent.
Need to do more reading to fully understand all details of the swapping - but single element reads should be fine, if they do internal byte-swapping within the element (-> v1x nonterminals and scalar nonterminals)
- so conservatively "not yet" - i.e. until we have working code (CallingConv)

================
Comment at: lib/Target/AArch64/AArch64InstrNEON.td:3996
@@ -3966,3 +3995,3 @@
 
-// Load single 1-element structure to one lane of 1 register.
-defm LD1LN : LDN_Lane_BHSD<0b0, 0b0, "VOne", "ld1">;
+let Predicates = [IsLE] in {
+  // Load single 1-element structure to one lane of 1 register.
----------------
Tim Northover wrote:
> Loading a single lane is also layout independent (and these are not the patterns).
need to check whether ld4ln ({1,2,3,4})  yields the same result on BE & LE

probably also depends on whether you feed that with an array (then yes) or a vector stored by STR  (then probably NOT) -> pattern-dependent ?


http://llvm-reviews.chandlerc.com/D2884