[PATCH] D65583: [ARM] MVE big endian loads/stores

Thu Aug 8 01:56:53 PDT 2019

dmgreen marked 3 inline comments as done.
dmgreen added inline comments.

================
Comment at: llvm/lib/Target/ARM/ARMInstrMVE.td:4789
+  def  : Pat<(v4i1 (load t2addrmode_imm7<2>:$addr)),
+             (v4i1 (VLDR_P0_off t2addrmode_imm7<2>:$addr))>;
 }
----------------
simon_tatham wrote:
> samparker wrote:
> > Do we support v2i1 as well?
> Currently, v2i1 isn't listed among the vector-of-i1 types that are legal in the VCCR regclass, so it probably wouldn't help to add a pattern for it here.
> 
> I have a small patch that adds it in various other places, as part of my unfinished prototype for the ACLE MVE intrinsics support. If it's becoming urgent, I could easily pull that patch out and submit it before I get the rest finished?
I have not added v2i1 to anything yet. It is not usually very useful, as we do not support any of the cmp's needed to produce it. Adding it sound like the kind of thing that would need a lot of testing.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-loadstore.ll:16
+; CHECK-BE-NEXT:    vshr.u32 q1, q0, #1
+; CHECK-BE-NEXT:    vrev64.32 q0, q1
+; CHECK-BE-NEXT:    bx lr
----------------
samparker wrote:
> So why not vrev32? I'm having real difficulty understanding why these vrev instructions are augmented with 'size'.
These are just because we are returning a <4 x i32>, and the calling convention is a little odd at the moment. See the other BE patch in D65581.

The way I think of the vrevs is that the do 2 different bit level reverses. This one does on to i64's, then they are reversed again in i32's. So you end up with the bytes in the correct order, but re-arranged.

Point is that this one is not really the interesting part of this test. It is just an artifact of the calling convention.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-loadstore.ll:34
+; CHECK-BE-NEXT:    vldrb.u8 q0, [r0]
+; CHECK-BE-NEXT:    vrev32.8 q0, q0
+; CHECK-BE-NEXT:    vshr.u32 q1, q0, #1
----------------
This one here is an example of load being bit-reversed, because it's alignment is too low. It's loaded with i8's and VREV'd to get the correct values into the i32's.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65583/new/

https://reviews.llvm.org/D65583