[PATCH] D67085: [ARM] Fix loads and stores for v4i1 and v8i1

Mon Sep 9 03:53:54 PDT 2019

samparker added inline comments.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-pred-bitcast.ll:23
 ; CHECK-NEXT:    vmov.i32 q1, #0x0
-; CHECK-NEXT:    vldr p0, [r0]
+; CHECK-NEXT:    vcmp.i32 ne, q2, zr
 ; CHECK-NEXT:    vpsel q0, q0, q1
----------------
I'm missing something here... from my understanding:
- Select 16 bytes, taken from q2 (0xff) and q1 (0x0), building a vector predicate mask in q1.
- Then we take the bottom 4 bytes from q1, the mask, putting each into a 32-bit lane of q2.
- Then we compare the 32-bit lanes of q2 against zero.
- Then we select bytes from q0 (%a) and q1 (zero).

It's the second point that I don't understand: why do we only access the lower lanes of q1?
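
To make the four steps above concrete, here is a rough Python model of the sequence as I read it. This is only a sketch of my understanding, not a statement of what the hardware or the patch actually does: the helper names (`vpsel_bytes`, `widen_low_bytes`, `vcmp_i32_ne_zero`, `model`) are hypothetical, registers are modelled as lists of 16 bytes, and the predicate is assumed to be 16 bits with one bit per byte lane.

```python
def vpsel_bytes(pred16, a, b):
    """Per-byte select: byte i comes from a if bit i of pred16 is set, else from b."""
    return [a[i] if (pred16 >> i) & 1 else b[i] for i in range(16)]


def widen_low_bytes(mask_bytes):
    """Step 2 (assumed): put each of the bottom 4 mask bytes into its own 32-bit lane."""
    return [mask_bytes[i] for i in range(4)]


def vcmp_i32_ne_zero(lanes):
    """Step 3: a non-zero 32-bit lane sets all 4 predicate bits covering that lane."""
    pred = 0
    for lane, value in enumerate(lanes):
        if value != 0:
            pred |= 0xF << (4 * lane)
    return pred


def model(pred16, a_bytes):
    """Run the four steps; return the rebuilt predicate and the selected bytes."""
    ones = [0xFF] * 16   # q2 before step 1
    zeros = [0x00] * 16  # q1 before step 1
    mask = vpsel_bytes(pred16, ones, zeros)   # step 1: byte mask
    lanes = widen_low_bytes(mask)             # step 2: only bytes 0..3 are read
    p0 = vcmp_i32_ne_zero(lanes)              # step 3: compare against zero
    return p0, vpsel_bytes(p0, a_bytes, zeros)  # step 4: select %a or zero


a = list(range(1, 17))
p0, result = model(0x0005, a)      # predicate bits 0 and 2 set
p0_upper, _ = model(0xF000, a)     # only predicate bits 12..15 set
print(hex(p0), result, hex(p0_upper))
```

In this model, any predicate bits above bit 3 never reach the compare in step 3, which is exactly the behaviour I am asking about.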

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67085/new/

https://reviews.llvm.org/D67085