[PATCH] D111135: [AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit

Wed Oct 6 04:38:48 PDT 2021

peterwaller-arm added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll:216
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    ext z0.b, z0.b, z1.b, #0
+; CHECK-NEXT:    ext z0.b, z0.b, z1.b, #8
 ; CHECK-NEXT:    ret
----------------
I expected to find a #2 here (1 x f16's worth of bytes), but there is a #8, which I can't quickly explain?

================
Comment at: llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll:227
 ; CHECK-NEXT:    ret
-  %res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 1)
+  %res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 31)
   ret <vscale x 2 x half> %res
----------------
Continuing the logic of the earlier comment I make the immediate maximum to be 255 (bytes) - 2 (1 x f16) = 253, but we can't access byte 253 because we need a whole number of 2-byte halves, so I'm expecting to see 252/2 = 126 as input and #252 as the immediate.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111135/new/

https://reviews.llvm.org/D111135