[PATCH] D93132: [SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 14 01:20:03 PST 2020
david-arm added a comment.
Hi @kmclaughlin, I've got just one function left to review, but your patch looks good so far, with lots of thorough testing! I only had a few minor comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3981
+ SDValue Ops[] = {Chain, Mask, BasePtr, Index, InputVT, PassThru};
+ if (ResNeedsSignExtend)
+ Opcode = getSignExtendedGatherOpcode(Opcode);
----------------
Could this be hoisted out to just below the definition of 'Opcode' on line 3976? That way you might also be able to avoid duplicating the code below on line 3986.
================
Comment at: llvm/test/CodeGen/AArch64/sve-masked-gather.ll:4
+
+define <vscale x 2 x i64> @masked_gather_nxv2i8(<vscale x 2 x i8*> %ptrs, i64 %offset, <vscale x 2 x i1> %mask) {
+; CHECK-LABEL: masked_gather_nxv2i8:
----------------
In this function (and others in the file) it doesn't look like %offset is actually used - can it be removed?
================
Comment at: llvm/test/CodeGen/AArch64/sve-masked-gather.ll:43
+
+define <vscale x 2 x half> @masked_gather_nxv2f16(<vscale x 2 x half*> %ptrs, i64 %offset, <vscale x 2 x i1> %mask) {
+; CHECK-LABEL: masked_gather_nxv2f16:
----------------
I don't know if it matters or not, but the scatter equivalents also have tests for bfloat - should we add the same here?
================
Comment at: llvm/test/CodeGen/AArch64/sve-masked-scatter-legalise.ll:15
+; Code generate store of an illegal datatype via promotion.
+define void @masked_scatter_nxv2i16(<vscale x 2 x i16> %data, i16* %base, <vscale x 2 x i16*> %ptrs, <vscale x 2 x i1> %mask) {
+; CHECK-LABEL: masked_scatter_nxv2i16:
----------------
Do we need to worry about stuff like masked_scatter_nxv2half or masked_scatter_nxv2float?
================
Comment at: llvm/test/CodeGen/AArch64/sve-masked-scatter-vec-plus-imm.ll:79
+; CHECK: // %bb.0:
+; CHECK-NEXT: mov x8, xzr
+; CHECK-NEXT: add z1.d, z1.d, #32 // =0x20
----------------
I think this looks fine, but I was expecting something like:
mov x8, #32
st1b { z0.d }, p0, [x8, z1.d]
Perhaps there is a technical reason why we add to the z1 register instead? Anyway, it's only a minor codegen issue and we can look at it later!
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93132/new/
https://reviews.llvm.org/D93132