[PATCH] D147235: [AArch64] Remove redundant `mov 0` instruction for high 64-bits

JinGu Kang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 30 07:25:34 PDT 2023


jaykang10 created this revision.
jaykang10 added reviewers: dmgreen, samtebbs, efriedma.
Herald added subscribers: hiraditya, kristof.beyls.
Herald added a project: All.
jaykang10 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

gcc generates fewer instructions than llvm for the intrinsic examples below.

  #include <arm_neon.h>
  
  float16x8_t test1(const float32x4_t a) {
      float16x4_t b = vcvt_f16_f32(a);
      return vcombine_f16(b, vdup_n_f16(0.0));
  }
  
  uint8x8_t test2(uint16_t *in, uint8x8_t *dst, uint8x8_t idx) {
      return vtbl1_u8(vshrn_n_u16(vld1q_u16(in), 4), idx); 
  }
  
  gcc output
  test1:
          fcvtn   v0.4h, v0.4s 
          fmov    d0, d0
          ret
  
  test2:
          ldr     q1, [x0]
          shrn    v1.8b, v1.8h, 4
          tbl     v0.8b, {v1.16b}, v0.8b 
          ret
  
  llvm output
  test1:                                  // @test1
          movi    d1, #0000000000000000
          fcvtn   v0.4h, v0.4s
          mov     v0.d[1], v1.d[0]
          ret
  
  test2:                                  // @test2
          ldr     q1, [x0]
          movi    v2.2d, #0000000000000000
          shrn    v1.8b, v1.8h, #4
          mov     v1.d[1], v2.d[0]
          tbl     v0.8b, { v1.16b }, v0.8b
          ret

The `fcvtn` and `shrn` instructions implicitly set the high 64 bits of their destination register to zero, so the `mov 0` into the high 64 bits is redundant. It looks like gcc has patterns for these cases. For example,

  gcc rtl pattern for the shrn in test2
  (define_insn "aarch64_shrn<mode>_insn_le"
    [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
          (vec_concat:<VNARROWQ2>
            (truncate:<VNARROWQ>
              (lshiftrt:VQN (match_operand:VQN 1 "register_operand" "w")
                (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_<vn_mode>")))
            (match_operand:<VNARROWQ> 3 "aarch64_simd_or_scalar_imm_zero")))]
    "TARGET_SIMD && !BYTES_BIG_ENDIAN"
    "shrn\\t%0.<Vntype>, %1.<Vtype>, %2"
    [(set_attr "type" "neon_shift_imm_narrow_q")]
  )

llvm could also add tablegen patterns for these cases like gcc, but it seems better to handle them in the AArch64 MIR peephole optimization pass, because the cases share common sub-patterns and the pass can consider multiple basic blocks.
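
Roughly, the peephole looks for an `INSvi64lane` that inserts a zero element into lane d[1] and removes it when the tied source already has zero in its high 64 bits. Below is a minimal sketch of that shape, not the actual patch; `producesZeroHigh64Bits` is a hypothetical helper that would look through SUBREG_TO_REG/INSERT_SUBREG to a high-zeroing 64-bit producer such as `FCVTNv4i16` or `SHRNv8i8_shift`.

  #include "AArch64InstrInfo.h"
  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"

  using namespace llvm;

  // Hypothetical helper (not shown): returns true if MI's result already
  // has zero in bits [127:64], e.g. it is a 64-bit narrowing instruction
  // widened via SUBREG_TO_REG/INSERT_SUBREG.
  static bool producesZeroHigh64Bits(MachineInstr &MI, MachineRegisterInfo &MRI);

  // Sketch only: fold  %dst = INSvi64lane %src(tied), 1, %zeroes, 0
  // when %zeroes is an all-zero vector and %src already has zero in its
  // high 64 bits.
  static bool visitINSvi64lane(MachineInstr &MI, MachineRegisterInfo &MRI) {
    // Only the insert of element 0 into the high lane (d[1]) matters.
    if (MI.getOperand(2).getImm() != 1 || MI.getOperand(4).getImm() != 0)
      return false;

    // The inserted value must come from an all-zero vector,
    // e.g. %zeroes = MOVIv2d_ns 0.
    MachineInstr *Zero = MRI.getUniqueVRegDef(MI.getOperand(3).getReg());
    if (!Zero || Zero->getOpcode() != AArch64::MOVIv2d_ns ||
        Zero->getOperand(1).getImm() != 0)
      return false;

    // The tied source must be defined by an instruction that already
    // zeroes the high 64 bits (hypothetical helper above).
    MachineInstr *Low = MRI.getUniqueVRegDef(MI.getOperand(1).getReg());
    if (!Low || !producesZeroHigh64Bits(*Low, MRI))
      return false;

    // The INS is redundant: forward its tied input and erase it.
    MRI.replaceRegWith(MI.getOperand(0).getReg(), MI.getOperand(1).getReg());
    MI.eraseFromParent();
    return true;
  }

Doing this on MIR rather than with isel patterns means the `movi`, the narrowing instruction, and the `ins` do not have to sit in the same basic block.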

With this patch, llvm generates the output below.

  llvm output
  test1:                                  // @test1
          fcvtn   v0.4h, v0.4s
          ret
  
  test2:                                  // @test2
          ldr     q1, [x0]
          shrn    v1.8b, v1.8h, #4
          tbl     v0.8b, { v1.16b }, v0.8b
          ret


https://reviews.llvm.org/D147235

Files:
  llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
  llvm/test/CodeGen/AArch64/implicitly-set-zero-high-64-bits.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D147235.509669.patch
Type: text/x-patch
Size: 6248 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230330/125deb58/attachment.bin>
