[PATCH] D148134: [AArch64] Replace DUP scalar by DUP element

JinGu Kang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 12 08:12:04 PDT 2023


jaykang10 created this revision.
Herald added subscribers: hiraditya, kristof.beyls.
Herald added a project: All.
jaykang10 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

GCC generates fewer instructions than LLVM for the intrinsic examples below.

  #include <arm_neon.h>
  
  uint8x8_t test1(uint8x8_t a) {
      return vdup_n_u8(vrshrd_n_u64(vaddlv_u8(a), 3));
  }
  
  uint8x8_t test2(uint8x8_t a) {
      return vrshrn_n_u16(vdupq_n_u16(vaddlv_u8(a)), 3); 
  }
  
  gcc output
  test1:
  	uaddlv	h0, v0.8b
  	umov	w0, v0.h[0]
  	fmov	d0, x0
  	urshr	d0, d0, 3
  	dup	v0.8b, v0.b[0]
  	ret
  
  test2:
  	uaddlv	h0, v0.8b
  	dup	v0.8h, v0.h[0]
  	rshrn	v0.8b, v0.8h, 3
  	ret
  
  llvm output
  test1:                                  // @test1
  	uaddlv	h0, v0.8b
  	fmov	w8, s0
  	and	w8, w8, #0xffff
  	fmov	d0, x8
  	urshr	d0, d0, #3
  	fmov	x8, d0
  	dup	v0.8b, w8
  	ret
  
  test2:                                  // @test2
  	uaddlv	h0, v0.8b
  	fmov	w8, s0
  	dup	v0.8h, w8
  	rshrn	v0.8b, v0.8h, #3
  	ret
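As a cross-check on the example, both functions should produce the same vector: the sum of the eight bytes, rounded and shifted right by 3, broadcast to all lanes. The sketch below is a portable scalar model written for illustration. It does not use `arm_neon.h`, the `model_*` helper names are hypothetical, and the rounding step assumes the usual add-half-then-shift semantics of URSHR/RSHRN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of UADDLV on 8 x u8: sum of all lanes, widened to 16 bits. */
static uint16_t model_uaddlv_u8(const uint8_t a[8]) {
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum = (uint16_t)(sum + a[i]); /* at most 8 * 255 = 2040, so no overflow */
    return sum;
}

/* Rounding shift right by n: add 2^(n-1), then shift (assumed URSHR/RSHRN behavior).
 * Valid while x + 2^(n-1) fits in 64 bits, which always holds here (x <= 2040). */
static uint64_t model_rshr(uint64_t x, unsigned n) {
    assert(n >= 1 && n <= 63);
    return (x + (UINT64_C(1) << (n - 1))) >> n;
}

/* test1: widen the sum to 64 bits, round-shift, keep the low byte, broadcast. */
static void model_test1(const uint8_t a[8], uint8_t out[8]) {
    uint8_t b = (uint8_t)model_rshr(model_uaddlv_u8(a), 3); /* vdup_n_u8 keeps low 8 bits */
    for (int i = 0; i < 8; i++)
        out[i] = b;
}

/* test2: broadcast the 16-bit sum to 8 lanes, then round-shift and narrow each lane. */
static void model_test2(const uint8_t a[8], uint8_t out[8]) {
    uint16_t lanes[8];
    uint16_t s = model_uaddlv_u8(a);
    for (int i = 0; i < 8; i++)
        lanes[i] = s;                                   /* vdupq_n_u16 */
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)model_rshr(lanes[i], 3);      /* vrshrn_n_u16 */
}
```

Because the sum of eight bytes is at most 2040, the rounded result (at most 255) always fits in a byte, so the narrowing in `test1` and `test2` never changes the value.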

The LLVM output contains extra `fmov` instructions.
The `uaddlv` instruction writes its result to an FPR, while the `dup` instruction reads its source from a GPR. A `COPY` is therefore needed to move the value between the two register classes, and that `COPY` is expanded to `fmov`.
AArch64 also has a form of `dup` that takes a SIMD register element as its source, called DUP (element). Using it removes the `COPY`, because the FPRs are the low parts of the SIMD registers.
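The rewrite described above can be sketched on a toy instruction list. This is a hypothetical model, not the code in `AArch64MIPeepholeOpt.cpp`: the opcodes, structs, and the `replace_dup_gpr` function are illustrative assumptions, and the sketch ignores details a real pass would have to handle, such as matching the lane element size to the `dup` element size. It only shows the idea: a `dup` whose GPR source is a `COPY` from an FPR can read lane 0 of that FPR directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR; these names are illustrative, not LLVM's MachineInstr API. */
enum reg_class { RC_GPR, RC_FPR };
enum opcode { OP_UADDLV, OP_COPY, OP_DUP_GPR, OP_DUP_LANE };

struct vreg { enum reg_class rc; };

struct insn {
    enum opcode op;
    int def;   /* virtual register defined by this instruction */
    int src;   /* source virtual register */
    int lane;  /* lane index, only meaningful for OP_DUP_LANE */
};

/* Return the index of the instruction that defines virtual register r, or -1. */
static int find_def(const struct insn *code, size_t n, int r) {
    for (size_t i = 0; i < n; i++)
        if (code[i].def == r)
            return (int)i;
    return -1;
}

/* Rewrite "gpr = COPY fpr; vec = DUP_GPR gpr" into "vec = DUP_LANE fpr, 0".
 * Returns the number of DUPs rewritten. In this sketch the COPY is left in
 * place; a later dead-code pass would remove it once it has no other uses. */
static int replace_dup_gpr(struct insn *code, size_t n, const struct vreg *regs) {
    int changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].op != OP_DUP_GPR)
            continue;
        int d = find_def(code, n, code[i].src);
        if (d < 0 || code[d].op != OP_COPY)
            continue;                      /* GPR not produced by a COPY */
        if (regs[code[d].src].rc != RC_FPR)
            continue;                      /* COPY source is not an FPR */
        code[i].op = OP_DUP_LANE;
        code[i].src = code[d].src;         /* read the FPR directly */
        code[i].lane = 0;                  /* low element holds the copied value */
        changed++;
    }
    return changed;
}
```

Applied to a model of `test2` (`uaddlv`, then `COPY` to a GPR, then `dup` from that GPR), the `dup` ends up reading element 0 of the `uaddlv` result, which matches the `dup v0.8h, v1.h[0]` shape in the patched output.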
With this patch, LLVM generates the following output:

  test1:                                  // @test1
  	uaddlv	h0, v0.8b
  	fmov	w8, s0
  	and	w8, w8, #0xffff
  	fmov	d0, x8
  	urshr	d0, d0, #3
  	dup	v0.8b, v0.b[0]
  	ret
  
  test2:                                  // @test2
  	uaddlv	h1, v0.8b
  	dup	v0.8h, v1.h[0]
  	rshrn	v0.8b, v0.8h, #3
  	ret


https://reviews.llvm.org/D148134

Files:
  llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
  llvm/test/CodeGen/AArch64/replace-dupgpr-with-duplane.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D148134.512834.patch
Type: text/x-patch
Size: 6959 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230412/3026bdf0/attachment.bin>

