[PATCH] D104042: [AArch64] Improve SAD pattern

Thu Jun 10 09:26:09 PDT 2021

jaykang10 created this revision.
jaykang10 added reviewers: dmgreen, SjoerdMeijer, fhahn, t.p.northover.
Herald added subscribers: danielkiss, hiraditya, kristof.beyls.
jaykang10 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Given a vecreduce_add node, detect the first pattern below and convert it to the second, a node sequence that uses UABDL, UABD and UADDLP.

  i32 vecreduce_add(
   v16i32 abs(
     v16i32 sub(
       v16i32 zero_extend(v16i8 a), v16i32 zero_extend(v16i8 b))))

  i32 vecreduce_add(
    v4i32 UADDLP(
      v8i16 add(
        v8i16 zext(
          v8i8 UABD(low8:v16i8 a, low8:v16i8 b)),
        v8i16 zext(
          v8i8 UABD(high8:v16i8 a, high8:v16i8 b)))))
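
The equivalence of the two node sequences can be checked with a small lane-wise model. The sketch below is plain Python, not LLVM code, and every helper name in it is hypothetical; it assumes the inputs are 16 unsigned bytes. Each UABD lane is at most 255, so the v8i16 add of the two zero-extended halves is at most 510 and cannot wrap in 16 bits.

```python
# Lane-wise model of the two DAG shapes. Illustrative only; all names are
# hypothetical and do not correspond to LLVM APIs.

def sad_reference(a, b):
    """i32 vecreduce_add(abs(sub(zext(a), zext(b)))) over 16 unsigned bytes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def uabd(a, b):
    """Unsigned absolute difference per 8-bit lane (result is 0..255)."""
    return [abs(x - y) for x, y in zip(a, b)]

def uaddlp(v):
    """Add adjacent pairs of lanes into lanes of twice the width."""
    return [v[i] + v[i + 1] for i in range(0, len(v), 2)]

def sad_rewritten(a, b):
    lo = uabd(a[:8], b[:8])                  # v8i8 UABD on the low halves
    hi = uabd(a[8:], b[8:])                  # v8i8 UABD on the high halves
    wide = [l + h for l, h in zip(lo, hi)]   # v8i16 add of the zext'ed halves
    assert all(w <= 0xFFFF for w in wide)    # at most 510, fits in 16 bits
    return sum(uaddlp(wide))                 # v4i32 UADDLP, then vecreduce_add
```

With the worst-case input, `sad_rewritten([255] * 16, [0] * 16)` returns 4080, the same value as `sad_reference`.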

For example, the updated pattern improves the assembly output as shown below.
The source LLVM IR:

  define i32 @test_sad_v16i8(i8* nocapture readonly %a, i8* nocapture readonly %b) {
  entry:
    %0 = bitcast i8* %a to <16 x i8>*
    %1 = load <16 x i8>, <16 x i8>* %0
    %2 = zext <16 x i8> %1 to <16 x i32>
    %3 = bitcast i8* %b to <16 x i8>*
    %4 = load <16 x i8>, <16 x i8>* %3
    %5 = zext <16 x i8> %4 to <16 x i32>
    %6 = sub nsw <16 x i32> %5, %2
    %7 = call <16 x i32> @llvm.abs.v16i32(<16 x i32> %6, i1 true)
    %8 = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %7) 
    ret i32 %8
  }

The assembly output from the original pattern:

  	ldr	q0, [x0]
  	ldr	q1, [x1]
  	uabd	v0.16b, v1.16b, v0.16b
  	ushll2	v1.8h, v0.16b, #0
  	ushll	v0.8h, v0.8b, #0
  	uaddl2	v2.4s, v0.8h, v1.8h
  	uaddl	v0.4s, v0.4h, v1.4h
  	add	v0.4s, v0.4s, v2.4s
  	addv	s0, v0.4s
  	fmov	w0, s0
  	ret

The assembly output from the updated pattern:

  	ldr	q0, [x0]
  	ldr	q1, [x1]
  	uabdl	v2.8h, v1.8b, v0.8b
  	uabal2	v2.8h, v1.16b, v0.16b
  	uaddlp	v0.4s, v2.8h
  	addv	s0, v0.4s
  	fmov	w0, s0
  	ret
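
The updated listing is 8 instructions long, against 11 in the original. As a rough sanity check, the arithmetic instructions in the updated sequence can be modelled lane by lane. This is a sketch in Python with hypothetical helper names; each register is assumed to be a list of 16 unsigned bytes, and the masks only mirror the lane widths of the real instructions.

```python
# Lane-wise model of uabdl / uabal2 / uaddlp / addv from the updated output.
# Illustrative only; helper names are hypothetical.

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def uabdl(n, m):
    """uabdl vd.8h, vn.8b, vm.8b: widened |n - m| of the low 8 bytes."""
    return [abs(x - y) for x, y in zip(n[:8], m[:8])]

def uabal2(acc, n, m):
    """uabal2 vd.8h, vn.16b, vm.16b: add |n - m| of the high 8 bytes to acc."""
    return [(d + abs(x - y)) & MASK16 for d, x, y in zip(acc, n[8:], m[8:])]

def uaddlp(v):
    """uaddlp vd.4s, vn.8h: add adjacent 16-bit lanes into 32-bit lanes."""
    return [(v[i] + v[i + 1]) & MASK32 for i in range(0, len(v), 2)]

def addv(v):
    """addv sd, vn.4s: sum all 32-bit lanes."""
    return sum(v) & MASK32

def sad_updated_sequence(q0, q1):
    v2 = uabdl(q1, q0)        # uabdl  v2.8h, v1.8b, v0.8b
    v2 = uabal2(v2, q1, q0)   # uabal2 v2.8h, v1.16b, v0.16b
    v0 = uaddlp(v2)           # uaddlp v0.4s, v2.8h
    return addv(v0)           # addv   s0, v0.4s
```

In this model the 16-bit accumulator holds at most 510 per lane, so the `MASK16` truncation never changes the result.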

https://reviews.llvm.org/D104042

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64InstrInfo.td
  llvm/test/CodeGen/AArch64/neon-sad.ll
