[PATCH] D104042: [AArch64] Improve SAD pattern
JinGu Kang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 10 09:26:09 PDT 2021
jaykang10 created this revision.
jaykang10 added reviewers: dmgreen, SjoerdMeijer, fhahn, t.p.northover.
Herald added subscribers: danielkiss, hiraditya, kristof.beyls.
jaykang10 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
Given a vecreduce_add node, detect the pattern below and convert it to a node sequence using UABDL, UABD and UADDLP.
i32 vecreduce_add(
  v16i32 abs(
    v16i32 sub(
      v16i32 zero_extend(v16i8 a),
      v16i32 zero_extend(v16i8 b))))
==>
i32 vecreduce_add(
  v4i32 UADDLP(
    v8i16 add(
      v8i16 zext(v8i8 UABD(low8:v16i8 a, low8:v16i8 b)),
      v8i16 zext(v8i8 UABD(high8:v16i8 a, high8:v16i8 b)))))
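A short worked bound suggests why the narrower v8i16 and v4i32 intermediates cannot overflow, and why the rewritten sequence produces the same i32 result. The symbols d, s and p are our own notation and do not come from the patch. Because a_i and b_i are zero-extended bytes in [0, 255], abs(sub(...)) in i32 is exactly |a_i - b_i|, which is what UABD computes per byte lane:

```latex
\begin{aligned}
d_i &= \lvert a_i - b_i \rvert \le 255, && 0 \le i \le 15 \quad \text{(UABD per byte lane)}\\
s_k &= d_k + d_{k+8} \le 510 < 2^{16}, && 0 \le k \le 7 \quad \text{(zext of low8/high8, then v8i16 add)}\\
p_j &= s_{2j} + s_{2j+1} \le 1020 < 2^{32}, && 0 \le j \le 3 \quad \text{(UADDLP)}\\
\sum_{j=0}^{3} p_j &= \sum_{k=0}^{7} s_k = \sum_{i=0}^{15} d_i \le 16 \cdot 255 = 4080 &&
\end{aligned}
```

Since no lane wraps at any step, the final vecreduce_add equals the original sum over all 16 lanes.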
For example, the updated pattern improves the assembly output as follows.
The source LLVM IR:
define i32 @test_sad_v16i8(i8* nocapture readonly %a, i8* nocapture readonly %b) {
entry:
  %0 = bitcast i8* %a to <16 x i8>*
  %1 = load <16 x i8>, <16 x i8>* %0
  %2 = zext <16 x i8> %1 to <16 x i32>
  %3 = bitcast i8* %b to <16 x i8>*
  %4 = load <16 x i8>, <16 x i8>* %3
  %5 = zext <16 x i8> %4 to <16 x i32>
  %6 = sub nsw <16 x i32> %5, %2
  %7 = call <16 x i32> @llvm.abs.v16i32(<16 x i32> %6, i1 true)
  %8 = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %7)
  ret i32 %8
}
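A minimal scalar C++ sketch of what @test_sad_v16i8 computes can serve as a reference when checking either assembly sequence. The function name mirrors the IR; the uint8_t pointer parameters and the use of std::abs are our own assumptions, and this is not claimed to be the source the IR was generated from.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Scalar reference (our own sketch) for @test_sad_v16i8:
// sum over 16 lanes of |b[i] - a[i]|, computed in 32-bit arithmetic
// in the same way as the zext/sub/abs/reduce IR.
int32_t test_sad_v16i8(const uint8_t *a, const uint8_t *b) {
  int32_t sum = 0;
  for (int i = 0; i < 16; ++i) {
    // zext both bytes to i32, then sub nsw (b - a, as in %6 = %5 - %2)
    int32_t diff = static_cast<int32_t>(b[i]) - static_cast<int32_t>(a[i]);
    sum += std::abs(diff);  // llvm.abs.v16i32, lane by lane
  }
  assert(sum <= 16 * 255);  // 16 lanes, each contributing at most 255
  return sum;               // llvm.vector.reduce.add.v16i32
}
```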
The assembly output from the original pattern:
ldr q0, [x0]
ldr q1, [x1]
uabd v0.16b, v1.16b, v0.16b
ushll2 v1.8h, v0.16b, #0
ushll v0.8h, v0.8b, #0
uaddl2 v2.4s, v0.8h, v1.8h
uaddl v0.4s, v0.4h, v1.4h
add v0.4s, v0.4s, v2.4s
addv s0, v0.4s
fmov w0, s0
ret
The assembly output from the updated pattern:
ldr q0, [x0]
ldr q1, [x1]
uabdl v2.8h, v1.8b, v0.8b
uabal2 v2.8h, v1.16b, v0.16b
uaddlp v0.4s, v2.8h
addv s0, v0.4s
fmov w0, s0
ret
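In the updated output, the second zext and the v8i16 add appear to be folded into UABAL2, which accumulates the high-half absolute differences into the UABDL result. The following lane-level C++ model is a sketch of that sequence. The helper names (uabdl, uabal2, uaddlp, addv, sad_updated) are hypothetical illustrations, not an LLVM or ACLE API, and they model only the lane arithmetic. Register roles follow the listing: q0 holds a (from [x0]) and q1 holds b (from [x1]).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical lane-level models of the AArch64 instructions used above.
using V16B = std::array<uint8_t, 16>;
using V8H = std::array<uint16_t, 8>;
using V4S = std::array<uint32_t, 4>;

static uint16_t absdiff(uint8_t x, uint8_t y) {
  return x > y ? uint16_t(x - y) : uint16_t(y - x);
}

// uabdl vd.8h, vn.8b, vm.8b: |n[i] - m[i]| of the low 8 bytes, widened to 16 bits.
V8H uabdl(const V16B &n, const V16B &m) {
  V8H d{};
  for (int i = 0; i < 8; ++i) d[i] = absdiff(n[i], m[i]);
  return d;
}

// uabal2 vd.8h, vn.16b, vm.16b: accumulate |n[i+8] - m[i+8]| of the high
// 8 bytes into d (the uint16_t cast wraps modulo 2^16, as an 8h lane would).
void uabal2(V8H &d, const V16B &n, const V16B &m) {
  for (int i = 0; i < 8; ++i) d[i] = uint16_t(d[i] + absdiff(n[i + 8], m[i + 8]));
}

// uaddlp vd.4s, vn.8h: add adjacent pairs of 16-bit lanes into 32-bit lanes.
V4S uaddlp(const V8H &n) {
  V4S d{};
  for (int j = 0; j < 4; ++j) d[j] = uint32_t(n[2 * j]) + n[2 * j + 1];
  return d;
}

// addv s0, v0.4s: horizontal add of all four 32-bit lanes.
uint32_t addv(const V4S &n) {
  uint32_t s = 0;
  for (uint32_t x : n) s += x;
  return s;
}

// The updated sequence: v0 = a, v1 = b.
uint32_t sad_updated(const V16B &a, const V16B &b) {
  V8H acc = uabdl(b, a);   // uabdl  v2.8h, v1.8b, v0.8b
  uabal2(acc, b, a);       // uabal2 v2.8h, v1.16b, v0.16b
  for (uint16_t lane : acc) assert(lane <= 510);  // 255 + 255: no wrap occurred
  return addv(uaddlp(acc));  // uaddlp v0.4s, v2.8h ; addv s0, v0.4s
}
```

Because each accumulated lane stays at or below 510, the model's result matches the plain sum of absolute differences for every input.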
https://reviews.llvm.org/D104042
Files:
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.h
llvm/lib/Target/AArch64/AArch64InstrInfo.td
llvm/test/CodeGen/AArch64/neon-sad.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D104042.351194.patch
Type: text/x-patch
Size: 8948 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210610/8ffd5e22/attachment-0001.bin>