[llvm-bugs] [Bug 45467] New: vfmaq_lane_f16 generates a dup
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Apr 7 14:02:54 PDT 2020
https://bugs.llvm.org/show_bug.cgi?id=45467
Bug ID: 45467
Summary: vfmaq_lane_f16 generates a dup
Product: tools
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: opt
Assignee: unassignedbugs at nondot.org
Reporter: fbarchard at google.com
CC: llvm-bugs at lists.llvm.org
Clang targeting AArch64 (Android) generates a separate dup for vfmaq_lane_f16 instead of using the by-element form of fmla.
For the function xnn_f16_gemm_ukernel_4x8__neonfp16arith_ld64, this is the inner loop:
ldr d4, [x21],#8
ldr d5, [x19],#8
ldr d6, [x22],#8
ldr d7, [x7],#8
ldp q16, q17, [x4]
dup v18.8h, v4.h[0]
sub x23, x23, #0x8
cmp x23, #0x7
fmla v0.8h, v18.8h, v16.8h
dup v18.8h, v5.h[0]
fmla v3.8h, v18.8h, v16.8h
dup v18.8h, v6.h[0]
fmla v2.8h, v18.8h, v16.8h
dup v18.8h, v7.h[0]
fmla v1.8h, v18.8h, v16.8h
dup v16.8h, v4.h[1]
dup v18.8h, v5.h[1]
fmla v0.8h, v16.8h, v17.8h
dup v16.8h, v6.h[1]
fmla v3.8h, v18.8h, v17.8h
dup v18.8h, v7.h[1]
fmla v2.8h, v16.8h, v17.8h
fmla v1.8h, v18.8h, v17.8h
ldp q16, q17, [x4,#32]
dup v18.8h, v4.h[2]
dup v4.8h, v4.h[3]
add x4, x4, #0x40
fmla v0.8h, v18.8h, v16.8h
dup v18.8h, v5.h[2]
fmla v3.8h, v18.8h, v16.8h
dup v18.8h, v6.h[2]
fmla v2.8h, v18.8h, v16.8h
dup v18.8h, v7.h[2]
fmla v1.8h, v18.8h, v16.8h
dup v5.8h, v5.h[3]
dup v6.8h, v6.h[3]
dup v7.8h, v7.h[3]
fmla v0.8h, v4.8h, v17.8h
fmla v3.8h, v5.8h, v17.8h
fmla v2.8h, v6.8h, v17.8h
fmla v1.8h, v7.8h, v17.8h
b.hi 2eb70 <xnn_f16_gemm_ukernel_4x8__neonfp16arith_ld64+0x9c>
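The listing reads as one step of a 4x8 GEMM micro-kernel. Each iteration loads four fp16 values from each of four A rows (the 64-bit ldr d loads, hence "ld64") and four 8-wide B vectors (the two ldp q pairs), then updates a 4x8 accumulator tile held in v0-v3 with 16 fmla. The C below is a hypothetical scalar model of one iteration, inferred from the disassembly rather than taken from XNNPACK. float stands in for fp16, and the name ukernel_4x8_step is made up for illustration.

```c
/* Hypothetical scalar model of one inner-loop iteration of the 4x8 "ld64"
 * kernel, inferred from the disassembly (not XNNPACK's actual source).
 * float stands in for fp16 so the sketch builds on any host. */
enum { MR = 4, NR = 8, KR = 4 }; /* rows, columns, k-steps per iteration */

/* a[m][k]:   four fp16 values per A row (one 64-bit "ldr d" per row), read-only
 * b[k][n]:   four 8-wide rows of B (two "ldp q" pairs), read-only
 * acc[m][n]: the 4x8 accumulator tile (v0-v3 in the listing) */
static void ukernel_4x8_step(float acc[MR][NR], float a[MR][KR],
                             float b[KR][NR]) {
  for (int k = 0; k < KR; k++) {     /* lane index h[k] of each A register */
    for (int m = 0; m < MR; m++) {   /* one 8-wide fmla per row per k-step */
      /* With intrinsics this is acc[m] = vfmaq_lane_f16(acc[m], b[k], a[m], k).
       * The lane must be a constant, so the real kernel is fully unrolled;
       * clang lowered each such call to "dup" + vector "fmla". */
      for (int n = 0; n < NR; n++) {
        acc[m][n] += b[k][n] * a[m][k];
      }
    }
  }
}
```

In the listing, each of the 16 per-iteration lane FMAs becomes a dup plus an fmla, which is the pattern this report is about.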
Instead of
dup v18.8h, v5.h[0]
fmla v3.8h, v18.8h, v16.8h
the compiler could generate a single by-element fmla, which also frees the temporary v18:
fmla v3.8h, v16.8h, v5.h[0]
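Both sequences compute the same result. The dup broadcasts one fp16 element into a temporary and the vector fmla then multiplies lane by lane, while the by-element fmla reads the element directly. The following minimal scalar model makes the equivalence concrete. It assumes float in place of fp16 and uses hypothetical helper names (dup_lane, fmla_vec, fmla_elem, dup_then_fmla). It also models the add without the single rounding of a real fused FMLA.

```c
#include <stddef.h>

/* Scalar models of the two instruction sequences. float stands in for fp16;
 * 8 lanes mirror the .8h arrangement. All names are hypothetical. */
enum { LANES = 8 };

/* dup vD.8h, vN.h[lane]: broadcast one element to every lane. */
static void dup_lane(float d[LANES], const float n[], size_t lane) {
  for (size_t i = 0; i < LANES; i++) d[i] = n[lane];
}

/* fmla vD.8h, vN.8h, vM.8h (vector form): d[i] += n[i] * m[i].
 * Real FMLA rounds once (fused); this model rounds twice. */
static void fmla_vec(float d[LANES], const float n[LANES], const float m[LANES]) {
  for (size_t i = 0; i < LANES; i++) d[i] += n[i] * m[i];
}

/* fmla vD.8h, vN.8h, vM.h[lane] (by-element form): d[i] += n[i] * m[lane].
 * ACLE lists this instruction as the mapping for vfmaq_lane_f16(d, n, m, lane). */
static void fmla_elem(float d[LANES], const float n[LANES], const float m[],
                      size_t lane) {
  for (size_t i = 0; i < LANES; i++) d[i] += n[i] * m[lane];
}

/* What clang emitted: a temporary (v18) plus two instructions. */
static void dup_then_fmla(float acc[LANES], const float b[LANES],
                          const float a[], size_t lane) {
  float tmp[LANES];
  dup_lane(tmp, a, lane);
  fmla_vec(acc, tmp, b);
}
```

One constraint the model leaves out: in the half-precision by-element encoding, the indexed register must be one of v0-v15, so the A operands have to stay in the lower half of the register file. v4-v7 in the listing already do.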
For comparison, this is the inner loop of a similar (not identical) kernel,
xnn_f16_gemm_ukernel_4x16__aarch64_neonfp16arith_ld32, which is written in assembly
and uses the by-element fmla directly:
ldr s0, [x3],#4
ldp q20, q21, [x5],#32
ldr s1, [x11],#4
ldr s2, [x12],#4
ldr s3, [x4],#4
fmla v16.8h, v20.8h, v0.h[0]
fmla v17.8h, v21.8h, v0.h[0]
fmla v18.8h, v20.8h, v1.h[0]
fmla v19.8h, v21.8h, v1.h[0]
ldp q22, q23, [x5],#32
fmla v28.8h, v20.8h, v2.h[0]
fmla v29.8h, v21.8h, v2.h[0]
fmla v30.8h, v20.8h, v3.h[0]
fmla v31.8h, v21.8h, v3.h[0]
fmla v16.8h, v22.8h, v0.h[1]
fmla v17.8h, v23.8h, v0.h[1]
fmla v18.8h, v22.8h, v1.h[1]
fmla v19.8h, v23.8h, v1.h[1]
fmla v28.8h, v22.8h, v2.h[1]
fmla v29.8h, v23.8h, v2.h[1]
subs x0, x0, #0x4
fmla v30.8h, v22.8h, v3.h[1]
fmla v31.8h, v23.8h, v3.h[1]
b.cs 2f704 <xnn_f16_gemm_ukernel_4x16__aarch64_neonfp16arith_ld32+0x64>
Benchmarking both functions on a Pixel 4 (Cortex-A76), the intrinsics version is
1.89 times slower:
f16_gemm_4x16__aarch64_neonfp16arith_ld32 6618948
f16_gemm_4x8__neonfp16arith_ld64 12543422
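A rough count from the two listings suggests why the gap is close to 2x. This is an estimate, not a measurement. It assumes both loops are bound by SIMD issue throughput and that dup and fmla each occupy one issue slot. It ignores the loads and the different tile shapes (4x8 vs 4x16), and the benchmark units are not stated:

```latex
\begin{aligned}
\text{4x8 intrinsics loop: } & 4\ \text{rows} \times 8\ \text{cols} \times 4\ k = 128\ \text{MACs},
  \quad 16\ \texttt{fmla} + 16\ \texttt{dup} = 32\ \text{SIMD ops} \\
\text{4x16 assembly loop: } & 4\ \text{rows} \times 16\ \text{cols} \times 2\ k = 128\ \text{MACs},
  \quad 16\ \texttt{fmla} = 16\ \text{SIMD ops} \\
\text{predicted ratio: } & 32 / 16 = 2,
  \qquad \text{measured: } 12543422 / 6618948 \approx 1.895
\end{aligned}
```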
clang --version
Android (5900059 based on r365631c) clang version 9.0.8
(https://android.googlesource.com/toolchain/llvm-project
207d7abc1a2abf3ef8d4301736d6a7ebc224a290) (based on LLVM 9.0.8svn)
See also the intrinsics source of the 4x8 kernel:
https://github.com/google/XNNPACK/blob/master/src/f16-gemm/gen/4x8-neonfp16arith-ld64.c