[llvm-bugs] [Bug 27222] New: Inefficient code for fp16 vectors
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Apr 5 11:25:29 PDT 2016
https://llvm.org/bugs/show_bug.cgi?id=27222
Bug ID: 27222
Summary: Inefficient code for fp16 vectors
Product: new-bugs
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: pirama at google.com
CC: llvm-bugs at lists.llvm.org, srhines at google.com
Classification: Unclassified
We generate inefficient code for half (fp16) vectors on some architectures. Consider
the following IR:
define void @add_h(<4 x half>* %a, <4 x half>* %b) {
entry:
%x = load <4 x half>, <4 x half>* %a, align 8
%y = load <4 x half>, <4 x half>* %b, align 8
%0 = fadd <4 x half> %x, %y
store <4 x half> %0, <4 x half>* %a
ret void
}
LLVM currently splits and scalarizes such vectors: the <4 x half> is broken into
four individual half values, each operated on separately. This prevents the backend
from selecting vector load and vector conversion instructions. The generated code
consists of repeated 16-bit loads, conversions to fp32, scalar additions, conversions
back to fp16, and 16-bit stores.
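To make the per-lane work concrete, the scalarized lowering is roughly equivalent to
repeating the following sequence once per lane (a hand-written sketch for illustration,
not actual legalizer output; the corresponding ARM32 instructions are noted in comments):

define half @add_one_lane(half %x, half %y) {
entry:
  %xf = fpext half %x to float    ; vcvtb.f32.f16
  %yf = fpext half %y to float    ; vcvtb.f32.f16
  %sf = fadd float %xf, %yf       ; vadd.f32
  %s = fptrunc float %sf to half  ; vcvtb.f16.f32
  ret half %s
}

Each lane pays for its own loads, register moves, widening conversions, and a narrowing
conversion, which is exactly the pattern visible in the ARM32 output below.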
Here's the code generated for ARM32:
ldrh r4, [r1, #6]
ldrh r3, [r0, #6]
ldrh r12, [r1]
ldrh r2, [r0, #4]
ldrh lr, [r0, #2]
vmov s0, r4
ldrh r4, [r1, #2]
ldrh r1, [r1, #4]
vmov s2, r3
ldrh r3, [r0]
vmov s6, r2
vmov s10, lr
vmov s12, r12
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s2, s2
vadd.f32 s0, s2, s0
vmov s4, r1
vmov s8, r4
vmov s14, r3
vcvtb.f32.f16 s4, s4
vcvtb.f32.f16 s6, s6
vcvtb.f32.f16 s2, s8
vcvtb.f32.f16 s8, s10
vcvtb.f32.f16 s10, s12
vcvtb.f32.f16 s12, s14
vcvtb.f16.f32 s0, s0
vadd.f32 s4, s6, s4
vadd.f32 s2, s8, s2
vadd.f32 s6, s12, s10
vmov r1, s0
vcvtb.f16.f32 s4, s4
vcvtb.f16.f32 s0, s2
vcvtb.f16.f32 s2, s6
strh r1, [r0, #6]
vmov r1, s4
strh r1, [r0, #4]
vmov r1, s0
strh r1, [r0, #2]
vmov r1, s2
strh r1, [r0]
In comparison, the same IR compiles to the following for AArch64:
ldr d0, [x1]
ldr d1, [x0]
fcvtl v0.4s, v0.4h
fcvtl v1.4s, v1.4h
fadd v0.4s, v1.4s, v0.4s
fcvtn v0.4h, v0.4s
str d0, [x0]
ret
.Lfunc_end0:
This happens on architectures whose LLVM backends don't natively support
half (such as x86, x86_64, and ARM32).
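For comparison, an ARM32 target with the NEON half-precision conversion instructions
could in principle handle the whole vector at once, much like the AArch64 code above.
A hand-written sketch of such a sequence (assuming NEON fp16 conversion support is
available; this is illustrative, not current compiler output):

vld1.16 {d0}, [r1]       @ load 4 x f16 from %b
vld1.16 {d1}, [r0]       @ load 4 x f16 from %a
vcvt.f32.f16 q8, d0      @ widen to 4 x f32
vcvt.f32.f16 q9, d1
vadd.f32 q8, q9, q8      @ one vector add instead of four scalar adds
vcvt.f16.f32 d0, q8      @ narrow back to 4 x f16
vst1.16 {d0}, [r0]
bx lr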
--
You are receiving this mail because:
You are on the CC list for the bug.