[llvm-bugs] [Bug 40359] New: [X86][SSE] Memory fold scalar unary ops with zero register passthrough
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Jan 17 09:09:28 PST 2019
https://bugs.llvm.org/show_bug.cgi?id=40359
Bug ID: 40359
Summary: [X86][SSE] Memory fold scalar unary ops with zero register passthrough
Product: libraries
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: llvm-dev at redking.me.uk
CC: andrea.dibiagio at gmail.com, craig.topper at gmail.com,
llvm-bugs at lists.llvm.org, llvm-dev at redking.me.uk,
spatel+llvm at rotateright.com
https://godbolt.org/z/uPs8BV
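The godbolt link compiles a function along these lines (a hypothetical
reconstruction for illustration, built with something like -O2 -mavx; the
function name is invented):

#include <math.h>

float sum_sqrts(float *a, float *b) {
  /* Each sqrtf(*p) becomes a scalar load followed by a scalar sqrt. */
  return sqrtf(*a) + sqrtf(*b);
}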
We currently do this:
vmovss (%rdi), %xmm0 # xmm0 = mem[0],zero,zero,zero
vmovss (%rsi), %xmm1 # xmm1 = mem[0],zero,zero,zero
vsqrtss %xmm0, %xmm0, %xmm0
vsqrtss %xmm1, %xmm1, %xmm1
vaddss %xmm1, %xmm0, %xmm0
retq
but we can fold the loads into the sqrts instead, using a zeroed register as
the passthrough: the three-operand VEX form takes the upper elements of the
destination from the other source register, so a zeroed source reproduces the
zero-extension that vmovss performed. This reduces register pressure when we
have multiple uses of the zero, and even when we don't reuse the register
there's no regression.
vxorps %xmm1, %xmm1, %xmm1
vsqrtss (%rdi), %xmm1, %xmm0
vsqrtss (%rsi), %xmm1, %xmm1
vaddss %xmm1, %xmm0, %xmm0
retq
This is really about AVX-encoded instructions, but I can't see any reason not
to do this for the older SSE encodings as well.
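For the SSE encodings, the two-operand form preserves the upper elements of
the destination register itself, so the equivalent would zero each destination
before the folded op (a sketch of plausible output, not verified codegen):

xorps %xmm0, %xmm0
xorps %xmm1, %xmm1
sqrtss (%rdi), %xmm0
sqrtss (%rsi), %xmm1
addss %xmm1, %xmm0
retq

Unlike the VEX form, the zero can't be shared here because the passthrough is
the destination register itself, but folding the loads still removes the
separate movss instructions.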