[llvm-bugs] [Bug 40359] New: [X86][SSE] Memory fold scalar unary ops with zero register passthrough

via llvm-bugs llvm-bugs at lists.llvm.org
Thu Jan 17 09:09:28 PST 2019


https://bugs.llvm.org/show_bug.cgi?id=40359

            Bug ID: 40359
           Summary: [X86][SSE] Memory fold scalar unary ops with zero
                    register passthrough
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: llvm-dev at redking.me.uk
                CC: andrea.dibiagio at gmail.com, craig.topper at gmail.com,
                    llvm-bugs at lists.llvm.org, llvm-dev at redking.me.uk,
                    spatel+llvm at rotateright.com

https://godbolt.org/z/uPs8BV

We currently do this:

  vmovss (%rdi), %xmm0 # xmm0 = mem[0],zero,zero,zero
  vmovss (%rsi), %xmm1 # xmm1 = mem[0],zero,zero,zero
  vsqrtss %xmm0, %xmm0, %xmm0
  vsqrtss %xmm1, %xmm1, %xmm1
  vaddss %xmm1, %xmm0, %xmm0
  retq

but we can reduce register pressure when the zero register has multiple uses by
doing this instead; even when the zero register isn't reused there's no
regression (see the sketch after the code below):

  vxorps %xmm1, %xmm1, %xmm1
  vsqrtss (%rdi), %xmm1, %xmm0
  vsqrtss (%rsi), %xmm1, %xmm1
  vaddss %xmm1, %xmm0, %xmm0
  retq
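
To illustrate the single-use case: the folded form needs no extra instructions
compared to the current movss+sqrtss pair, since the vxorps zero idiom is
dependency-breaking and cheap (an illustrative sketch, not current compiler
output):

  vxorps %xmm1, %xmm1, %xmm1
  vsqrtss (%rdi), %xmm1, %xmm0
  retq

versus the current form:

  vmovss (%rdi), %xmm0 # xmm0 = mem[0],zero,zero,zero
  vsqrtss %xmm0, %xmm0, %xmm0
  retq

Both are two instructions plus the return, and the folded form saves the
separate load.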

This is really about the AVX (VEX-encoded) instructions, but I can't see any
reason not to do this for the older SSE encodings as well.
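
With the legacy SSE encoding, sqrtss with a memory operand leaves the upper
elements of the destination untouched, so zeroing each destination first gives
the same zero passthrough. A sketch of what the SSE form might look like (here
the destination itself carries the zeros, so each one needs its own
dependency-breaking xorps, but the loads still fold):

  xorps %xmm0, %xmm0
  xorps %xmm1, %xmm1
  sqrtss (%rdi), %xmm0
  sqrtss (%rsi), %xmm1
  addss %xmm1, %xmm0
  retq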
