[llvm-bugs] [Bug 39974] New: [X86] Vectorize scalar conversions to avoid fpu-gpr-fpu transfers

Wed Dec 12 05:02:45 PST 2018

https://bugs.llvm.org/show_bug.cgi?id=39974

            Bug ID: 39974
           Summary: [X86] Vectorize scalar conversions to avoid
                    fpu-gpr-fpu transfers
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: llvm-dev at redking.me.uk
                CC: a.bataev at hotmail.com, andrea.dibiagio at gmail.com,
                    craig.topper at gmail.com, lebedev.ri at gmail.com,
                    llvm-bugs at lists.llvm.org, llvm-dev at redking.me.uk,
                    spatel+llvm at rotateright.com

As mentioned on https://reviews.llvm.org/D55558

define float @cvt(<4 x i32> %a0) nounwind {
  %1 = extractelement <4 x i32> %a0, i32 1
  %2 = sitofp i32 %1 to float
  ret float %2
}

define float @cvt_alt(<4 x i32> %a0) nounwind {
  %1 = shufflevector <4 x i32> %a0, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  %2 = sitofp <4 x i32> %1 to <4 x float>
  %3 = extractelement <4 x float> %2, i32 0
  ret float %3
}

If a scalar conversion can be performed purely on the vector unit, it's
typically faster and avoids fpu-gpr-fpu register transfer bottlenecks.
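
For illustration only, the two IR functions above can be approximated in C
with SSE2 intrinsics. This is a rough sketch and not part of the original
report: the names cvt_scalar and cvt_vector are hypothetical, and an
optimizing compiler is free to emit different instructions than the source
suggests. It shows that both forms return the same value, while the second
keeps the data in vector registers throughout.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE header) */
#include <stdint.h>

/* Roughly mirrors @cvt: move lane 1 out to a general-purpose register,
   then convert the scalar integer to float (hypothetical helper name). */
static float cvt_scalar(__m128i a0) {
    /* Bring lane 1 down to lane 0, then read lane 0 as a scalar int. */
    __m128i lane1 = _mm_shuffle_epi32(a0, _MM_SHUFFLE(1, 1, 1, 1));
    int32_t x = _mm_cvtsi128_si32(lane1);
    return (float)x;
}

/* Roughly mirrors @cvt_alt: splat lane 1, convert all four lanes on the
   vector unit, and read lane 0 of the result (hypothetical helper name). */
static float cvt_vector(__m128i a0) {
    __m128i splat = _mm_shuffle_epi32(a0, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 converted = _mm_cvtepi32_ps(splat);  /* 4 x i32 -> 4 x float */
    return _mm_cvtss_f32(converted);            /* extract lane 0 */
}
```

Comparing the generated assembly for the two functions, as in the godbolt
link below, is the reliable way to see whether a gpr round trip is present
on a given target.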

https://godbolt.org/z/KCm1Pk

I'm not sure whether this is best performed in the backend or whether the SLP
vectorizer should be considered. IIRC we've had similar discussions in the past
about scalar i64 math being done on i686 SSE2 targets.
