<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [X86] Vectorize scalar conversions to avoid fpu-gpr-fpu transfers"
   href="https://bugs.llvm.org/show_bug.cgi?id=39974">39974</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[X86] Vectorize scalar conversions to avoid fpu-gpr-fpu transfers
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows NT
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>llvm-dev@redking.me.uk
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>a.bataev@hotmail.com, andrea.dibiagio@gmail.com, craig.topper@gmail.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>As mentioned on <a href="https://reviews.llvm.org/D55558">https://reviews.llvm.org/D55558</a>

define float @cvt(<4 x i32> %a0) nounwind {
  %1 = extractelement <4 x i32> %a0, i32 1
  %2 = sitofp i32 %1 to float
  ret float %2
}

define float @cvt_alt(<4 x i32> %a0) nounwind {
  %1 = shufflevector <4 x i32> %a0, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  %2 = sitofp <4 x i32> %1 to <4 x float>
  %3 = extractelement <4 x float> %2, i32 0
  ret float %3
}

If a scalar conversion can be performed purely on the vector unit, it is
typically faster and avoids fpu-gpr-fpu register transfer bottlenecks.
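
For illustration only, the same two shapes can be written with SSE2 intrinsics.
This is a minimal sketch, not taken from D55558: the names cvt_scalar,
cvt_vector and check_lane1 are hypothetical, and a compiler is free to lower
either function differently from what the comments suggest.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// Scalar form (mirrors @cvt): lane 1 is moved to a general-purpose
// register as an int, then converted to float from there.
inline float cvt_scalar(__m128i a) {
  int lane1 = _mm_cvtsi128_si32(_mm_shuffle_epi32(a, _MM_SHUFFLE(1, 1, 1, 1)));
  return static_cast<float>(lane1);
}

// Vector form (mirrors @cvt_alt): splat lane 1, convert all four lanes
// on the vector unit, then read the low float lane.
inline float cvt_vector(__m128i a) {
  __m128i splat = _mm_shuffle_epi32(a, _MM_SHUFFLE(1, 1, 1, 1));
  return _mm_cvtss_f32(_mm_cvtepi32_ps(splat));
}

// Sanity check: both forms agree when lane 1 holds x.
inline void check_lane1(int x) {
  __m128i v = _mm_set_epi32(0, 0, x, 0);  // arguments are lanes 3, 2, 1, 0
  assert(cvt_scalar(v) == cvt_vector(v));
}
```

Both functions return the same value for any input; the difference is only in
whether the integer has to leave the vector register file before conversion.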

<a href="https://godbolt.org/z/KCm1Pk">https://godbolt.org/z/KCm1Pk</a>

I'm not sure whether this is best performed in the backend or whether the SLP
vectorizer should be considered. IIRC we've had similar discussions in the past
about scalar i64 math being done on i686 SSE2 targets.</pre>
        </div>
      </p>


    </body>
</html>