<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - [X86] Vectorize scalar conversions to avoid fpu-gpr-fpu transfers"

   href="https://bugs.llvm.org/show_bug.cgi?id=39974">39974</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[X86] Vectorize scalar conversions to avoid fpu-gpr-fpu transfers

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Windows NT

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>llvm-dev@redking.me.uk

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>a.bataev@hotmail.com, andrea.dibiagio@gmail.com, craig.topper@gmail.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com

          </td>

        </tr></table>

      <p>

        <div>

        <pre>As mentioned on <a href="https://reviews.llvm.org/D55558">https://reviews.llvm.org/D55558</a>

define float @cvt(<4 x i32> %a0) nounwind {
  %1 = extractelement <4 x i32> %a0, i32 1
  %2 = sitofp i32 %1 to float
  ret float %2
}

define float @cvt_alt(<4 x i32> %a0) nounwind {
  %1 = shufflevector <4 x i32> %a0, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  %2 = sitofp <4 x i32> %1 to <4 x float>
  %3 = extractelement <4 x float> %2, i32 0
  ret float %3
}

If a scalar conversion can be performed purely on the vector unit (as in
@cvt_alt), it's typically faster and avoids fpu-gpr-fpu register transfer
bottlenecks.

<a href="https://godbolt.org/z/KCm1Pk">https://godbolt.org/z/KCm1Pk</a>
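
For illustration only, a hedged Rust sketch of the two IR functions above,
written with SSE2 intrinsics from std::arch::x86_64 and assuming an x86_64
target. The helper from_lanes is hypothetical. cvt moves lane 1 into a
general-purpose register and converts it as a scalar, while cvt_alt splats
lane 1, converts the whole vector with _mm_cvtepi32_ps (cvtdq2ps) and reads
lane 0. Under the default round-to-nearest mode both should return the same
float; the exact instructions the compiler emits are not guaranteed.

```rust
// Sketch only: assumes an x86_64 target, where SSE2 is part of the baseline.
use std::arch::x86_64::{
    __m128i, _mm_cvtepi32_ps, _mm_cvtsi128_si32, _mm_cvtss_f32, _mm_set_epi32, _mm_shuffle_epi32,
};

/// Hypothetical helper: builds a vector whose lane i holds l[i] (lane 0 is lowest).
fn from_lanes(l: [i32; 4]) -> __m128i {
    // _mm_set_epi32 takes its arguments from the highest lane down to lane 0.
    unsafe { _mm_set_epi32(l[3], l[2], l[1], l[0]) }
}

/// Mirrors @cvt: element 1 is moved to a general-purpose register and then
/// converted with a scalar int-to-float conversion (fpu-gpr-fpu round trip).
fn cvt(a: __m128i) -> f32 {
    // 0b01_01_01_01 selects source lane 1 for every destination lane.
    let lane1: i32 = unsafe { _mm_cvtsi128_si32(_mm_shuffle_epi32(a, 0b01_01_01_01)) };
    lane1 as f32 // scalar conversion of a GPR value (typically cvtsi2ss)
}

/// Mirrors @cvt_alt: splat element 1, convert all four lanes at once
/// (cvtdq2ps), then read lane 0, so the value can stay in an XMM register.
fn cvt_alt(a: __m128i) -> f32 {
    unsafe {
        let splat = _mm_shuffle_epi32(a, 0b01_01_01_01); // shufflevector with mask 1,1,1,1
        _mm_cvtss_f32(_mm_cvtepi32_ps(splat)) // sitofp, then extractelement 0
    }
}
```

For example, cvt_alt(from_lanes([10, -7, 30, 40])) should return -7.0, the
same value as cvt on that input.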

I'm not sure whether this is best performed in the backend or whether the SLP
vectorizer should handle it; IIRC we've had similar discussions in the past
about scalar i64 math being done on i686 SSE2 targets.</pre>

        </div>

      </p>


    </body>

</html>