<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link bz_status_NEW"
   title="NEW --- - use fsel to avoid branch and compare"
   href="http://llvm.org/bugs/show_bug.cgi?id=21231">21231</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>use fsel to avoid branch and compare

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: PowerPC

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>spatel+llvm@rotateright.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>This test case is derived from llvm/test/CodeGen/PowerPC/recipest.ll.

With -ffast-math, we can convert a sqrt intrinsic into a reciprocal square
root estimate multiplied by its argument (X * X ** -0.5 = X ** 0.5), with one
problem: we can't let a 0.0f input turn into a NaN output. The reciprocal
square root of 0.0 is infinity, and 0.0 * infinity is NaN.

The current PPC scalar codegen handles this with a compare and a branch
around the estimate sequence:

$ cat sqrtf.ll
declare float @llvm.sqrt.f32(float)

define float @goo3(float %a) nounwind {
  %r = call float @llvm.sqrt.f32(float %a)
  ret float %r
}

$ ./llc -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math sqrtf.ll -o -

...

.L.goo3:
# BB#0:
    addis 3, 2, .LCPI0_1@toc@ha
    lfs 0, .LCPI0_1@toc@l(3)
    fcmpu 0, 1, 0
    beq 0, .LBB0_2
# BB#1:
    frsqrtes 0, 1
    addis 3, 2, .LCPI0_0@toc@ha
    lfs 2, .LCPI0_0@toc@l(3)
    fnmsubs 3, 1, 2, 1
    fmuls 4, 0, 0
    fmadds 2, 3, 4, 2
    fmuls 0, 0, 2
    fmuls 0, 1, 0
.LBB0_2:
    fmr 1, 0
    blr

------------------------------------------------------

An fsel would probably perform better here than the compare and branch. In
the vector PPC case, we already generate a branch-free vcmpeqfp/vandc
sequence.

X86 scalar code (when enabled) uses a similar compare-and-mask pattern
(vcmpeqss/vandnps) to do the select:

    vrsqrtss    %xmm0, %xmm0, %xmm1
    vmulss    LCPI0_0(%rip), %xmm1, %xmm2
    vmulss    %xmm1, %xmm1, %xmm1
    vmulss    %xmm0, %xmm1, %xmm1
    vaddss    LCPI0_1(%rip), %xmm1, %xmm1
    vmulss    %xmm2, %xmm1, %xmm1
    vxorps    %xmm2, %xmm2, %xmm2
    vmulss    %xmm1, %xmm0, %xmm1
    vcmpeqss    %xmm2, %xmm0, %xmm0
    vandnps    %xmm1, %xmm0, %xmm0</pre>
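        <pre>The branch-free select discussed above can be modeled in a few lines of
Python. This is only a sketch under stated assumptions, not the PPC backend's
lowering: fsel(), rsqrt(), and sqrt_select() are hypothetical helper names,
rsqrt() stands in for frsqrtes plus the refinement step, and feeding fsel the
negated absolute value of the input (so that only +0.0 and -0.0 pick the 0.0
operand) is just one possible way to express the zero test.

```python
import math


def fsel(a, c, b):
    """Model of PowerPC fsel: pick c when a is zero or positive, else b.

    A NaN in a picks b, because abs(a) == a is false for NaN.
    """
    return c if abs(a) == a else b


def rsqrt(x):
    """Hypothetical stand-in for frsqrtes plus refinement.

    Returns IEEE-style infinities and NaNs instead of raising an exception.
    """
    if x == 0.0:
        return math.copysign(math.inf, x)   # 1/sqrt of a zero is an infinity
    if abs(x) != x:
        return math.nan                     # negative or NaN input
    return 1.0 / math.sqrt(x)


def sqrt_select(x):
    """sqrt(x) as x * rsqrt(x), with the zero case fixed up by a select."""
    est = x * rsqrt(x)                      # NaN when x is +0.0 or -0.0
    return fsel(-abs(x), 0.0, est)          # -|x| is zero only for a zero input


print(sqrt_select(4.0), sqrt_select(0.0))
```

The estimate is always computed, and the select discards the NaN that it
produces for a zero input, so no compare and branch is needed.</pre>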

        </div>

      </p>


    </body>

</html>