<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - [x86, SSE] recognize min/max FP patterns"

   href="https://llvm.org/bugs/show_bug.cgi?id=28001">28001</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[x86, SSE] recognize min/max FP patterns

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Scalar Optimizations

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>spatel+llvm@rotateright.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>This is the FP version of <a href="http://reviews.llvm.org/D20774">http://reviews.llvm.org/D20774</a>, but we need a
different solution.

The problem in this case is not the bitcasts. Either the x86 intrinsic is
interfering with IR pattern recognition, or the backend needs to recognize
this as a select of 'fcmp oge'.

define <4 x i32> @gibsonfp(<4 x float> %a, <4 x float> %b) {
  %bc1 = bitcast <4 x float> %a to <4 x i32>
  %bc2 = bitcast <4 x float> %b to <4 x i32>
  ; Predicate 1 is LT: lanes are all-ones where %b < %a.
  %cmpps = tail call <4 x float> @llvm.x86.sse.cmp.ps(<4 x float> %b, <4 x float> %a, i8 1)
  %cmpbc = bitcast <4 x float> %cmpps to <4 x i32>
  %neg = xor <4 x i32> %cmpbc, &lt;i32 -1, i32 -1, i32 -1, i32 -1&gt;
  %and1 = and <4 x i32> %cmpbc, %bc1
  %and2 = and <4 x i32> %neg, %bc2
  ; Per lane: (%b < %a) ? %a : %b
  %or = or <4 x i32> %and1, %and2
  ret <4 x i32> %or
}

declare <4 x float> @llvm.x86.sse.cmp.ps(<4 x float>, <4 x float>, i8)
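
For reference, a minimal C sketch of the same computation with SSE1
intrinsics (x86 only). The helpers gibson_blend, blend_matches_maxps and
check_gibson are hypothetical, not LLVM code. The sketch builds the select
from a cmpltps mask plus and/andnot/or the way the IR does, then compares
it bit-for-bit with _mm_max_ps(a, b) on ordinary values, ties, +0.0/-0.0,
and NaNs in either input:

```c
/* Sketch only: @gibsonfp written with SSE1 intrinsics (x86 only; SSE is
   enabled by default when targeting x86-64). */
#include "xmmintrin.h"  /* _mm_cmplt_ps, _mm_and_ps, _mm_max_ps, ... */
#include "assert.h"
#include "math.h"       /* NAN */
#include "string.h"     /* memcmp */

/* Per lane: (b < a) ? a : b, assembled the same way as the IR. */
static __m128 gibson_blend(__m128 a, __m128 b) {
    __m128 mask = _mm_cmplt_ps(b, a);      /* %cmpps: all-ones where b < a */
    __m128 and1 = _mm_and_ps(mask, a);     /* %and1 */
    __m128 and2 = _mm_andnot_ps(mask, b);  /* %and2: (~mask) and b */
    return _mm_or_ps(and1, and2);          /* %or */
}

/* Returns 1 if the blend and MAXPS (a as first operand) agree in every bit. */
static int blend_matches_maxps(__m128 a, __m128 b) {
    __m128 x = gibson_blend(a, b);
    __m128 y = _mm_max_ps(a, b);
    return memcmp(&x, &y, sizeof x) == 0;
}

/* Spot checks: ordinary values, ties, +0.0/-0.0, and a NaN in either input. */
static void check_gibson(void) {
    assert(blend_matches_maxps(_mm_setr_ps(1.0f, 5.0f, -2.0f, 0.0f),
                               _mm_setr_ps(3.0f, 4.0f, -2.0f, -0.0f)));
    assert(blend_matches_maxps(_mm_setr_ps(NAN, 1.0f, 2.0f, 3.0f),
                               _mm_setr_ps(0.0f, NAN, 2.0f, -3.0f)));
}
```

Calling check_gibson() from a test driver should pass without any
fast-math flags, which is the behavior the requested fold relies on.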

So instead of:

_gibsonfp:
    movaps    %xmm1, %xmm2
    cmpltps    %xmm0, %xmm2
    andps    %xmm2, %xmm0
    andnps    %xmm1, %xmm2
    orps    %xmm2, %xmm0
    retq

We should be able to produce:

    maxps    %xmm1, %xmm0
    retq
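
A scalar model of one lane suggests why this rewrite should be exact. Per
the Intel SDM, MAXPS returns its second (source) operand unless the first
(destination) operand is greater, so a NaN in either input, or a
+0.0/-0.0 pair, yields the second operand. That is the same as the
'(b < a) ? a : b' select above, as long as %a (in %xmm0) is the first
operand. The helpers maxps_lane, gibson_lane, same_bits and
check_lane_model are hypothetical, for illustration only:

```c
/* Sketch: scalar model of one MAXPS lane next to one lane of the IR
   pattern. Portable C, no intrinsics. */
#include "assert.h"
#include "math.h"    /* NAN, isnan */
#include "string.h"  /* memcmp */

/* MAXPS lane: src2 wins unless src1 > src2, so NaNs and equal values
   (including +0.0 vs -0.0) return src2. */
static float maxps_lane(float src1, float src2) {
    return src1 > src2 ? src1 : src2;
}

/* One lane of @gibsonfp: (b < a) ? a : b. */
static float gibson_lane(float a, float b) {
    return b < a ? a : b;
}

static int same_bits(float x, float y) {
    return memcmp(&x, &y, sizeof x) == 0;
}

static void check_lane_model(void) {
    /* Ordered, distinct inputs: the larger value wins. */
    assert(same_bits(gibson_lane(3.0f, 2.0f), maxps_lane(3.0f, 2.0f)));
    /* NaN in either position: both return b, the second operand. */
    assert(same_bits(gibson_lane(1.0f, NAN), maxps_lane(1.0f, NAN)));
    assert(same_bits(gibson_lane(NAN, 1.0f), maxps_lane(NAN, 1.0f)));
    /* Swapping the MAXPS operands breaks the equivalence. */
    assert(!same_bits(gibson_lane(0.0f, -0.0f), maxps_lane(-0.0f, 0.0f)));
}
```

Because MAXPS is not commutative, the operand order in
'maxps %xmm1, %xmm0' matters. It also differs from C's fmaxf, which
returns the non-NaN operand.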

Note:

1. The FP min/max instructions have been around since SSE1, so this can
   apply to the vast majority of subtargets.

2. This does not require fast-math in the general case, but some cases may
   simplify further if the compare intrinsic carries some kind of fast-math
   decoration.</pre>

        </div>

      </p>


    </body>

</html>