[llvm-bugs] [Bug 24475] New: [x86] fmax/fmin calls should be optimized and inlined

Mon Aug 17 08:45:46 PDT 2015

https://llvm.org/bugs/show_bug.cgi?id=24475

            Bug ID: 24475
           Summary: [x86] fmax/fmin calls should be optimized and inlined
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: spatel+llvm at rotateright.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

This is a follow-on to bug 24314 (optimizing fmax/fmin with relaxed FP
constraints) and was originally suggested by Hal Finkel in
http://reviews.llvm.org/D11866.

We should always be able to inline an optimized version of fmax/fmin, even
without the help of any FP relaxation settings.

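x86 doesn't have HW min/max instructions that provide the behavior specified by
IEEE 754 (maxNum/minNum) or the C standard with respect to NaN operands: when
exactly one argument is a NaN, fmax/fmin must return the other argument, but
maxss/minss return their 2nd source operand whenever either operand is a NaN.
Some architectures like AArch64 have instructions (FMAXNM/FMINNM) that provide
the exact behavior, so that should be an easy match.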
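As a concrete reference point, here is a small sketch (C with SSE intrinsics, assuming an x86 target) that spells out the required fmax NaN rule with explicit branches and contrasts it with a bare maxss. Both helper names are hypothetical, and reference_fmaxf is a simplified illustration written for this note, not the actual Linux or Mac libm source.

```c
#include <immintrin.h>
#include <math.h>

// Hypothetical reference: the C fmax rule written out with branches.
// If exactly one argument is a NaN, the other argument is returned.
static float reference_fmaxf(float x, float y) {
        if (isnan(x))
                return y;               // NaN only if both are NaN
        if (isnan(y))
                return x;               // NaN only in y: return the number
        return x > y ? x : y;           // neither is NaN: ordinary max
}

// Hypothetical helper: a bare maxss with no NaN fixup. maxss returns its
// 2nd source operand whenever either operand is a NaN.
static float raw_maxss(float x, float y) {
        return _mm_cvtss_f32(_mm_max_ss(_mm_set_ss(x), _mm_set_ss(y)));
}
```

With these, reference_fmaxf(NAN, 1.0f) and reference_fmaxf(1.0f, NAN) both return 1.0f. raw_maxss(NAN, 1.0f) also returns 1.0f, but raw_maxss(1.0f, NAN) returns the NaN: the bare instruction only matches fmax when the NaN happens to be in the 1st operand.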

After reviewing the x86 C library implementations on Linux and Mac, I think we
can do better: both of them use branches to handle the NaN cases. Here's a
branchless implementation that needs as few as 3 ops with SSE4.1 (cmpunordss /
blendvps / maxss); without SSE4.1, the blend becomes andps / andnps / orps.

I think this is compliant with the standards:

#include <immintrin.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

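// Use the SSE4.1 blend when the compiler targets SSE4.1 (e.g. -msse4.1).
#ifdef __SSE4_1__
#define HAS_SSE41 1
#else
#define HAS_SSE41 0
#endif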

float branchless_fmax(float f1, float f2) {
        __m128 v1 = _mm_set_ss(f1);
        __m128 v2 = _mm_set_ss(f2);

        // Is the 1st arg a NaN?
        __m128 isnan1 = _mm_cmpunord_ss(v1, v1);

#if HAS_SSE41
        // If yes, replace it with the 2nd arg.
        __m128 nonan1 = _mm_blendv_ps(v1, v2, isnan1);
#else
        // A select (variable blend) the pre-SSE4.1 way:
        // (isnan1 & v2) | (~isnan1 & v1)
        __m128 from2 = _mm_and_ps(isnan1, v2);
        __m128 from1 = _mm_andnot_ps(isnan1, v1);
        __m128 nonan1 = _mm_or_ps(from2, from1);
#endif

        // If the 1st arg is a NaN, this is a max of the 2nd arg against itself.
        // If the 2nd arg is a NaN, this returns the 1st (if it wasn't a NaN).
        // If both args were NaN, this returns the NaN value of the 2nd arg
        // because that's the 2nd parameter to the max, and that's how the x86
        // HW works: maxss returns its 2nd source operand if either is a NaN.
        // If neither arg is a NaN, this is what we always wanted: the max of
        // the args!
        __m128 maxnum = _mm_max_ss(v2, nonan1);
        float result;
        _mm_store_ss(&result, maxnum);
        return result;
}

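void fmax_compare(unsigned int i1, unsigned int i2) {
        // Reinterpret the bit patterns as floats.
        float f1, f2;
        memcpy(&f1, &i1, sizeof f1);
        memcpy(&f2, &i2, sizeof f2);
        float max1 = fmaxf(f1, f2);
        float max2 = branchless_fmax(f1, f2);

        // Compare the results bit-for-bit, since NaN != NaN as floats.
        unsigned int m1, m2;
        memcpy(&m1, &max1, sizeof m1);
        memcpy(&m2, &max2, sizeof m2);
        if (m1 != m2)
                printf("%x %x: fmax = %f (%x), my_fmax = %f (%x)\n",
                        i1, i2, max1, m1, max2, m2);
}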

int main(void) {
        unsigned int i1, i2;

        // both NaN (signaling NaN bit patterns: the quiet bit is clear)
        i1 = 0x7f800001;
        i2 = 0x7f800002;
        fmax_compare(i1, i2);
        fmax_compare(i2, i1);

        // both numbers (1.1171875 and 1.0)
        i1 = 0x3f8f0000;
        i2 = 0x3f800000;
        fmax_compare(i1, i2);
        fmax_compare(i2, i1);

        // NaN, number
        i1 = 0x7f800001;
        i2 = 0x3f800000;
        fmax_compare(i1, i2);
        // number, NaN
        fmax_compare(i2, i1);

        return 0;
}
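The same NaN fixup should carry over to fmin and to double precision. Below is a minimal sketch, assuming SSE2 (baseline on x86-64) and using the pre-SSE4.1 and/andnot/or select; branchless_fmin and branchless_fmax_d are hypothetical names for illustration, not existing library functions.

```c
#include <immintrin.h>
#include <math.h>

// Hypothetical fmin counterpart of branchless_fmax: same NaN fixup,
// with minss in place of maxss.
float branchless_fmin(float f1, float f2) {
        __m128 v1 = _mm_set_ss(f1);
        __m128 v2 = _mm_set_ss(f2);
        // All-ones mask in the low lane if f1 is a NaN.
        __m128 isnan1 = _mm_cmpunord_ss(v1, v1);
        // Replace a NaN f1 with f2: (isnan1 & v2) | (~isnan1 & v1).
        __m128 nonan1 = _mm_or_ps(_mm_and_ps(isnan1, v2),
                                  _mm_andnot_ps(isnan1, v1));
        // minss returns its 2nd operand (nonan1) if either operand is a NaN.
        return _mm_cvtss_f32(_mm_min_ss(v2, nonan1));
}

// Hypothetical double-precision fmax using the SSE2 _sd/_pd intrinsics.
double branchless_fmax_d(double d1, double d2) {
        __m128d v1 = _mm_set_sd(d1);
        __m128d v2 = _mm_set_sd(d2);
        __m128d isnan1 = _mm_cmpunord_sd(v1, v1);
        __m128d nonan1 = _mm_or_pd(_mm_and_pd(isnan1, v2),
                                   _mm_andnot_pd(isnan1, v1));
        return _mm_cvtsd_f64(_mm_max_sd(v2, nonan1));
}
```

For example, branchless_fmin(NAN, 2.0f) and branchless_fmin(2.0f, NAN) both return 2.0f. As with the float fmax version, two NaN inputs yield the 2nd argument's NaN unchanged, since nonan1 then equals v2.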
