[LLVMbugs] [Bug 6137] New: unnecessary moves between XMM and general registers by x86 back end

Mon Jan 25 08:37:31 PST 2010

http://llvm.org/bugs/show_bug.cgi?id=6137

           Summary: unnecessary moves between XMM and general registers by
                    x86 back end
           Product: new-bugs
           Version: trunk
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: new bugs
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: brian.sumner at amd.com
                CC: llvmbugs at cs.uiuc.edu

For the following C code:

---- example.c ----------------------
typedef unsigned int uint;
static inline float as_float(uint u)
{
    union { float f; uint u; } v;
    v.u = u;
    return v.f;
}

static inline uint as_uint(float f)
{
    union { float f; uint u; } v;
    v.f = f;
    return v.u;
}

static float
r(float x)
{
    uint u = as_uint(x);
    uint a = u & 0x7fffffffU;
    float v = (as_float(a) + 0x1.0p+23F) - 0x1.0p+23F;
    return a > 0x4affffffU ? x : as_float(as_uint(v) | (u ^ a));
}
---------------------------------------
The x86 back end is generating several unnecessary MOVD instructions to move
data between XMM and general purpose registers.  These moves are unnecessary
because the argument arrives in an XMM register, the result returns in an XMM
register, and all the operations performed can be done with XMM register
operations.
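
For reference, here is a minimal sketch of what r() computes, assuming IEEE
single precision, the default round-to-nearest-even mode, and no excess
precision or -ffast-math: it rounds x to an integral value, keeping the sign
bit, and passes through any input whose magnitude bits exceed 0x4affffff
(that is, |x| >= 2^23, plus infinities and NaNs). The memcpy-based casts and
the r_selfcheck() helper are my own illustration, not part of example.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit casts via memcpy; same effect as the unions in example.c. */
static inline float as_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static inline uint32_t as_uint(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Same logic as r() in example.c. */
static float r(float x)
{
    uint32_t u = as_uint(x);
    uint32_t a = u & 0x7fffffffU;                       /* magnitude bits */
    float v = (as_float(a) + 0x1.0p+23F) - 0x1.0p+23F;  /* round |x| to an integer */
    return a > 0x4affffffU ? x : as_float(as_uint(v) | (u ^ a));
}

/* Hypothetical self-check: a few hand-computed cases. */
static void r_selfcheck(void)
{
    assert(r(2.5f) == 2.0f);                   /* tie rounds to even */
    assert(r(3.5f) == 4.0f);                   /* tie rounds to even */
    assert(r(-1.25f) == -1.0f);                /* sign bit is restored */
    assert(as_uint(r(-0.3f)) == 0x80000000U);  /* result is -0.0 */
    assert(r(1e10f) == 1e10f);                 /* large inputs pass through */
}
```

Every step here (AND, add, subtract, XOR, OR, compare, select) has an SSE
counterpart that works on XMM registers, which is why no round trip through
the general purpose registers should be required.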

I would like the x86 back end to generate the same code for the function above
as it does for the following function which performs the same operation
entirely in XMM registers:
---- recoded-example.c -------------------------
#include <emmintrin.h>  /* SSE2: __m128i and the integer intrinsics; also pulls in xmmintrin.h */
static inline __m128i as_m128i(__m128 f)
{
    union { __m128i i; __m128 f; } v;
    v.f = f;
    return v.i;
}

static inline __m128 as_m128(__m128i i)
{
    union { __m128i i; __m128 f; } v;
    v.i = i;
    return v.f;
}

static float
r(float xx)
{
    __m128 t = _mm_set_ss(0x1.0p+23F);
    __m128 x = _mm_set_ss(xx);
    __m128i a = _mm_and_si128(as_m128i(x), _mm_set1_epi32(0x7fffffff));
    __m128 v = _mm_sub_ss(_mm_add_ss(as_m128(a), t), t);
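    /* Keep the sign bit in its own variable so that a still holds the
       magnitude bits for the comparison below, as in the scalar version. */
    __m128i s = _mm_xor_si128(as_m128i(x), a);
    v = as_m128(_mm_or_si128(as_m128i(v), s));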
    __m128i m = _mm_cmpgt_epi32(a, _mm_set1_epi32(0x4affffff));
    v = as_m128(_mm_andnot_si128(m, as_m128i(v)));
    x = as_m128(_mm_and_si128(m, as_m128i(x)));
    x = as_m128(_mm_or_si128(as_m128i(x), as_m128i(v)));
    return _mm_cvtss_f32(x);
}
--------------------------------------------------------

Calling each of these functions in a loop of 1G iterations (where it is
probably inlined), built with various compilers and run on a Phenom II
processor, I get:

gcc 4.4.1:   orig:  7.99  new:  7.32
icc 11.0:    orig:  5.32  new: 20.50
clang trunk: orig: 11.53  new:  6.82
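
The harness used for these numbers is not included in this report. A minimal
sketch of such a loop, under my own assumptions (the name time_r, the input
sequence, and the use of clock() are all hypothetical), could look like this:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

static inline float as_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static inline uint32_t as_uint(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Same logic as r() in example.c. */
static float r(float x)
{
    uint32_t u = as_uint(x);
    uint32_t a = u & 0x7fffffffU;
    float v = (as_float(a) + 0x1.0p+23F) - 0x1.0p+23F;
    return a > 0x4affffffU ? x : as_float(as_uint(v) | (u ^ a));
}

/* Hypothetical harness: calls r() `iterations` times and returns the CPU
   time in seconds. The accumulated sum is written to *sink so the compiler
   cannot discard the calls. */
static double time_r(unsigned long iterations, float *sink)
{
    assert(sink != NULL);

    float acc = 0.0f;
    float x = 0.25f;
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++) {
        acc += r(x);
        x += 0.5f;              /* vary the input between iterations */
        if (x > 1000.0f)
            x = 0.25f;
    }
    clock_t end = clock();

    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_r(1000000000UL, &sink) would correspond to the 1G iterations
mentioned above. Because r() is static and called directly, the compiler is
free to inline it into the loop.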
