[LLVMbugs] [Bug 7965] New: No way to do a vector [reciprocal] square root

Sun Aug 22 04:17:36 PDT 2010

http://llvm.org/bugs/show_bug.cgi?id=7965

           Summary: No way to do a vector [reciprocal] square root
           Product: clang
           Version: 2.7
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: -New Bugs
        AssignedTo: unassignedclangbugs at nondot.org
        ReportedBy: baggett.patrick at gmail.com
                CC: llvmbugs at cs.uiuc.edu

I've been working on a raytracer (heavy use of vectors) and I'd like to
experiment with dynamic code generation using LLVM. Right now, I'm stuck on
trying to do a square root of a 4-valued vector efficiently, though other
applications might want general N-valued vectors.

On x86 targets, the SQRTPS instruction computes the square root of four FP
values at once. It is currently impossible to generate this instruction (and
related ones such as RCPPS and RSQRTPS) using vector extensions alone. This is
a deal-breaker for me.

I'd like to be able to use them without resorting to ugly intrinsics which
aren't portable. Given that this is an extremely common operation (read: not
just on x86), it would be nice if it were supported.
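
For illustration, here is roughly what the intrinsics route looks like today.
This is only a sketch: the helper names rsqrt4_estimate and rcp4_estimate are
made up for this example, and the scalar fallback is an assumption about what
a non-SSE build would have to do. _mm_rsqrt_ps and _mm_rcp_ps are the SSE
intrinsics from <xmmintrin.h> that map to RSQRTPS and RCPPS.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: _mm_rsqrt_ps, _mm_rcp_ps */
#else
#include <math.h>        /* scalar fallback uses sqrtf() */
#endif

/* Hypothetical helper: out[i] is approximately 1/sqrt(in[i]) for i = 0..3. */
static void rsqrt4_estimate(const float in[4], float out[4])
{
#if defined(__SSE__)
    /* RSQRTPS: low-precision reciprocal square root estimate, 4 lanes. */
    _mm_storeu_ps(out, _mm_rsqrt_ps(_mm_loadu_ps(in)));
#else
    /* Assumed fallback for targets without SSE: full-precision scalar math. */
    for (int i = 0; i < 4; ++i)
        out[i] = 1.0f / sqrtf(in[i]);
#endif
}

/* Hypothetical helper: out[i] is approximately 1/in[i] for i = 0..3. */
static void rcp4_estimate(const float in[4], float out[4])
{
#if defined(__SSE__)
    /* RCPPS: low-precision reciprocal estimate, 4 lanes. */
    _mm_storeu_ps(out, _mm_rcp_ps(_mm_loadu_ps(in)));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = 1.0f / in[i];
#endif
}
```

Every additional target would need its own branch in the #if, which is the
portability problem I'd like to avoid.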

Ideally, there would be __builtin_sqrtvector(), __builtin_rsqrtvector(), and
__builtin_rcpvector() builtins, for floating-point vectors only, where the
last two compute the reciprocal square root estimate and the reciprocal
estimate respectively. Their precision could be described as
"implementation-dependent".

My understanding of the LLVM architecture is that something like this requires
clang support and LLVM support.

I'm guessing you'd need a vector instruction at the LLVM IR level to support
this, but considering that clang converted sqrtf() into an SQRTSS instruction,
that may not be true. I've just started with LLVM, so pardon my ignorance of
its backends. :\

Simple case to reproduce both optimal and non-optimal code (x64):
----------------
typedef float float4 __attribute__((ext_vector_type(4)));

#include <math.h>

float4 sqrt4(float4 value)
{
    value.x = sqrtf(value.x);
    value.y = sqrtf(value.y);
    value.z = sqrtf(value.z);
    value.w = sqrtf(value.w);
    return value;
}

#include <xmmintrin.h>

float4 sqrt4_sse(float4 value)
{
    return _mm_sqrt_ps(value);
}

-------------------------------
Output ASM (x86-64)
-------------------------------
sqrt4:
    pshufd    $3, %xmm0, %xmm1
    pshufd    $1, %xmm0, %xmm2
    sqrtss    %xmm1, %xmm1
    sqrtss    %xmm2, %xmm2
    unpcklps    %xmm1, %xmm2
    sqrtss    %xmm0, %xmm1
    movhlps    %xmm0, %xmm0
    sqrtss    %xmm0, %xmm0
    unpcklps    %xmm0, %xmm1
    movaps    %xmm1, %xmm0
    unpcklps    %xmm2, %xmm0
    ret

sqrt4_sse:
    sqrtps    %xmm0, %xmm0
    ret
