[LLVMbugs] [Bug 7965] New: No way to do a vector [reciprocal] square root
bugzilla-daemon at llvm.org
Sun Aug 22 04:17:36 PDT 2010
http://llvm.org/bugs/show_bug.cgi?id=7965
Summary: No way to do a vector [reciprocal] square root
Product: clang
Version: 2.7
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: -New Bugs
AssignedTo: unassignedclangbugs at nondot.org
ReportedBy: baggett.patrick at gmail.com
CC: llvmbugs at cs.uiuc.edu
I've been working on a raytracer (heavy use of vectors) and I'd like to
experiment with dynamic code generation using LLVM. Right now, I'm stuck on
trying to do a square root of a 4-valued vector efficiently, though other
applications might want general N-valued vectors.
On x86 targets, the SQRTPS instruction computes the square root of four FP
values at once. It is currently impossible to generate this instruction (and
related ones such as RCPPS and RSQRTPS) using vector extensions alone. This is
a major sticking point for me.
I'd like to be able to use them without resorting to ugly intrinsics which
aren't portable. Given that this is an extremely common operation (read: not
just x86), it would be nice if it were supported.
Ideally, clang would provide __builtin_sqrtvector(), __builtin_rsqrtvector(),
and __builtin_rcpvector() for floating point vectors only, where the last two
compute the reciprocal square root estimate and the reciprocal estimate
respectively. They could be described as having "implementation-dependent"
precision.
My understanding of the LLVM architecture is that something like this requires
both clang support and LLVM support.
I'm guessing you'd need a vector instruction at the LLVM ISA level to support
this, but considering that clang converted sqrtf() into an SQRTSS instruction,
that may not be true. I've just started with LLVM, so pardon my ignorance of
its backends. :\
Simple case to reproduce both optimal and non-optimal code (x64):
----------------
typedef float float4 __attribute__((ext_vector_type(4)));
#include <math.h>
float4 sqrt4(float4 value)
{
    value.x = sqrtf(value.x);
    value.y = sqrtf(value.y);
    value.z = sqrtf(value.z);
    value.w = sqrtf(value.w);
    return value;
}
#include <xmmintrin.h>
float4 sqrt4_sse(float4 value)
{
    return _mm_sqrt_ps(value);
}
-------------------------------
Output ASM (x86-64)
-------------------------------
sqrt4:
    pshufd $3, %xmm0, %xmm1
    pshufd $1, %xmm0, %xmm2
    sqrtss %xmm1, %xmm1
    sqrtss %xmm2, %xmm2
    unpcklps %xmm1, %xmm2
    sqrtss %xmm0, %xmm1
    movhlps %xmm0, %xmm0
    sqrtss %xmm0, %xmm0
    unpcklps %xmm0, %xmm1
    movaps %xmm1, %xmm0
    unpcklps %xmm2, %xmm0
    ret
sqrt4_sse:
    sqrtps %xmm0, %xmm0
    ret