[LLVMdev] VFCmp failing when unordered or UnsafeFPMath on x86
Eli Friedman
eli.friedman at gmail.com
Tue Jun 17 10:30:44 PDT 2008
On Tue, Jun 17, 2008 at 9:08 AM, Nicolas Capens <nicolas at capens.net> wrote:
> Say we're trying to vectorize the following C++ code:
>
> if(v[0] < 0) v[0] += 1.0f;
> if(v[1] < 0) v[1] += 1.0f;
> if(v[2] < 0) v[2] += 1.0f;
> if(v[3] < 0) v[3] += 1.0f;
>
> With SSE assembly this would be as simple as:
>
> movaps xmm1, xmm0 // v in xmm0
> cmpltps xmm1, zero // zero = {0.0f, 0.0f, 0.0f, 0.0f}
> andps xmm1, one // one = {1.0f, 1.0f, 1.0f, 1.0f}
> addps xmm0, xmm1
>
> With the current definition of VFCmp this seems hard if not impossible to
> achieve. Vector compare instructions that return all 1's or all 0's per
> element are very common, and they are quite powerful in my opinion
> (effectively allowing one to implement a per-element Select). It seems to me
> that for the few architectures that don't have such instructions it would be
> useful to have LLVM generate a short sequence of instructions that does
> result in useful masks of all 1's or all 0's. Or am I missing something?
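For comparison, here is a minimal sketch of the quoted SSE sequence written with the standard SSE intrinsics from `<xmmintrin.h>`. It assumes an x86 target with SSE enabled. The function names `add_one_where_negative` and `add_one_where_negative4` are hypothetical and chosen for illustration. `_mm_cmplt_ps` corresponds to `cmpltps`, and it produces the all-1s or all-0s mask per element that the question relies on.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical helper: adds 1.0f to each lane of v that is less than zero,
// without branching. Mirrors the movaps/cmpltps/andps/addps sequence above.
inline __m128 add_one_where_negative(__m128 v) {
    const __m128 zero = _mm_setzero_ps();
    const __m128 one  = _mm_set1_ps(1.0f);
    // cmpltps: each lane is all 1 bits where v < 0, all 0 bits otherwise
    // (an ordered compare, so NaN lanes give 0).
    const __m128 mask = _mm_cmplt_ps(v, zero);
    // andps: keeps the bit pattern of 1.0f where the mask is set, 0.0f elsewhere.
    const __m128 addend = _mm_and_ps(mask, one);
    // addps: adds 1.0f or 0.0f to each lane.
    return _mm_add_ps(v, addend);
}

// Hypothetical convenience wrapper over a plain float[4] (unaligned load/store).
inline void add_one_where_negative4(float v[4]) {
    _mm_storeu_ps(v, add_one_where_negative(_mm_loadu_ps(v)));
}
```

With `float v[4] = {-2.5f, 0.0f, 3.0f, -0.25f}`, calling `add_one_where_negative4(v)` leaves `{-1.5f, 0.0f, 3.0f, 0.75f}`. Only the negative lanes change, just as with the four `if` statements.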
Well, this is a bit of a hack, but one way to write it in LLVM IR would be
something like the following:
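define <4 x float> @a2(<4 x float> %in) nounwind {
  ; Per-lane compare; only the top bit of each result lane is used below.
  %cmpres = vfcmp olt <4 x float> %in, zeroinitializer
  ; udiv by 2^31 moves the top bit down: each lane becomes 0 or 1.
  %cmpresshift = udiv <4 x i32> %cmpres, <i32 2147483648, i32 2147483648, i32 2147483648, i32 2147483648>
  ; 0 - {0 or 1} gives 0 or all 1s (-1) per lane.
  %cmpmask = sub <4 x i32> zeroinitializer, %cmpresshift
  ; Keep the bit pattern of 1.0 where the mask is set, 0 elsewhere.
  %andmask = and <4 x i32> %cmpmask, bitcast (<4 x float> <float 1.0, float 1.0, float 1.0, float 1.0> to <4 x i32>)
  %andmask2 = bitcast <4 x i32> %andmask to <4 x float>
  ; Add 1.0 or 0.0 to each lane.
  %result = add <4 x float> %in, %andmask2
  ret <4 x float> %result
}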
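To see what the IR computes, here is a scalar C++ sketch that emulates `@a2` one lane at a time. It is an illustration only. It assumes, as the `udiv` by 2^31 implies, that only the most significant bit of each `vfcmp` result lane is meaningful. The hypothetical `vfcmp_olt_lane` therefore fills the low 31 bits with arbitrary junk to show that they are ignored. All function names here are made up for the example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical model of one "vfcmp olt" lane: the top bit holds the result
// (assumption implied by the udiv by 2^31); the low 31 bits are junk.
inline uint32_t vfcmp_olt_lane(float a, float b, uint32_t junk) {
    return (a < b ? 0x80000000u : 0u) | (junk & 0x7FFFFFFFu);
}

// Bit-level reinterpretation, the scalar analogue of the IR bitcasts.
inline uint32_t bits_of(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float float_of(uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Scalar emulation of @a2, one lane at a time.
inline void a2_emulated(const float in[4], float out[4], uint32_t junk) {
    for (int i = 0; i < 4; ++i) {
        uint32_t cmpres      = vfcmp_olt_lane(in[i], 0.0f, junk);
        uint32_t cmpresshift = cmpres / 2147483648u;     // udiv: 0 or 1
        uint32_t cmpmask     = 0u - cmpresshift;         // sub: 0 or 0xFFFFFFFF
        uint32_t andmask     = cmpmask & bits_of(1.0f);  // and: 1.0f bits or 0
        out[i] = in[i] + float_of(andmask);              // add 1.0f or 0.0f
    }
}
```

Whatever the junk bits are, the `udiv` isolates the comparison bit, so the result matches the four scalar `if` statements from the original question.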
Ideally, the IR above would optimize down to the code as you wrote it.
It'll take some tweaks to codegen to get that working efficiently,
though: first, the div+neg doesn't get optimized to an arithmetic
shift, and second, the combiner can't tell that the shift is a no-op
(cmpltps already produces all 1s or all 0s in each element).
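The two missing optimizations can be checked on a single 32-bit lane with the following sketch. The function names are hypothetical. It shows that `0 - (x udiv 2^31)` equals an arithmetic shift right by 31. It also shows that this shift leaves a lane unchanged when the lane is already 0 or 0xFFFFFFFF, which is what `cmpltps` produces. The `ashr31` helper assumes `>>` on a negative `int32_t` is arithmetic. That is guaranteed from C++20 and is what mainstream compilers do before that.

```cpp
#include <cstdint>

// The sequence the IR expresses: udiv by 2^31, then subtract from zero.
inline uint32_t div_then_neg(uint32_t x) {
    return 0u - x / 2147483648u;
}

// Arithmetic shift right by 31: copies the sign bit into every bit.
// Assumes arithmetic >> on negative int32_t (guaranteed since C++20).
inline uint32_t ashr31(uint32_t x) {
    return static_cast<uint32_t>(static_cast<int32_t>(x) >> 31);
}

// True if a lane is already a full mask (all 0s or all 1s), in which case
// ashr31 returns it unchanged, i.e. the shift is a no-op.
inline bool is_full_mask(uint32_t x) {
    return x == 0u || x == 0xFFFFFFFFu;
}
```

Both functions return 0xFFFFFFFF when the top bit is set and 0 otherwise. So a backend could replace the udiv/sub pair with one shift and then drop that shift whenever it knows its input is a full mask.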
-Eli