[LLVMdev] VFCmp failing when unordered or UnsafeFPMath on x86

Tue Jun 17 09:08:06 PDT 2008

Hi Nate!

I don't see how that would work. Select doesn't work per element.

Say we're trying to vectorize the following C++ code:

if(v[0] < 0) v[0] += 1.0f;
if(v[1] < 0) v[1] += 1.0f;
if(v[2] < 0) v[2] += 1.0f;
if(v[3] < 0) v[3] += 1.0f;

With SSE assembly this would be as simple as:

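movaps  xmm1, xmm0   ; v in xmm0
cmpltps xmm1, zero   ; mask: all 1's where v < 0.0f; zero = {0.0f, 0.0f, 0.0f, 0.0f}
andps   xmm1, one    ; 1.0f where the mask is set, else 0.0f; one = {1.0f, 1.0f, 1.0f, 1.0f}
addps   xmm0, xmm1   ; v += 1.0f only in the masked elements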
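For anyone who wants to check what those four instructions compute element by element, here is a minimal portable C++ sketch. It assumes `float` is a 32-bit IEEE-754 type; the function `add_one_where_negative` and the helpers `bits_of` and `float_of` are hypothetical names, and the loop emulates the SSE semantics in scalar code rather than using intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar emulation of movaps/cmpltps/andps/addps on four lanes.
// Assumes float is a 32-bit IEEE-754 type, as it is on x86.
using Vec4 = std::array<float, 4>;

static std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // reinterpret the bits without UB
    return u;
}

static float float_of(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

Vec4 add_one_where_negative(Vec4 v) {
    const std::uint32_t one_bits = bits_of(1.0f);
    Vec4 result{};
    for (std::size_t i = 0; i < 4; ++i) {
        // cmpltps: all 1's if v[i] < 0.0f, all 0's otherwise (also for NaN)
        std::uint32_t mask = (v[i] < 0.0f) ? 0xFFFFFFFFu : 0u;
        // andps: keeps the bits of 1.0f where the mask is set, +0.0f elsewhere
        float addend = float_of(mask & one_bits);
        // addps: every lane gets either 1.0f or 0.0f added
        result[i] = v[i] + addend;
    }
    return result;
}
```

One subtle difference from the scalar C++ code: like the SSE sequence, this adds +0.0f to non-negative lanes, so a -0.0f input comes out as +0.0f, whereas the `if` version would leave it untouched.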

With the current definition of VFCmp this seems hard, if not impossible, to
achieve. Vector compare instructions that return all 1's or all 0's per
element are very common, and they are quite powerful in my opinion
(effectively allowing one to implement a per-element select). It seems to me
that for the few architectures that don't have such instructions, it would be
useful to have LLVM generate a short sequence of instructions that does
result in useful masks of all 1's or all 0's. Or am I missing something?
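On a target whose vector compare only yields 0 or 1 per element, such a "short sequence" could be as simple as a negation, after which the classic and/and-not/or idiom gives a per-element select. The sketch below assumes 32-bit lanes and uses hypothetical helper names (`mask_from_bools`, `select_lanes`); it is a scalar emulation of the idea, not LLVM code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec4 = std::array<float, 4>;
using Mask4 = std::array<std::uint32_t, 4>;

// Hypothetical helper: widen a per-lane 0/1 compare result into a mask of
// all 1's or all 0's with one negation (0 - 1 wraps to 0xFFFFFFFF).
Mask4 mask_from_bools(const std::array<bool, 4>& b) {
    Mask4 m{};
    for (std::size_t i = 0; i < 4; ++i)
        m[i] = 0u - static_cast<std::uint32_t>(b[i]);
    return m;
}

// Per-element select: (m & a) | (~m & b), the andps/andnps/orps idiom.
Vec4 select_lanes(const Mask4& m, const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint32_t ua, ub;
        std::memcpy(&ua, &a[i], sizeof ua);
        std::memcpy(&ub, &b[i], sizeof ub);
        std::uint32_t ur = (m[i] & ua) | (~m[i] & ub);
        std::memcpy(&r[i], &ur, sizeof ur);
    }
    return r;
}
```

The select only works because every mask lane is entirely 1's or entirely 0's; a mask with just the top bit set would mix bits from both inputs.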

Thanks a lot,

Nicolas

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Nate Begeman
Sent: Monday, 16 June, 2008 22:43
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] VFCmp failing when unordered or UnsafeFPMath on x86

On Jun 13, 2008, at 12:27 AM, Nicolas Capens wrote:

Hi all,

When I try to generate a VFCmp instruction with UnsafeFPMath set to true, I
get an "Unexpected CondCode" assertion on my x86 system. This also happens
with UnsafeFPMath set to false when using an unordered compare. Could someone
look into this?

While I'm at it, is there any reason why only the most significant bit of
the return value of VFCmp is defined (according to the documentation)? Both
AltiVec and SSE set the components of the result to either all 1's or all
0's. Having only the most significant bit doesn't seem useful to me at all,
and (arithmetic) shifting vectors to replicate the bit isn't supported.
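If only the most significant bit of each element is defined, it has to be replicated across the whole element before it can be used with `andps` as in the SSE sequence above. That is what an arithmetic right shift by 31 does; SSE2's `psrad` does it for four 32-bit elements at once. A scalar per-lane sketch, with hypothetical function names `replicate_msb` and `widen_msb_mask`:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Replicates bit 31 of a lane into all 32 bits, i.e. the effect of an
// arithmetic right shift by 31. Unsigned arithmetic is used because
// right-shifting a negative signed integer is implementation-defined
// before C++20.
std::uint32_t replicate_msb(std::uint32_t lane) {
    return 0u - (lane >> 31);  // top bit 0 -> 0x00000000, 1 -> 0xFFFFFFFF
}

// Applies the same widening to each of four 32-bit lanes.
std::array<std::uint32_t, 4> widen_msb_mask(std::array<std::uint32_t, 4> m) {
    for (auto& lane : m)
        lane = replicate_msb(lane);
    return m;
}
```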

There are other architectures which don't do this, so defining it as such
would over-constrain the problem. The remaining bits are undefined and may be
set to any value by the target arch. The goal here is that you can
essentially treat each element of the vector as a signed integer and select
(or perform another operation) if the value is less than zero, rather than
specifically equal to -1. This matches things like SSE's blend and PPC's fsel.
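The "less than zero" reading can be illustrated with a select that looks only at the sign bit of each mask element, so the remaining bits may hold anything. SSE4.1's `blendvps`, for example, uses only the top bit of each 32-bit mask element. The sketch below is a scalar emulation with a hypothetical `select_if_negative` function, assuming 32-bit lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<float, 4>;
using Mask4 = std::array<std::uint32_t, 4>;

// Lane i takes a[i] when mask lane i, read as a signed 32-bit integer, is
// negative (its most significant bit is set); otherwise it takes b[i].
// The low 31 bits of each mask lane are ignored, so they may be undefined.
Vec4 select_if_negative(const Mask4& mask, const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        bool msb_set = (mask[i] >> 31) != 0;
        r[i] = msb_set ? a[i] : b[i];
    }
    return r;
}
```

Under this model a compare result of 0x80000000 and one of 0xFFFFFFFF select identically, which is why only the most significant bit needs to be defined.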
