[LLVMdev] VFCmp failing when unordered or UnsafeFPMath on x86

Nate Begeman natebegeman at mac.com
Tue Jun 17 10:43:07 PDT 2008


On Jun 17, 2008, at 9:08 AM, Nicolas Capens wrote:

> Hi Nate!
>
> I don’t see how that would work. Select doesn’t work per element.
>
> Say we’re trying to vectorize the following C++ code:
>
> if(v[0] < 0) v[0] += 1.0f;
> if(v[1] < 0) v[1] += 1.0f;
> if(v[2] < 0) v[2] += 1.0f;
> if(v[3] < 0) v[3] += 1.0f;
>
> With SSE assembly this would be as simple as:
>
> movaps  xmm1, xmm0    // v in xmm0
> cmpltps xmm1, zero    // zero = {0.0f, 0.0f, 0.0f, 0.0f}
> andps   xmm1, one     // one = {1.0f, 1.0f, 1.0f, 1.0f}
> addps   xmm0, xmm1
>
> With the current definition of VFCmp this seems hard, if not
> impossible, to achieve. Vector compare instructions that return all
> 1’s or all 0’s per element are very common, and they are quite
> powerful in my opinion (effectively allowing one to implement a
> per-element Select). It seems to me that for the few architectures
> that don’t have such instructions it would be useful to have LLVM
> generate a short sequence of instructions that does result in useful
> masks of all 1’s or all 0’s. Or am I missing something?
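
For reference, here is a rough scalar C++ sketch of what the quoted
four-instruction sequence computes, lane by lane. It is only an
emulation under assumptions: the names Vec4f, Vec4u, cmp_lt_mask,
and_mask and add_one_where_negative are made up for illustration and
are not LLVM or SSE APIs, and the compare is modeled as producing an
all-ones or all-zeros 32-bit mask per element, as cmpltps does.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical 4-lane types standing in for an XMM register.
using Vec4f = std::array<float, 4>;
using Vec4u = std::array<std::uint32_t, 4>;

// Reinterpret a float's bits as an integer and back (no value conversion).
static std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
static float float_of(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Models cmpltps: each lane becomes all 1's if a < b, otherwise all 0's.
Vec4u cmp_lt_mask(const Vec4f& a, const Vec4f& b) {
    Vec4u m{};
    for (int i = 0; i < 4; ++i) m[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
    return m;
}

// Models andps: bitwise AND of the mask with the float's bit pattern.
Vec4f and_mask(const Vec4u& m, const Vec4f& v) {
    Vec4f r{};
    for (int i = 0; i < 4; ++i) r[i] = float_of(m[i] & bits_of(v[i]));
    return r;
}

// The whole sequence: v[i] += 1.0f only in lanes where v[i] < 0.
Vec4f add_one_where_negative(Vec4f v) {
    const Vec4f zero{0.0f, 0.0f, 0.0f, 0.0f};
    const Vec4f one{1.0f, 1.0f, 1.0f, 1.0f};
    const Vec4u mask = cmp_lt_mask(v, zero);   // cmpltps
    const Vec4f inc = and_mask(mask, one);     // andps: 1.0f or 0.0f per lane
    for (int i = 0; i < 4; ++i) v[i] += inc[i]; // addps
    return v;
}
```

The and step only yields exactly 1.0f or 0.0f because every bit of the
mask lane is set or clear, which is the property under discussion.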

I think you're missing a target flag that says "vfcmp sets all bits".
Clearly if you are implementing a bit-select in terms of and/andn/or
on SSE2, you want all the bits to be set, but there are also useful
things like blend and maskmov, which look only at the top bit of each
element. I'm pretty sure we'll get to the code you want without
constraining vfcmp to set all bits.
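
To make the distinction concrete, here is a small scalar C++ sketch of
the two kinds of select. It is an illustration under assumptions, not
what any backend emits: bit_select and top_bit_blend are hypothetical
names, and the blend is modeled on 32-bit lanes for simplicity.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4-lane integer vector standing in for an XMM register.
using Vec4u = std::array<std::uint32_t, 4>;

// and/andn/or bit-select: each result bit comes from a where the mask
// bit is 1 and from b where it is 0, so the mask must set all bits.
Vec4u bit_select(const Vec4u& mask, const Vec4u& a, const Vec4u& b) {
    Vec4u r{};
    for (int i = 0; i < 4; ++i) r[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
    return r;
}

// Top-bit blend: the whole lane comes from a if the mask lane's top bit
// is set, otherwise from b. The other 31 mask bits are ignored.
Vec4u top_bit_blend(const Vec4u& mask, const Vec4u& a, const Vec4u& b) {
    Vec4u r{};
    for (int i = 0; i < 4; ++i) r[i] = (mask[i] & 0x80000000u) ? a[i] : b[i];
    return r;
}
```

With an all-ones/all-zeros mask both functions give the same result.
With a mask that only sets the top bit, top_bit_blend still selects
whole lanes, while bit_select mixes bits from both inputs.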

Nate