[LLVMdev] Float compare-for-equality and select optimization opportunity
Marc B. Reynolds
marc.reynolds at orange.fr
Tue May 27 09:48:38 PDT 2008
Hi Marc,
I'm a bit confused. Isn't the standard compare (i.e. the one for a language
like C) an ordered one? I tried converting some C code to LLVM C++ API code
with the online demo, and it uses FCMP_OEQ.
No, if you have:
x = NaN
y = NaN
then the comparison:
(x == y) is false.
That is what you're seeing in your first post, and it is the standard IEEE
expected behavior.
As for why I expected your min/max question to be related, consider the flags
set by 'comiss', 'ucomiss', etc.:
              Z P C
unordered     1 1 1
greater than  0 0 0
less than     0 0 1
equal         1 0 0
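To make the table concrete, here is a minimal C sketch that models the three flag bits and shows why ZF alone cannot tell "equal" from "unordered". The names cmp_flags, model_ucomiss, zf_only_equal and ordered_equal are hypothetical helpers invented for illustration; they only mimic the table above and are not part of LLVM or any real API.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of the ZF/PF/CF bits from the table above. */
struct cmp_flags { bool zf, pf, cf; };

struct cmp_flags model_ucomiss(float a, float b)
{
    struct cmp_flags f = { false, false, false };  /* greater than: 000 */
    if (isunordered(a, b)) {
        f.zf = f.pf = f.cf = true;                 /* unordered:    111 */
    } else if (a < b) {
        f.cf = true;                               /* less than:    001 */
    } else if (a == b) {
        f.zf = true;                               /* equal:        100 */
    }
    return f;
}

/* Testing ZF alone (as 'sete' does) is also true for unordered operands. */
bool zf_only_equal(float a, float b)
{
    return model_ucomiss(a, b).zf;
}

/* Combining ZF with "parity not set" (as 'sete' plus 'setnp' does)
   excludes the unordered case. */
bool ordered_equal(float a, float b)
{
    struct cmp_flags f = model_ucomiss(a, b);
    return f.zf && !f.pf;
}

/* Small self-check of the model. */
void check_flags_model(void)
{
    assert(zf_only_equal(NAN, NAN));    /* ZF is set for NaN operands */
    assert(!ordered_equal(NAN, NAN));   /* PF filters them out again  */
    assert(ordered_equal(1.0f, 1.0f));
}
```

Calling check_flags_model() should pass silently, assuming the code is built without -ffast-math or -ffinite-math-only, which would let the compiler assume NaNs never occur.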
Try the following C program with gcc, first with no options and then with
-ffinite-math-only (or -ffast-math).
-----------
#include <stdio.h>

#define STR(X) #X
#define CMP(X) if (X) { printf(STR(X) " true\n"); } else { printf(STR(X) " false\n"); }

int main(int argc, char** argv)
{
  float a = 0.f/0.f; // generate a NaN
  float b = 1;       // any finite value
  CMP(a!=a);
  CMP(a==a);
  CMP(a> b);
  CMP(a>=b);
  CMP(a< b);
  CMP(a<=b);
  CMP(a!=b);
  return 0;
}
------------------------
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Marc B. Reynolds
Sent: Tuesday, 27 May, 2008 14:07
To: 'LLVM Developers Mailing List'
Subject: Re: [LLVMdev] Float compare-for-equality and select
optimization opportunity
Both ZF and PF will be set if unordered, so the code below is IEEE
correct... you want to generate 'fcmp ueq' instead of 'fcmp oeq'.
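As a rough illustration of the difference between the two predicates, the following C sketch spells out their semantics using isunordered from <math.h>. The function names fcmp_oeq, fcmp_ueq and check_predicates are hypothetical, chosen only to mirror the LLVM predicate names; this is not how LLVM implements them.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* 'fcmp oeq': true only if neither operand is NaN and they are equal. */
bool fcmp_oeq(float a, float b)
{
    return !isunordered(a, b) && a == b;
}

/* 'fcmp ueq': true if either operand is NaN or they are equal. */
bool fcmp_ueq(float a, float b)
{
    return isunordered(a, b) || a == b;
}

/* Small self-check: the C '==' operator agrees with the ordered form. */
void check_predicates(void)
{
    float qnan = NAN;
    assert(fcmp_oeq(1.0f, 1.0f) && fcmp_ueq(1.0f, 1.0f));
    assert(!fcmp_oeq(qnan, qnan));
    assert(fcmp_ueq(qnan, 1.0f));
    assert((qnan == qnan) == fcmp_oeq(qnan, qnan));
}
```

The two only differ when a NaN is involved, which is why the unordered form can get away with testing ZF alone while the ordered form also has to look at PF. As with any NaN test, this assumes a build without -ffast-math or -ffinite-math-only.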
This is the resulting x86 assembly code:
movss xmm0,dword ptr [ecx+4]
ucomiss xmm0,dword ptr [ecx+8]
sete al
setnp dl
test dl,al
mov edx,edi
cmovne edx,ecx
cmovne ecx,esi
cmovne esi,edi
While I'm pleasantly surprised that my branch does get turned into several
select operations as intended (cmov - conditional move - in x86), I'm
confused why it uses the ucomiss instruction (unordered compare and set
flags). I only used IRBuilder::CreateFCmpOEQ. It also appears to invert the
conditional, for no clear reason. I think it could be rewritten as follows:
movss xmm0,dword ptr [ecx+4]
comiss xmm0,dword ptr [ecx+8]
mov edx,edi
cmove edx,ecx
cmove ecx,esi
cmove esi,edi
Compared to the original C syntax code this looks pretty straightforward.
Curiously, when I replace the compare-for-equality with something like a
less-than, it does generate such compact code (using comiss and cmova). And
the not-equal case looks like this:
movss xmm0,dword ptr [ecx+4]
ucomiss xmm0,dword ptr [ecx+8]
mov esi,ecx
cmove esi,edx
cmovne ecx,eax
cmove edx,eax
So this generates compact code but with an unordered compare.
Anyway, it looks like the compare-for-equality case in particular is missing
an optimization opportunity. It's no big deal to me but I thought someone
here might be interested.
Cheers,
Nicolas Capens