[llvm-dev] [RFC] Should -ffast-math affect intrinsics?
Smith, Kevin B via llvm-dev
llvm-dev at lists.llvm.org
Mon Jul 12 14:46:28 PDT 2021
Sorry, missed a NOT or two.
This is what I meant to say:
It seems to me that the fast-math flags really should NOT affect intrinsics implementations themselves, and that the fast-math flags should NOT allow reassociation across the intrinsic calls. So, is this expected behavior, or just something that no-one has noticed before?
It surprised me.
I have also checked GCC's behavior, which is consistent with clang's. The Intel C/C++ compiler does not let fast-math flags affect intrinsics, or at least it does not allow reassociation across the call boundaries. I haven't checked the Microsoft compiler yet.
Kevin Smith
-----Original Message-----
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Smith, Kevin B via llvm-dev
Sent: Monday, July 12, 2021 2:28 PM
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] [RFC] Should -ffast-math affect intrinsics?
I've got the following little program that illustrates what I think is a problem. This is for X86/Intel64 intrinsics.
If compiled using
$ clang -O2 intrin_prob.c
$ a.out
2.000000, 3.000000
This is the expected result. But if compiled using
$ clang -O2 -ffast-math intrin_prob.c
$ a.out
1.500000, 3.255000
This gets incorrect results, because reassociation happens across the calls to the _mm_add_pd, and _mm_sub_pd intrinsics and the value that should have been added and subtracted gets constant folded to zero. It seems to me that the fast-math flags really should not affect intrinsics implementations themselves, and that the fast-math flags should allow reassociation across the intrinsic calls. So, is this expected behavior, or just something that no-one has noticed before? It surprised me.
I have also checked GCC's behavior, which is consistent with clang's. The Intel C/C++ compiler does not let fast-math flags affect intrinsics, or at least not for reassociation across the call boundaries. I haven't checked the Microsoft compiler yet.
An easy "fix" would be to add
#pragma float_control(precise, on)
or
#pragma clang fp reassociate(off)
near the top of immintrin.h to cause all intrinsics to ignore all fast-math flags, or at least ignore reassociation.
$ cat intrin_prob.c
#include <immintrin.h>
#include <stdio.h>
static union {
  double u1[2];
  __m128d u2;
} t1[1] = {1.25, 3.25};

int main(int argc, char **argv) {
  __m128d t2;
  __m128d t3;

  // This is just so the compiler cannot constant fold
  // and know the values of t1.
  t1[0].u1[0] += argc * 0.25;
  t1[0].u1[1] += argc * .005;

  // Adding and then subtracting this value should round each
  // element to an integer. If the compiler optimizes the add
  // and subtract away by doing reassociation, then the printed
  // values will have fractional parts. If the compiler does the
  // intrinsics as written, then the values printed will have no
  // fractional part.
  t2 = _mm_castsi128_pd(_mm_set_epi32((int)((0x4338000000000000uLL) >> 32),
                                      (int)((0x4338000000000000uLL) >> 0),
                                      (int)((0x4338000000000000uLL) >> 32),
                                      (int)((0x4338000000000000uLL) >> 0)));
  t3 = _mm_add_pd(t1[0].u2, t2);
  t3 = _mm_sub_pd(t3, t2);
  t1[0].u2 = t3;
  printf("%f, %f\n", t1[0].u1[0], t1[0].u1[1]);
  return 0;
}
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev