[llvm-dev] [RFC] Should -ffast-math affect intrinsics?
Smith, Kevin B via llvm-dev
llvm-dev at lists.llvm.org
Mon Jul 12 14:28:25 PDT 2021
I've got the following little program that illustrates what I think is a problem. This is for X86/Intel64 intrinsics.
If compiled using
$ clang -O2 intrin_prob.c
$ a.out
2.000000, 3.000000
This is the expected result. But if compiled using
$ clang -O2 -ffast-math intrin_prob.c
$ a.out
1.500000, 3.255000
This gives incorrect results: reassociation happens across the calls to the _mm_add_pd and _mm_sub_pd intrinsics,
so the value that should have been added and then subtracted gets constant folded to zero. It seems to me that the fast-math
flags really should not affect the implementations of the intrinsics themselves, and that they should not allow reassociation
across the intrinsic calls. So, is this expected behavior, or just something that no one has noticed before? It surprised me.
I have also checked GCC, whose behavior is consistent with clang's (or vice versa). The Intel C/C++ compiler does not let fast-math flags
affect intrinsics, at least not by reassociating across the call boundaries. I haven't checked the Microsoft compiler yet.
An easy "fix" would be to add
#pragma float_control(precise, on)
or
#pragma clang fp reassociate(off)
near the top of immintrin.h to make all intrinsics ignore all fast-math flags (the first pragma), or at least ignore reassociation (the second).
$ cat intrin_prob.c
#include <immintrin.h>
#include <stdio.h>

static union {
    double u1[2];
    __m128d u2;
} t1[1] = {{{1.25, 3.25}}};

int main(int argc, char **argv) {
    __m128d t2;
    __m128d t3;
    // This is just so the compiler cannot constant fold
    // and know the values of t1.
    t1[0].u1[0] += argc * 0.25;
    t1[0].u1[1] += argc * .005;
    // This value, when added and then subtracted, should cause
    // the values to be rounded to the nearest integer (1.5 -> 2.0,
    // 3.255 -> 3.0 in the default rounding mode). If the compiler
    // optimizes the add and subtract out by doing
    // reassociation, then the printed values will have
    // fractional parts. If the compiler does the intrinsics
    // as expected, then the values printed will have no fractional part.
    t2 = _mm_castsi128_pd(_mm_set_epi32((int)((0x4338000000000000uLL) >> 32),
                                        (int)((0x4338000000000000uLL) >> 0),
                                        (int)((0x4338000000000000uLL) >> 32),
                                        (int)((0x4338000000000000uLL) >> 0)));
    t3 = _mm_add_pd(t1[0].u2, t2);
    t3 = _mm_sub_pd(t3, t2);
    t1[0].u2 = t3;
    printf("%f, %f\n", t1[0].u1[0], t1[0].u1[1]);
    return 0;
}