[llvm-dev] bug in clang?
Raghuveer Devulapalli via llvm-dev
llvm-dev at lists.llvm.org
Sat Dec 19 17:55:40 PST 2020
Hi all,
Here is a snippet of code taken from NumPy, along with the corresponding
output from clang 10.0 (clang 12.0 behaves the same way). The code can also
be viewed here:
https://godbolt.org/z/Peznjx
The compiler optimizes away the first blendv_ps instruction (i.e., it
eliminates the temp1 variable altogether). The purpose of this instruction
is to keep the following multiplication from raising an overflow flag: it
replaces every lane that is >= FLT_MIN with 0.0, so that only denormal lanes
are actually scaled by 2^100. With the instruction removed, the generated
code raises an unintended floating-point overflow flag, which I think is
incorrect behavior.
C code:
#include <float.h>
#include <immintrin.h>

__m256 fma_get_mantissa(__m256 x)
{
    /* 0x71800000 is the bit pattern of 2^100 as a float. */
    __m256 two_power_100 =
        _mm256_castsi256_ps(_mm256_set1_epi32(0x71800000));
    __m256 denormal_mask =
        _mm256_cmp_ps(x, _mm256_set1_ps(FLT_MIN), _CMP_LT_OQ);
    __m256 normal_mask =
        _mm256_cmp_ps(x, _mm256_set1_ps(FLT_MIN), _CMP_GE_OQ);

    /* Zero the normal lanes so that the multiply below cannot overflow. */
    __m256 temp1 = _mm256_blendv_ps(x, _mm256_set1_ps(0.0f), normal_mask);
    __m256 temp = _mm256_mul_ps(temp1, two_power_100);
    /* Take the scaled value only in the denormal lanes. */
    x = _mm256_blendv_ps(x, temp, denormal_mask);

    __m256i mantissa_bits = _mm256_set1_epi32(0x7fffff);
    __m256i exp_126_bits = _mm256_set1_epi32(126 << 23);
    return _mm256_castsi256_ps(
        _mm256_or_si256(
            _mm256_and_si256(_mm256_castps_si256(x), mantissa_bits),
            exp_126_bits));
}
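To make the concern concrete, here is a minimal scalar sketch of a single lane. It is not part of the NumPy source: the names bits_to_float and scaling_sets_overflow are hypothetical, and it assumes an x86-64 target where float arithmetic goes through SSE, so that the sticky flags in MXCSR reflect the multiply. The volatile qualifiers are only there to stop the compiler from folding the multiply at compile time.

```c
#include <float.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float. */
static float bits_to_float(unsigned int bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/*
 * Hypothetical scalar model of one lane. Returns 1 if scaling x by 2^100
 * sets the overflow flag in MXCSR, 0 otherwise. When guard is nonzero,
 * inputs >= FLT_MIN are replaced by 0.0f first, which is the job that
 * temp1 does in the vector code.
 */
static int scaling_sets_overflow(float x, int guard)
{
    volatile float two_power_100 = bits_to_float(0x71800000u);
    volatile float input = x;
    volatile float product;

    if (guard && input >= FLT_MIN)
        input = 0.0f;

    _MM_SET_EXCEPTION_STATE(0);       /* clear all sticky exception flags */
    product = input * two_power_100;  /* the multiply under test */
    (void)product;
    return (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_OVERFLOW) != 0;
}
```

Under these assumptions, scaling_sets_overflow(FLT_MAX, 0) reports the overflow flag and scaling_sets_overflow(FLT_MAX, 1) does not. The unguarded case corresponds to what the generated code does once the first blend is dropped.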
Object code output from clang:
.LCPI0_0:
        .long 0x00800000 # float 1.17549435E-38
.LCPI0_1:
        .long 0x71800000 # float 1.2676506E+30
.LCPI0_2:
        .long 4294967170 # 0xffffff82
.LCPI0_3:
        .long 0xc2c80000 # float -100
fma_get_exponent: # @fma_get_exponent
        vbroadcastss ymm1, dword ptr [rip + .LCPI0_0] # ymm1 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]
        vcmpltps ymm2, ymm0, ymm1
        vbroadcastss ymm3, dword ptr [rip + .LCPI0_1] # ymm3 = [1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30]
        *vmulps ymm3, ymm0, ymm3*
        vcmpnleps ymm1, ymm1, ymm0
        vandps ymm1, ymm1, ymm3
        vblendvps ymm0, ymm0, ymm1, ymm2
        vpsrld ymm0, ymm0, 23
        vpbroadcastd ymm1, dword ptr [rip + .LCPI0_2] # ymm1 = [4294967170,4294967170,4294967170,4294967170,4294967170,4294967170,4294967170,4294967170]
        vpaddd ymm0, ymm0, ymm1
        vcvtdq2ps ymm0, ymm0
        vbroadcastss ymm1, dword ptr [rip + .LCPI0_3] # ymm1 = [-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2]
        vaddps ymm1, ymm0, ymm1
        vblendvps ymm0, ymm0, ymm1, ymm2
        ret
Raghuveer