[llvm-dev] bug in clang?
Raghuveer Devulapalli via llvm-dev
llvm-dev at lists.llvm.org
Mon Dec 21 10:27:06 PST 2020
Hi Pengfei,
I did not know that, thanks for the response.
Raghuveer
P.S: I only meant the fp overflow flag.
On Mon, Dec 21, 2020 at 6:45 AM Wang, Pengfei <pengfei.wang at intel.com>
wrote:
> Hi Raghuveer,
>
>
>
> Do you mean the result is incorrect, or just the fp overflow flag? I think
> Clang doesn’t promise that the fp status is always correct by default. But you
> can use -ffp-model=strict or #pragma float_control(except, on) to force it
> to do so.
>
>
>
> Thanks
>
> Pengfei
>
>
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Raghuveer Devulapalli via llvm-dev
> Sent: Sunday, December 20, 2020 9:56 AM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] bug in clang?
>
>
>
> Hi all,
>
>
>
> Here is a snippet of code taken from NumPy and the corresponding output
> when using clang 10.0 (or even clang 12.0). You can also view the code here:
> https://godbolt.org/z/Peznjx
>
>
>
> The compiler optimizes away the first blendv_ps instruction (i.e., it
> eliminates the temp1 variable altogether). The goal of this instruction
> was to prevent an overflow flag from being raised by the following
> multiplication instruction. With this instruction eliminated, the resulting
> code raises an unintended fp overflow flag, which I think is incorrect
> behavior.
>
>
>
> C code:
>
>
>
> #include <float.h>      /* FLT_MIN */
> #include <immintrin.h>  /* AVX/AVX2 intrinsics */
>
> __m256 fma_get_mantissa(__m256 x)
> {
>     /* 0x71800000 is the bit pattern of 2^100 as a float. */
>     __m256 two_power_100 = _mm256_castsi256_ps(_mm256_set1_epi32(0x71800000));
>     __m256 denormal_mask = _mm256_cmp_ps(x, _mm256_set1_ps(FLT_MIN), _CMP_LT_OQ);
>     __m256 normal_mask = _mm256_cmp_ps(x, _mm256_set1_ps(FLT_MIN), _CMP_GE_OQ);
>
>     /* Zero the normal lanes first so the multiply cannot overflow on them. */
>     __m256 temp1 = _mm256_blendv_ps(x, _mm256_set1_ps(0.0f), normal_mask);
>     __m256 temp = _mm256_mul_ps(temp1, two_power_100);
>     /* Keep the scaled value only in the denormal lanes. */
>     x = _mm256_blendv_ps(x, temp, denormal_mask);
>
>     /* Keep the 23 mantissa bits and force the biased exponent to 126. */
>     __m256i mantissa_bits = _mm256_set1_epi32(0x7fffff);
>     __m256i exp_126_bits = _mm256_set1_epi32(126 << 23);
>     return _mm256_castsi256_ps(
>         _mm256_or_si256(
>             _mm256_and_si256(_mm256_castps_si256(x), mantissa_bits),
>             exp_126_bits));
> }
>
>
>
> Assembly output from clang:
>
>
>
> .LCPI0_0:
> .long 0x00800000 # float 1.17549435E-38
> .LCPI0_1:
> .long 0x71800000 # float 1.2676506E+30
> .LCPI0_2:
> .long 4294967170 # 0xffffff82
> .LCPI0_3:
> .long 0xc2c80000 # float -100
> fma_get_exponent: # @fma_get_exponent
> vbroadcastss ymm1, dword ptr [rip + .LCPI0_0] # ymm1 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]
> vcmpltps ymm2, ymm0, ymm1
> vbroadcastss ymm3, dword ptr [rip + .LCPI0_1] # ymm3 = [1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30]
> *vmulps ymm3, ymm0, ymm3*
> vcmpnleps ymm1, ymm1, ymm0
> vandps ymm1, ymm1, ymm3
> vblendvps ymm0, ymm0, ymm1, ymm2
> vpsrld ymm0, ymm0, 23
> vpbroadcastd ymm1, dword ptr [rip + .LCPI0_2] # ymm1 = [4294967170,4294967170,4294967170,4294967170,4294967170,4294967170,4294967170,4294967170]
> vpaddd ymm0, ymm0, ymm1
> vcvtdq2ps ymm0, ymm0
> vbroadcastss ymm1, dword ptr [rip + .LCPI0_3] # ymm1 = [-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2]
> vaddps ymm1, ymm0, ymm1
> vblendvps ymm0, ymm0, ymm1, ymm2
> ret
>
>
>
> Raghuveer