[llvm-dev] bug in clang?

Mon Dec 21 10:27:06 PST 2020

Hi Pengfei,

I did not know that, thanks for the response.

Raghuveer

P.S. I only meant the fp overflow flag.
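
For anyone reading this in the archive, here is a minimal sketch of how the overflow flag in question can be observed on x86, together with the pragma Pengfei suggests. The helper name mul_sets_overflow_flag is made up for illustration, and the volatile qualifiers are only there to keep the multiply from being constant-folded. I am also assuming an x86 target where float arithmetic goes through SSE, so that the sticky flags live in MXCSR.

```c
#include <float.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_EXCEPT_* */

#if defined(__clang__)
/* Ask clang to preserve fp exception semantics in this file
   (the pragma from Pengfei's reply; other compilers skip it). */
#pragma float_control(except, on)
#endif

/* Hypothetical helper: returns 1 if a * b sets the sticky MXCSR
   overflow flag, 0 otherwise. Assumes x86 with SSE float math. */
int mul_sets_overflow_flag(float a, float b)
{
    volatile float va = a, vb = b;  /* block constant folding */
    volatile float product;

    /* Clear all sticky exception flags before the multiply. */
    _mm_setcsr(_mm_getcsr() & ~(unsigned)_MM_EXCEPT_MASK);
    product = va * vb;
    (void)product;
    return (_mm_getcsr() & _MM_EXCEPT_OVERFLOW) != 0;
}
```

Calling mul_sets_overflow_flag(FLT_MAX, 2.0f) should report the flag, while mul_sets_overflow_flag(1.5f, 2.0f) should not. Building the whole file with -ffp-model=strict is the command-line alternative to the pragma.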

On Mon, Dec 21, 2020 at 6:45 AM Wang, Pengfei <pengfei.wang at intel.com>
wrote:

> Hi Raghuveer,
>
>
>
> Do you mean that the result is incorrect, or just the fp overflow flag? I
> think Clang doesn’t promise that the fp status flags are always correct by
> default, but you can use -ffp-model=strict or
> #pragma float_control(except, on) to force it to do so.
>
>
>
> Thanks
>
> Pengfei
>
>
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Raghuveer
> Devulapalli via llvm-dev
> *Sent:* Sunday, December 20, 2020 9:56 AM
> *To:* llvm-dev at lists.llvm.org
> *Subject:* [llvm-dev] bug in clang?
>
>
>
> Hi all,
>
>
>
> Here is a snippet of code taken from NumPy and the corresponding output
> when using clang 10.0 (or even clang 12.0). You can also view the code here:
> https://godbolt.org/z/Peznjx
>
>
>
> The compiler optimizes away the first blendv_ps instruction (i.e., it
> eliminates the temp1 variable altogether). The goal of this instruction
> was to prevent an overflow flag from being raised by the following
> multiplication instruction. With this instruction eliminated, the resulting
> code raises an unintended fp overflow flag, which I think is incorrect
> behavior.
>
>
>
> C code:
>
>
>
> #include <float.h>      /* FLT_MIN */
> #include <immintrin.h>  /* AVX/AVX2 intrinsics */
>
> __m256 fma_get_mantissa(__m256 x)
> {
>     /* 0x71800000 is the bit pattern of 2^100 as a float */
>     __m256 two_power_100 = _mm256_castsi256_ps(_mm256_set1_epi32(0x71800000));
>     __m256 denormal_mask = _mm256_cmp_ps(x, _mm256_set1_ps(FLT_MIN), _CMP_LT_OQ);
>     __m256 normal_mask = _mm256_cmp_ps(x, _mm256_set1_ps(FLT_MIN), _CMP_GE_OQ);
>
>     /* Zero the normal lanes so the multiply below cannot overflow */
>     __m256 temp1 = _mm256_blendv_ps(x, _mm256_set1_ps(0.0f), normal_mask);
>     __m256 temp = _mm256_mul_ps(temp1, two_power_100);
>     /* Take the scaled value only in the denormal lanes */
>     x = _mm256_blendv_ps(x, temp, denormal_mask);
>
>     /* Keep the 23 mantissa bits and force the biased exponent to 126 */
>     __m256i mantissa_bits = _mm256_set1_epi32(0x7fffff);
>     __m256i exp_126_bits  = _mm256_set1_epi32(126 << 23);
>     return _mm256_castsi256_ps(
>                 _mm256_or_si256(
>                     _mm256_and_si256(
>                         _mm256_castps_si256(x), mantissa_bits), exp_126_bits));
> }
>
>
>
> object code output from clang:
>
>
>
> .LCPI0_0:
>         .long   0x00800000                      # float 1.17549435E-38
> .LCPI0_1:
>         .long   0x71800000                      # float 1.2676506E+30
> .LCPI0_2:
>         .long   4294967170                      # 0xffffff82
> .LCPI0_3:
>         .long   0xc2c80000                      # float -100
> fma_get_exponent:                       # @fma_get_exponent
>         vbroadcastss    ymm1, dword ptr [rip + .LCPI0_0] # ymm1 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]
>         vcmpltps        ymm2, ymm0, ymm1
>         vbroadcastss    ymm3, dword ptr [rip + .LCPI0_1] # ymm3 = [1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30]
>         *vmulps  ymm3, ymm0, ymm3*
>         vcmpnleps       ymm1, ymm1, ymm0
>         vandps  ymm1, ymm1, ymm3
>         vblendvps       ymm0, ymm0, ymm1, ymm2
>         vpsrld  ymm0, ymm0, 23
>         vpbroadcastd    ymm1, dword ptr [rip + .LCPI0_2] # ymm1 = [4294967170,4294967170,4294967170,4294967170,4294967170,4294967170,4294967170,4294967170]
>         vpaddd  ymm0, ymm0, ymm1
>         vcvtdq2ps       ymm0, ymm0
>         vbroadcastss    ymm1, dword ptr [rip + .LCPI0_3] # ymm1 = [-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2]
>         vaddps  ymm1, ymm0, ymm1
>         vblendvps       ymm0, ymm0, ymm1, ymm2
>         ret
>
>
>
> Raghuveer
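
As a reading aid for the C snippet quoted above, one lane of fma_get_mantissa can be modeled in scalar C. This is only a sketch under my own assumptions: the name get_mantissa_lane is hypothetical, it models the value each lane produces rather than the status flags, and it is not the NumPy implementation.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one lane of fma_get_mantissa. */
float get_mantissa_lane(float x)
{
    const float two_power_100 = 0x1p100f;  /* same value as bits 0x71800000 */

    /* temp1: normal inputs are replaced by 0 before the multiply,
       so only tiny (or negative, or NaN) inputs are ever scaled. */
    float guarded = (x >= FLT_MIN) ? 0.0f : x;
    float scaled = guarded * two_power_100;

    /* Second blend: take the scaled value only where x < FLT_MIN. */
    if (x < FLT_MIN)
        x = scaled;

    /* Keep the 23 mantissa bits and force the biased exponent to 126,
       which places the result in [0.5, 1). */
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = (bits & 0x7fffffu) | (126u << 23);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For example, get_mantissa_lane(3.0f) gives 0.75f, since 3.0 is 0.75 * 2^2. In this model FLT_MAX never reaches the multiply, which is the property the first blend was meant to guarantee in the vector code.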