<div dir="ltr">Hi Pengfei, <div><br></div><div>I did not know that, thanks for the response. </div><div><br></div><div>Raghuveer</div><div><br></div><div>P.S: I only meant the fp overflow flag.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Dec 21, 2020 at 6:45 AM Wang, Pengfei <<a href="mailto:pengfei.wang@intel.com">pengfei.wang@intel.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div class="gmail-m_8798168523555890839WordSection1">
<p class="MsoNormal"><span style="color:rgb(31,73,125)">Hi Raghuveer,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">Do you mean that the result is incorrect, or only that the fp overflow flag is set? I think Clang doesn’t promise that the fp status flags are always correct by default. But you can use -ffp-model=strict or #pragma float_control(except, on) to
force it to do so.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">Thanks<u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">Pengfei<u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><b>From:</b> llvm-dev <<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>> <b>On Behalf Of
</b>Raghuveer Devulapalli via llvm-dev<br>
<b>Sent:</b> Sunday, December 20, 2020 9:56 AM<br>
<b>To:</b> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<b>Subject:</b> [llvm-dev] bug in clang?<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">Hi all, <u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Here is a snippet of code taken from NumPy and the corresponding output when using clang 10.0 (or even clang 12.0). You can also view the code here: <a href="https://godbolt.org/z/Peznjx" target="_blank">https://godbolt.org/z/Peznjx</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">The compiler optimizes away the first blendv_ps instruction (i.e., it eliminates the temp1 variable altogether). The goal of this instruction was to prevent an overflow flag from being raised by the following multiplication instruction. By
eliminating this instruction, the resulting code raises an unintended fp overflow flag, which I think is incorrect behavior. <u></u><u></u></p>
</div>
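<p class="MsoNormal">To make the intent of the guard concrete, here is a scalar, per-lane sketch of the same idea in plain C. The function name scale_if_denormal and the helper two_power_100 are hypothetical; the constants and the comparison semantics are taken from the vector code in this message. Whether the guard actually keeps the overflow flag clear after optimization depends on the floating-point model the compiler is using.</p>

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds 2^100 from the bit pattern 0x71800000. */
static float two_power_100(void)
{
    uint32_t bits = 0x71800000u;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Scalar model of one lane: scale x by 2^100 only when x < FLT_MIN. */
float scale_if_denormal(float x)
{
    /* Ordered comparisons: both are false when x is NaN. */
    int is_denormal = x < FLT_MIN;    /* denormal_mask */
    int is_normal   = x >= FLT_MIN;   /* normal_mask   */

    /* temp1: replace normal inputs with 0 so the multiply below
     * cannot overflow for large x. */
    float guarded = is_normal ? 0.0f : x;

    /* temp: safe to compute for every input because of the guard. */
    float scaled = guarded * two_power_100();

    /* Final blend: keep the scaled value only for small inputs. */
    return is_denormal ? scaled : x;
}
```

<p class="MsoNormal">Without the guarded value, a large input such as FLT_MAX would be multiplied by 2^100 and overflow, even though that product is discarded by the final select.</p>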
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">C code: <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">__m256 fma_get_mantissa(__m256 x)<br>
{<br>
__m256 two_power_100 = _mm256_castsi256_ps(_mm256_set1_epi32(0x71800000));<br>
__m256 denormal_mask = _mm256_cmp_ps(x, _mm256_set1_ps(FLT_MIN), _CMP_LT_OQ);<br>
__m256 normal_mask = _mm256_cmp_ps(x, _mm256_set1_ps(FLT_MIN), _CMP_GE_OQ);<br>
<br>
__m256 temp1 = _mm256_blendv_ps(x, _mm256_set1_ps(0.0f), normal_mask);<br>
__m256 temp = _mm256_mul_ps(temp1, two_power_100);<br>
x = _mm256_blendv_ps(x, temp, denormal_mask);<br>
<br>
__m256i mantissa_bits = _mm256_set1_epi32(0x7fffff);<br>
__m256i exp_126_bits = _mm256_set1_epi32(126 << 23);<br>
return _mm256_castsi256_ps(<br>
_mm256_or_si256(<br>
_mm256_and_si256(<br>
_mm256_castps_si256(x), mantissa_bits), exp_126_bits));<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">} <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Assembly output from clang: <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">.LCPI0_0:<br>
.long 0x00800000 # float 1.17549435E-38<br>
.LCPI0_1:<br>
.long 0x71800000 # float 1.2676506E+30<br>
.LCPI0_2:<br>
.long 4294967170 # 0xffffff82<br>
.LCPI0_3:<br>
.long 0xc2c80000 # float -100<br>
fma_get_exponent: # @fma_get_exponent<br>
vbroadcastss ymm1, dword ptr [rip + .LCPI0_0] # ymm1 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]<br>
vcmpltps ymm2, ymm0, ymm1<br>
vbroadcastss ymm3, dword ptr [rip + .LCPI0_1] # ymm3 = [1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30,1.2676506E+30]<br>
<b>vmulps ymm3, ymm0, ymm3</b><br>
vcmpnleps ymm1, ymm1, ymm0<br>
vandps ymm1, ymm1, ymm3<br>
vblendvps ymm0, ymm0, ymm1, ymm2<br>
vpsrld ymm0, ymm0, 23<br>
vpbroadcastd ymm1, dword ptr [rip + .LCPI0_2] # ymm1 = [4294967170,4294967170,4294967170,4294967170,4294967170,4294967170,4294967170,4294967170]<br>
vpaddd ymm0, ymm0, ymm1<br>
vcvtdq2ps ymm0, ymm0<br>
vbroadcastss ymm1, dword ptr [rip + .LCPI0_3] # ymm1 = [-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2,-1.0E+2]<br>
vaddps ymm1, ymm0, ymm1<br>
vblendvps ymm0, ymm0, ymm1, ymm2<br>
ret<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Raghuveer<u></u><u></u></p>
</div>
</div>
</div>
</div>
</blockquote></div>