<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1898859613;
mso-list-type:hybrid;
mso-list-template-ids:2056140038 1362497412 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;
mso-fareast-font-family:Calibri;
mso-bidi-font-family:Calibri;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><o:p> </o:p></p>
<ul style="margin-top:0in" type="disc">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">To be clear, there are no target-specific intrinsics in this particular example because clang translates the source-level intrinsics to generic IR:<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1"><a href="https://godbolt.org/z/q4YYs6PxM">https://godbolt.org/z/q4YYs6PxM</a><o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">(That shows -O1 to make it easier to read, but there are no intrinsics at -O0 either.)<o:p></o:p></li></ul>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Yes, I had known that. My actual argument is that while the generic IR does indeed capture the basic intent of, for example, _mm_add_pd, the intrinsics programmer very likely expects optimizations to happen as if _mm_add_pd
were a function call that actually performs the _mm_add_pd instruction. I can reasonably see an argument for why you might want some of the fast-math flags to apply, but the reassoc flag seemed more questionable to me.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<ul style="margin-top:0in" type="disc">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">So changing the header file to avoid a subset of those optimizations will likely cause perf regressions/complaints<o:p></o:p></li></ul>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">That certainly is possible. I’d guess that since -ffast-math is not the clang default, in practice most intrinsics programmers would never choose to turn it on, and therefore the behavior of the vast majority
of codes would not change.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">FWIW, I found this issue while debugging a problem noticed in a math library routine that was being ported from the Intel C compiler to clang. Also, FWIW, the Microsoft compiler also seems not to optimize (or
at least not to do this reassociation) across intrinsics.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Kevin Smith <o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Sanjay Patel <spatel@rotateright.com> <br>
<b>Sent:</b> Wednesday, July 14, 2021 6:27 AM<br>
<b>To:</b> Wang, Pengfei <pengfei.wang@intel.com><br>
<b>Cc:</b> Smith, Kevin B <kevin.b.smith@intel.com>; llvm-dev@lists.llvm.org<br>
<b>Subject:</b> Re: [llvm-dev] [RFC] Should -ffast-math affect intrinsics?<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">To be clear, there are no target-specific intrinsics in this particular example because clang translates the source-level intrinsics to generic IR:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><a href="https://godbolt.org/z/q4YYs6PxM">https://godbolt.org/z/q4YYs6PxM</a><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">(That shows -O1 to make it easier to read, but there are no intrinsics at -O0 either.)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">We chose that form to give the IR and codegen optimizers full opportunity to perform generic transforms because the target-specific source ops have the same semantics as generic IR.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">So changing the header file to avoid a subset of those optimizations will likely cause perf regressions/complaints.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Wed, Jul 14, 2021 at 5:24 AM Wang, Pengfei via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal">Hi Kevin,<br>
<br>
AFAIK, it is expected behavior that the fast-math flags affect llvm intrinsics. An example is llvm.vector.reduce.fadd.*
<a href="https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fadd-intrinsic" target="_blank">
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fadd-intrinsic</a>.<br>
But how fast-math flags affect target-dependent intrinsics is a bit vague, because target intrinsics express the inherent characteristics of native instructions, so they sometimes imply a particular FP model. E.g., on X86, some intrinsics are
assumed to be used under fp-model=strict, e.g. _mm512_add_round_ps, while others are assumed to be used under given constraints (similar to fast-math flags), e.g. _mm_max_ps.<br>
In general, I think we should respect fast-math flags on target intrinsics too. We don't do much of it simply because we don't put emphasis on the performance of target intrinsics. There was an optimization under a fast-math flag in X86InstCombineIntrinsic.cpp,
which I removed in D85385 for another purpose.<br>
<br>
Thanks<br>
Pengfei<br>
<br>
-----Original Message-----<br>
From: llvm-dev <<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>> On Behalf Of Smith, Kevin B via llvm-dev<br>
Sent: Tuesday, July 13, 2021 5:46 AM<br>
To: <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
Subject: Re: [llvm-dev] [RFC] Should -ffast-math affect intrinsics?<br>
<br>
Sorry, missed a NOT or two.<br>
<br>
This is what I meant to say:<br>
It seems to me that the fast-math flags really should NOT affect intrinsics implementations themselves, and that the fast-math flags should NOT allow reassociation across the intrinsic calls. So, is this expected behavior, or just something that no-one has
noticed before?<br>
<br>
It surprised me.<br>
<br>
I have also checked GCC behavior, which is consistent with clang, or vice versa. Intel C/C++ compiler does not have fast math flags affect intrinsics, at least it doesn't allow reassociation across the call boundaries and I haven't checked the Microsoft compiler
yet.<br>
<br>
Kevin Smith<br>
<br>
-----Original Message-----<br>
From: llvm-dev <<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>> On Behalf Of Smith, Kevin B via llvm-dev<br>
Sent: Monday, July 12, 2021 2:28 PM<br>
To: <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
Subject: [llvm-dev] [RFC] Should -ffast-math affect intrinsics?<br>
<br>
I've got the following little program that illustrates what I think is a problem. This is for X86/Intel64 intrinsics.<br>
<br>
If compiled using<br>
$ clang -O2 intrin_prob.c<br>
$ a.out<br>
2.000000, 3.000000<br>
<br>
This is the expected result. But if compiled using<br>
$ clang -O2 -ffast-math intrin_prob.c<br>
$ a.out<br>
1.500000, 3.255000<br>
<br>
This gets incorrect results, because reassociation happens across the calls to the _mm_add_pd, and _mm_sub_pd intrinsics and the value that should have been added and subtracted gets constant folded to zero. It seems to me that the fast-math flags really should
not affect intrinsics implementations themselves, and that the fast-math flags should allow reassociation across the intrinsic calls. So, is this expected behavior, or just something that no-one has noticed before? It surprised me.<br>
I have also checked GCC behavior, which is consistent with clang, or vice versa. Intel C/C++ compiler does not have fast math flags affect intrinsics, at least not for reassociation across the call boundaries and I haven't checked the Microsoft compiler yet.<br>
<br>
An easy "fix" would be to add<br>
#pragma float_control(precise, on)<br>
or<br>
#pragma clang fp reassociate(off)<br>
near the top of immintrin.h to cause all intrinsics to ignore all fast-math flags, or at least ignore reassociation.<br>
<br>
$ cat intrin_prob.c<br>
#include <immintrin.h><br>
#include <stdio.h><br>
<br>
static union {<br>
double u1[2];<br>
__m128d u2;<br>
} t1[1] = {1.25, 3.25};<br>
<br>
int main(int argc, char **argv) {<br>
__m128d t2;<br>
__m128d t3;<br>
// This is just so the compiler cannot constant fold<br>
// and know the values of t1.<br>
t1[0].u1[0] += argc * 0.25;<br>
t1[0].u1[1] += argc * .005;<br>
<br>
// This value when added, then subtracted should cause<br>
// the values to be truncated to integer. If the compiler<br>
// optimizes the add and subtract out by doing<br>
// reassociation, then the printed values will have<br>
// fractional parts. If the compiler does the intrinsics<br>
// as expected, then the values printed will have no fractional part.<br>
t2 = _mm_castsi128_pd(_mm_set_epi32((int)((0x4338000000000000uLL) >> 32),<br>
(int)((0x4338000000000000uLL) >> 0),<br>
(int)((0x4338000000000000uLL) >> 32),<br>
(int)((0x4338000000000000uLL) >> 0)));<br>
t3 = _mm_add_pd(t1[0].u2, t2);<br>
t3 = _mm_sub_pd(t3, t2);<br>
t1[0].u2 = t3;<br>
<br>
printf("%f, %f\n", t1[0].u1[0], t1[0].u1[1]);<br>
return 0;<br>
}<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
<o:p></o:p></p>
</blockquote>
</div>
</div>
</body>
</html>