<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Hi all,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">In a mailing-list post last November:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">I raised some concerns that having the IR-level fast-math-flag 'fast' act as an<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">"umbrella" to implicitly turn on all the lower-level fast-math-flags, causes<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">some fundamental problems. Those fundamental problems are related to<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">situations where a user wants to disable a portion of the fast-math behavior.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">For example, to enable all the fast-math transformations except for the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">reciprocal-math transformation, a command like the following is what a user<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">would expect to work:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">But that isn't what it's doing.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">I believe this is a serious problem, but I also want to avoid over-stating the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">seriousness. To be explicit, the problems I'm describing here happen when<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'-ffast-math' is used with one or more of the underlying fast-math-related<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">aspects _disabled_ (like the '-fno-reciprocal-math' example, above).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Conversely, when '-ffast-math' is used "on its own", the situation is fine.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">For terminology here, I'll refer to these underlying fast-math-related aspects<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">(like reciprocal-math, associative-math, math-errno, and others) as<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">"sub-fast-math" aspects.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">I apologize for the length of this post. I'm putting the summary up front, so<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">that anyone interested in fast-math issues can quickly get the big-picture of<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the issues I'm describing here.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">In Summary:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">1. With the change of r297837, the driver now more cleanly handles<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> '-ffast-math', and other sub-fast-math switches (like<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> '-f[no]-reciprocal-math', '-f[no-]math-errno', and others).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">2. Prior to that change, the disabling of a sub-fast-math switch was often<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> ineffective. So as an example, the following two commands often resulted<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> in the same code-gen, even if there were<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> fast-math-reciprocal-transformations that were done:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -ffast-math -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">3. Since that change, the disabling of a sub-fast-math switch disables many<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> more sub-fast-math transformations than just the one specified. So now,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> the following two commands often result in very similar (and sometimes<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> identical) code-gen:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> That is, disabling a single sub-fast-math transformation in some (many?)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> cases now ends up disabling almost all the fast-math transformations.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> This causes a performance hit for people that have been doing this.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">4. To fix this, I think that additional fast-math-flags are likely needed in<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> the IR. Instead of the following set:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> something like this:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 'reassoc' + 'libm' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> would be more useful. Related to this, the current 'fast' flag which acts<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> as an umbrella (enabling 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract') may<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> not be needed. A discussion on this point was raised last November on the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> mailing list:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">TL;DR<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">More details are in that thread from November, but the problem in its entirety<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">involved both back-end LLVM issues, and front-end Clang (driver) issues. The<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">LLVM issues are related to the umbrella aspect of 'fast', along with other<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">fast-math-flags implementation details (described below). The front-end<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">aspects in Clang are related to the driver's handling of '-ffast-math' (which<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">also had an "umbrella" aspect). The driver code has been refactored since that<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">November post, fixing the umbrella aspect of the front-end. But I never got<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">around to working on the related back-end issues (nor has anyone else), and the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">refactored front-end now results in the back-end issues manifesting<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">differently, and arguably in a worse way (details on the "worse" aspect,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">below).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">For reference, the refactored driver code was done in r297837:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> [Driver] Restructure handling of -ffast-math and similar options<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">To be clear, I'm not at all suggesting that the above change was incorrect. I<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">think that refactoring of the driver code is the right thing to do. An aspect<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">of this refactoring is that prior to it, when a user passed '-ffast-math' on<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the command-line, it was also passed to the cc1 process, even if a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">sub-fast-math component was disabled. With the refactoring, the driver only<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">passes '-ffast-math' to cc1 when a specific set of sub-fast-math components are<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">enabled.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">More specifically, when a user specifies just '-ffast-math' on the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">command-line, the following 7 sub-fast-math switches:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> -fno-honor-infinities<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> -fno-honor-nans<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> -fno-math-errno<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> -fassociative-math<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> -freciprocal-math<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> -fno-signed-zeros<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> -fno-trapping-math<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">get passed to cc1 (this is true both with the old (pre r297837) and new (since<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">r297837) compilers). Furthermore, the "umbrella" '-ffast-math' is also passed<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">to cc1 in this case of the user specifying just '-ffast-math' on the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">command-line (again, in both the old and new compilers).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">The difference related to this issue in the old/new behavior, is that when a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">user turns on fast-math but disables one (or more) of the sub-fast-math<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">switches, for example, as in:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">then in the old mode '-ffast-math' was still passed to cc1 (acting as an<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">umbrella, causing trouble), but in the new mode '-ffast-math' is no longer<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">passed to cc1 in this case. (In both the old and new modes,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'-freciprocal-math' is not passed to cc1 with this command-line, as you'd<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">expect.)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">What's happening is that in the old mode, it was the user passing '-ffast-math'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">on the command-line that resulted in passing the umbrella '-ffast-math' to cc1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">(even if all 7 of the sub-fast-math switches were disabled by the user).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Whereas in the new mode, the '-ffast-math' switch is passed to cc1 iff all 7 of<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the underlying sub-fast-math switches are enabled.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">I'd say that's an improvement in the handling of the switches, and also on the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">plus side, I think it makes dealing with the concerns I raised in November LLVM<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">a little clearer, and so more manageable in some sense. But on the negative<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">side, since the new behavior in LLVM is arguably worse, fixing the back-end<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">issues is now a higher priority for my customers.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">The behavior that is arguably worse, is that when a user enables fast-math, but<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">attempts to disable one of the sub-fast-math aspects, the old behavior (pre<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">r297837) was that the sub-fast-math aspect to be disabled, generally (often?)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">remained enabled. The new behavior (since r297837) is that when disabling a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">sub-fast-math aspect, that aspect plus many more (possibly often the majority)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">of the fast-math transformations are disabled. So this results in a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">performance regression in these fast-math contexts when a sub-fast-math aspect<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">is disabled, which is why it is a fairly high priority for us.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">FTR, r297837 was made during llvm 5.0 development, so the new behavior has the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">effect of a performance regression in moving from 4.0 to 5.0. In describing<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">things here, I'll compare llvm 4.0 with llvm 5.0 behavior. But more precisely,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">it's pre-r297837 with post-r297837 behavior.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Here is a tiny example, to illustrate it concretely:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ cat assoc.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">//////////// "assoc.cpp" ////////////<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">float foo(float a, float x)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">{<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> return ((a + x) - x); // fastmath reassociation eliminates the arithmetic<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">/////////////////////////////////////<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">When -ffast-math is specified, the reassociation enabled by it allows us to<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">simply return the first argument (and that reassociation does happen with<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'-ffast-math', with both the old and new compilers):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ clang -c -O2 -o x.o assoc.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: f3 0f 58 c1 addss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 4: f3 0f 5c c1 subss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 8: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ clang -c -O2 -ffast-math -o x.o assoc.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">FTR, GCC also does the reassociation transformation here when '-ffast-math' is<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">used, as expected.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">But when using '-ffast-math' and disabling a sub-fast-math aspect of it (say<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">via '-fno-reciprocal-math', '-fno-associative-math', or '-fmath-errno'), both<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the old and new compilers exhibit incorrect behavior in some cases. With the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">old compiler, the behavior was that using any of these switches did not disable<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the transformation. Those switches were mostly ineffective. (Only<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'-fno-associative-math' should disable the transformation in this example, so<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the fact that the other ones didn't disable it is correct/desired.) Here is<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the old behavior for the above test-case, when some example sub-fast-math<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">aspects are individually disabled:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ old/bin/clang --version | grep version<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">clang version 4.0.0 (tags/RELEASE_400/final)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ old/bin/clang -c -O2 -o x.o assoc.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: f3 0f 58 c1 addss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 4: f3 0f 5c c1 subss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 8: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ old/bin/clang -c -O2 -ffast-math -o x.o assoc.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ old/bin/clang -c -O2 -ffast-math -fno-reciprocal-math -o x.o assoc.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ old/bin/clang -c -O2 -ffast-math -fno-associative-math -o x.o assoc.cpp # Error<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ old/bin/clang -c -O2 -ffast-math -fmath-errno -o x.o assoc.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">So with the old compiler, the case marked 'Error' above is incorrect, in that<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the reassociation should be suppressed in that case, but it isn't.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Again FTR, the GCC behavior disables the re-association in the case marked<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'Error' above.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Moving on to the new compiler, instead of '-fno-associative-math' being<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">ineffective, the problem is that when disabling other sub-fast-math aspects<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">(unrelated to reassociation), the transformation is suppressed, when it should<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">not be. Here is the new behavior with that same set of sub-fast-math aspects<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">individually disabled:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ new/bin/clang --version | grep version<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">clang version 5.0.0 (tags/RELEASE_500/final)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ new/bin/clang -c -O2 -o x.o assoc.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: f3 0f 58 c1 addss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 4: f3 0f 5c c1 subss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 8: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ new/bin/clang -c -O2 -ffast-math -o x.o assoc.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ new/bin/clang -c -O2 -ffast-math -fno-reciprocal-math -o x.o assoc.cpp # Error<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: f3 0f 58 c1 addss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 4: f3 0f 5c c1 subss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 8: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ new/bin/clang -c -O2 -ffast-math -fno-associative-math -o x.o assoc.cpp # Good<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: f3 0f 58 c1 addss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 4: f3 0f 5c c1 subss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 8: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ new/bin/clang -c -O2 -ffast-math -fmath-errno -o x.o assoc.cpp # Error<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ llvm-objdump -d x.o | grep "^ .*: "<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 0: f3 0f 58 c1 addss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 4: f3 0f 5c c1 subss %xmm1, %xmm0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 8: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">The two cases marked as 'Error' are incorrectly suppressing the re-association.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">The case marked as 'Good' is now doing the right thing for this test-case.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Again FTR, the GCC behavior allows the re-association in the cases marked<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'Error' above to happen.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">__________________________________________________________________<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Note that the '-f[no-]associative-math' flag has other problems, reported in<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">PR27372 (https://bugs.llvm.org/show_bug.cgi?id=27372). Those "other problems"<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">are related to the fact that there isn't an LLVM IR fast-math-flag that<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">explicitly indicates whether reassociation is enabled or disabled. As a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">consequence, the front-end essentially drops that flag on the floor. The<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">back-end has no way of explicitly looking for that capability, and so the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">back-end implementation instead relies on the "umbrella" aspect of 'fast'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">implicitly turning on all the lower-level fast-math-flags. This is a key<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">aspect of the problem. Near the start of this post, I mentioned that the LLVM<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">issues are related to the umbrella aspect of 'fast', along with other<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">fast-math-flag implementation details. The fact that the back-end has no way<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">of explicitly checking whether reassociation is enabled is what I meant by<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">those other implementation details.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Going to a more general discussion of the problem, the documentation of the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">fast-math-flags at:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> http://llvm.org/docs/LangRef.html#fast-math-flags<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">can be described loosely as:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">nnan Allow optimizations to assume the arguments and result are not NaN<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">ninf Allow optimizations to assume the arguments and result are not +/-Inf<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">nsz Allow optimizations to treat the sign of a zero argument or result<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> as insignificant<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">arcp Allow optimizations to use the reciprocal of an argument rather than<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> perform division<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">contract Allow floating-point contraction (e.g. fused multiply-and-add)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">And the flag 'fast' is defined there as:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">fast Fast - Allow algebraically equivalent transformations that may<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> dramatically change results in floating point (e.g. reassociate).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> This flag implies all the others.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">(Side point: Back in November, 'contract' was not an explicit fast-math-flag.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">This is a recent change, but it doesn't impact the issue I'm raising here.)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">To summarize, and to relate this somewhat back to the November 2016 post:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">as described in that older post, this means that 'fast' could be described as:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> Very loosely, 'fast' means "all the aggressive FP-transformations that<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> are not controlled by one of the other 5, plus it implies all the other<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 5". If for terminology, we call those additional aggressive<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> optimizations 'aggr', then we have:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 'fast' == 'aggr' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">But there isn't a specific flag for 'aggr' (it's just "on" when all the other<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">flags are "on"). Reassociation is part of these additional 'aggr'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">transformations. Back in November, Hal pointed out that libm transformations<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">are another part of these 'aggr' transformations. With that, one possible<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">direction is to add two more sub-fast-math flags, say 'reassoc' and 'libm':<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 'reassoc' + 'libm' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">This would allow disabling (for example) 'arcp' without suppressing<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">reassociation. Whether there would be a need for an "umbrella" flag 'fast'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">that implies all the others is somewhat orthogonal, although personally I feel<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">it complicates the issue and doesn't provide any significant benefit. I can<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">imagine that there is a benefit that I haven't thought of -- I don't claim to<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">have a deep understanding of the implementation. So I'd like to hear what<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">others think.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">One important aspect of this is that it appears to me there are quite a few<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">fast-math transformations that are enabled only when all the underlying<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">sub-fast-math flags are on (that is, only when the 'fast' umbrella flag is<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">set). That's a key part of the problem of PR27372. In this context, the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">change in behavior from r297837 is that with the old behavior, the following<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">two commands are almost equivalent (in many cases, they are equivalent):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> $ # Old behavior: The following two commands are nearly identical:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> $ clang -c -O2 -ffast-math foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> $ clang -c -O2 -ffast-math -fno-reciprocal-math foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> $<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Whereas with the new behavior (post-r297837), the following two commands are<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">almost always equivalent:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> $ # New behavior: The following two commands are nearly identical:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> $ clang -c -O2 foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> $ clang -c -O2 -ffast-math -fno-reciprocal-math foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> $<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">(Again, '-fno-reciprocal-math' is just an example of the suppression of a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">sub-fast-math aspect here. '-fno-associative-math and '-fmath-errno' would<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">also be good examples.)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Succinctly, if a '-ffast-math' user now disables a sub-fast-math aspect, they<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">will be frustrated that they end up disabling almost the entire set of<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">fast-math transformations. Whereas previously, they would be frustrated that<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">their attempt of disabling a specific sub-fast-math aspect was ineffective. So<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">previously, they might try to "fix a numerical instability" by disabling a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">sub-fast-math aspect (and be frustrated by it not being effective), and now if<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">they try to "fix that numerical instability", they will succeed, but they will<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">see a performance-hit of losing nearly all the performance gain that<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'-ffast-math' was providing.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">As an aside, on the PS4 with llvm 4.0 (and earlier) compilers, we've had a few<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">customers frustrated that '-ffast-math -fno-reciprocal-math' was still doing<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">reciprocal transformations. So we've had a private change to make<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'-fno-reciprocal-math' suppress the reciprocal optimization. With a vanilla<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">llvm 5.0, those customers would see a performance hit (so we have a different<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">private change to address that).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">As a final point here, to give more weight to this, I took a random bit of code<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">I found on github that that has floating-point fast-math opportunities in it,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">and experimented with it. (I just searched for 'mandelbrot', and took the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">first thing I found.) Specifically:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> https://gist.github.com/andrejbauer/7919569<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">This test-case has a few divisions in it, but it doesn't contain any<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">reciprocal-transformation opportunities (so '-f[no-]reciprocal-math' should<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">essentially be a no-op).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">The old Clang behavior has the following two commands being nearly identical<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">(they generate essentially equivalent code -- just some minor register<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">change):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ # Old Clang behavior:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ # No significant difference when –fno-reciprocal-math is added (as desired)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ clang -S -O2 -ffast-math -o O2fm.s mandelbrot.c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ clang -S -O2 -ffast-math -fno-reciprocal-math -o O2fm.no_arcp.s mandelbrot.c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ diff O2fm.s O2fm.no_arcp.s | wc<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 4 10 56<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">That is, as expected/desired, the '-fno-reciprocal-math' switch has essentially<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">no impact on this, since there are no reciprocal transformations being done.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Also as expected, the difference between "plain -O2" and '-O2 -ffast-math' is<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">more substantial:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ # Old Clang behavior:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ # '-O2' vs '–O2 –ffast-math' shows a significant difference (as desired)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ clang -S -O2 -o O2.s mandelbrot.c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ diff O2.s O2fm.s | wc<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 43 184 1305<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">That is, adding '-ffast-math' to '-O2' is transforming the code, presumably<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">making it faster (at the cost of a potential loss in numerical accuracy).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">With GCC for this example (I used version 4.8.4, which isn't particularly<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">modern, but I happen to have it handy), I get similar behavior. For example,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the following two commands produce identical assembly code:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ gcc -S -O2 -ffast-math -o O2fm.s mandelbrot.c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ gcc -S -O2 -ffast-math -fno-reciprocal-math -o O2fm.no_arcp.s mandelbrot.c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ diff O2fm.s O2fm.no_arcp.s<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">and that code is substantially different than the GCC "plain -O2" code:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ gcc -S -O2 -o O2.s mandelbrot.c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ diff O2.s O2fm.s | wc<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 44 126 719<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">But comparing this to the new Clang behavior, we see that<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'-fno-reciprocal-math' is mow "disabling too much", as discussed in detail<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">above for the simple "assoc.cpp" test-case. Specifically:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ # New Clang behavior:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ clang -S -O2 -o O2.s mandelbrot.c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ clang -S -O2 -ffast-math -o O2fm.s mandelbrot.c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ clang -S -O2 -ffast-math -fno-reciprocal-math -o O2fm.no_arcp.s mandelbrot.c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ # Adding -ffast-math to -O2 continues to show significant diffs (expected)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ diff O2.s O2fm.s | wc<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 35 105 622<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ # too many differences -- should be nearly the same<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ diff O2fm.s O2fm.no_arcp.s | wc<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 29 89 526<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">So with the new behavior, even though there are no reciprocal transformation<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">opportunities, disabling that transformation via '-fno-reciprocal-math'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">disables many (most) of the fast-math features. In fact, comparing plain '-O2'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">with '-O2 -ffast-math -fno-reciprocal-math', it's clear that they are virtually<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">identical with the new Clang behavior. Specifically, we get only a minor<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">difference (of swapping of two register operands in a comparison, and changing<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the sense of the associated branch) when comparing '-O2' with<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'-O2 -ffast-math -fno-reciprocal-math':<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ # New Clang behavior:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ # nearly identical, but there should be many diffs<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$ diff O2.s O2fm.no_arcp.s<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">188,189c188,189<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">< ucomisd %xmm5, %xmm6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">< ja .LBB0_7<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">---<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">> ucomisd %xmm6, %xmm5<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">> jb .LBB0_7<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">$<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">In full disclosure, for this "mandelbrot.c" test-case, I don't know if any of<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">the changes in code-gen done by us or by GCC when '-ffast-math' is enabled are<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">helpful (from a performance perspective) or dangerous (from a precise IEEE FP<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">math perspective). All I know is that for both us and GCC at -O2, the switch<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">'-ffast-math' changed the code-gen, and that '-ffast-math -fno-reciprocal-math'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">didn't suppress any of those changes for GCC, but it suppressed essentially all<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">of the changes for us.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">For continuity, I'm repeating the summary here (that I had near the beginning).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">In Summary:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">1. With the change of r297837, the driver now more cleanly handles<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> '-ffast-math', and other sub-fast-math switches (like<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> '-f[no]-reciprocal-math', '-f[no-]math-errno', and others).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">2. Prior to that change, the disabling of a sub-fast-math switch was often<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> ineffective. So as an example, the following two commands often resulted<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> in the same code-gen, even if there were<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> fast-math-reciprocal-transformations that were done:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -ffast-math -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">3. Since that change, the disabling of a sub-fast-math switch disables many<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> more sub-fast-math transformations than just the one specified. So now,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> the following two commands often result in very similar (and sometimes<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> identical) code-gen:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> That is, disabling a single sub-fast-math transformation in some (many?)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> cases now ends up disabling almost all the fast-math transformations.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> This causes a performance hit for people that have been doing this.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">4. To fix this, I think that additional fast-math-flags are likely needed in<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> the IR. Instead of the following set:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> something like this:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> 'reassoc' + 'libm' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> would be more useful. Related to this, the current 'fast' flag which acts<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> as an umbrella (enabling 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract') may<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> not be needed. A discussion on this point was raised last November on the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> mailing list:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""> http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">-Warren<o:p></o:p></span></p>
</div>
</body>
</html>