<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Helvetica;
panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.apple-converted-space
{mso-style-name:apple-converted-space;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">> Why not? What situation are you trying to avoid?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">It just seems unexpected. I write code with no explicit FMA’s. I compile with no command line options, and I get FMA. It’s not what I’d expect, and someone else specifically complained about this behavior after Melanie’s patch landed.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">> I don’t see a problem with the godbolt link; is your concern simply that you think -ffp-contract=fast should fuse a super-set of what is done by =on, or is there something else?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Yes, that is my concern. I think =fast should always produce at least as many FMA’s as =on.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">> If anything, preserving FMA formation at O0 _helps_ debuggability, because it means that numerical behavior is more likely to match what a user observed at Os, allowing them to debug the problem.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">That’s an excellent point. I could definitely be persuaded by that argument.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">-Andy<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> scanon@apple.com <scanon@apple.com> <br>
<b>Sent:</b> Thursday, February 13, 2020 5:37 PM<br>
<b>To:</b> Kaylor, Andrew <andrew.kaylor@intel.com><br>
<b>Cc:</b> cfe-dev@lists.llvm.org<br>
<b>Subject:</b> Re: [cfe-dev] fp-contract at -O0<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Why not? What situation are you trying to avoid?<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal">I don’t see a problem with the godbolt link; is your concern simply that you think -ffp-contract=fast should fuse a super-set of what is done by =on, or is there something else?<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal">If anything, preserving FMA formation at O0 _helps_ debuggability, because it means that numerical behavior is more likely to match what a user observed at Os, allowing them to debug the problem.<o:p></o:p></p>
<div>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">On Feb 13, 2020, at 8:32 PM, Kaylor, Andrew <<a href="mailto:andrew.kaylor@intel.com">andrew.kaylor@intel.com</a>> wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">That’s certainly a reasonable position, but it isn’t without problems. For instance:<span class="apple-converted-space"> </span><a href="https://godbolt.org/z/9JtoPt"><span style="color:purple">https://godbolt.org/z/9JtoPt</span></a><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">In this case, “-O0 -ffp-contract=on -march=haswell” results in an FMA instruction but “-O0 -ffp-contract=fast -march=haswell” does not.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">I’m not opposed to allowing the explicit use of -ffp-contract=on to lead clang to generate a call to llvm.fmuladd, but I don’t think that should happen by default at -O0.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">-Andy<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<div>
<p class="MsoNormal"><b>From:</b><span class="apple-converted-space"> </span><a href="mailto:scanon@apple.com"><span style="color:purple">scanon@apple.com</span></a><span class="apple-converted-space"> </span><<a href="mailto:scanon@apple.com"><span style="color:purple">scanon@apple.com</span></a>><span class="apple-converted-space"> </span><br>
<b>Sent:</b><span class="apple-converted-space"> </span>Thursday, February 13, 2020 5:17 PM<br>
<b>To:</b><span class="apple-converted-space"> </span>Kaylor, Andrew <<a href="mailto:andrew.kaylor@intel.com"><span style="color:purple">andrew.kaylor@intel.com</span></a>><br>
<b>Cc:</b><span class="apple-converted-space"> </span><a href="mailto:cfe-dev@lists.llvm.org"><span style="color:purple">cfe-dev@lists.llvm.org</span></a><br>
<b>Subject:</b><span class="apple-converted-space"> </span>Re: [cfe-dev] fp-contract at -O0<o:p></o:p></p>
</div>
</div>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">-O0 does not mean “do not optimize”. It means "Reduce compilation time and make debugging produce the expected results” (quoting the GCC manual, but it applies equally to clang).<o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">From my perspective, this is absolutely the expected behavior.<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">– Steve<o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal"><br>
<br>
<br>
<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal">On Feb 13, 2020, at 7:30 PM, Kaylor, Andrew via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org"><span style="color:purple">cfe-dev@lists.llvm.org</span></a>> wrote:<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal">Hi everyone,<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">Melanie Blower recently submitted a change that was intended to make the default set of floating point options in clang be consistent with the options that would be set by the -ffp-model=precise umbrella option. The only change needed was
to make the default for fp-contract “on” instead of “off”. While not a trivial change, we thought this was reasonable, since fp-contract=on only allows contraction that is allowed by the language standard. Unfortunately, this change unleashed a surprising
number of problems.<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">The most surprising problem, to me at least, was that this change caused FMA instructions to be generated at -O0.<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">There are a couple of things that need to be sorted out here, but I’d like to start with the -O0 behavior. Consider the following scenario, which was possible even before the recent change:<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">--------<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">test.c<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">--------<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">double f(double a, double b, double c) {<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> return a * b + c;<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">}<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">--------<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">clang -c -O0 -ffp-contract=on test.c<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">--------<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">Since clang 5.0 this has produced a call to llvm.fmuladd, which for targets that support FMA will generally result in an FMA instruction. Arguably this is what the user asked for, since they explicitly enabled fp-contract. On the other
hand, it is also an optimization, which they said they did not want. As a point of comparison, specifying -ffast-math will cause the front end to attach the “fast” flag to math operations (which also allows contraction), but will not lead to FMA formation.<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">What should we do with this? I see two possible solutions:<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">1. The driver should not pass the -ffp-contract=on flag by default at -O0 (still allows fmuladd formation if the user specifies -ffp-contract=on)<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">2. The front end should not form the llvm.fmuladd intrinsic at -O0<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">The second option seems preferable to me, but I don’t know how unnatural it might be for the front end to respond to optimization level.<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">Apart from the -O0 dilemma, the change in default fp-contract behavior seems to have led to other problems. It introduced some performance regressions in LNT on x86 and some accuracy-related test failures on PowerPC. There are likely other
issues that I just haven’t heard about. So, I guess we should talk about whether we really want to enable this by default when optimizations are enabled. I don’t know anything about the PowerPC issues. I looked at the top x86 performance regression and it
seems that the introduction of the fmuladd intrinsic changed the decision of the loop unroller (the key loop is unrolled by 8 instead of 4). I’m inclined to regard that as a fluke of the test case or possibly a problem in the loop unroller, but I wouldn’t
see it as a reason to prefer to disable FP contraction.<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">FWIW, the fp-model option was intended to provide the same functionality as the /fp option in the MSVC compiler. The MSVC /fp:precise option enables FP contraction.<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">Input here would be appreciated.<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">Andy<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica",sans-serif">_______________________________________________<br>
cfe-dev mailing list<br>
</span><a href="mailto:cfe-dev@lists.llvm.org"><span style="font-size:9.0pt;font-family:"Helvetica",sans-serif;color:#954F72">cfe-dev@lists.llvm.org</span></a><span style="font-size:9.0pt;font-family:"Helvetica",sans-serif"><br>
</span><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev"><span style="font-size:9.0pt;font-family:"Helvetica",sans-serif;color:#954F72">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</span></a><o:p></o:p></p>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
<div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</div>
</body>
</html>