[cfe-dev] fp-contract at -O0
Kaylor, Andrew via cfe-dev
cfe-dev at lists.llvm.org
Thu Feb 13 18:17:43 PST 2020
> Why not? What situation are you trying to avoid?
It just seems unexpected. I write code with no explicit FMAs, compile with no command-line options, and still get FMA instructions. It’s not what I’d expect, and someone else specifically complained about this behavior after Melanie’s patch landed.
> I don’t see a problem with the godbolt link; is your concern simply that you think -ffp-contract=fast should fuse a super-set of what is done by =on, or is there something else?
Yes, that is my concern. I think =fast should always produce at least as many FMAs as =on.
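For readers less familiar with the difference between the two modes, the sketch below shows the distinction as I understand clang's documented behavior: =on contracts only within a single expression (which is what the C standard permits), while =fast may also fuse across statements. The function names are hypothetical and chosen only for illustration.

```c
/* Single expression: the C standard permits evaluating a * b + c as one
   operation, so -ffp-contract=on is allowed to fuse it. */
double mul_add_one_expression(double a, double b, double c) {
    return a * b + c;
}

/* Separate statements: the product is a complete expression by itself,
   so -ffp-contract=on should leave it unfused. -ffp-contract=fast is
   still allowed to fuse the multiply and the add. */
double mul_add_two_statements(double a, double b, double c) {
    double product = a * b;
    return product + c;
}
```

Under that reading, everything =on may fuse is also a candidate under =fast, which is why fewer FMAs with =fast looks inconsistent.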
> If anything, preserving FMA formation at O0 _helps_ debuggability, because it means that numerical behavior is more likely to match what a user observed at Os, allowing them to debug the problem.
That’s an excellent point. I could definitely be persuaded by that argument.
-Andy
From: scanon at apple.com <scanon at apple.com>
Sent: Thursday, February 13, 2020 5:37 PM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] fp-contract at -O0
Why not? What situation are you trying to avoid?
I don’t see a problem with the godbolt link; is your concern simply that you think -ffp-contract=fast should fuse a super-set of what is done by =on, or is there something else?
If anything, preserving FMA formation at O0 _helps_ debuggability, because it means that numerical behavior is more likely to match what a user observed at Os, allowing them to debug the problem.
On Feb 13, 2020, at 8:32 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
That’s certainly a reasonable position, but it isn’t without problems. For instance: https://godbolt.org/z/9JtoPt
In this case, “-O0 -ffp-contract=on -march=haswell” results in an FMA instruction but “-O0 -ffp-contract=fast -march=haswell” does not.
I’m not opposed to allowing the explicit use of -ffp-contract=on to lead clang to generate a call to llvm.fmuladd, but I don’t think that should happen by default at -O0.
-Andy
From: scanon at apple.com <scanon at apple.com>
Sent: Thursday, February 13, 2020 5:17 PM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] fp-contract at -O0
-O0 does not mean “do not optimize”. It means “Reduce compilation time and make debugging produce the expected results” (quoting the GCC manual, but it applies equally to clang).
From my perspective, this is absolutely the expected behavior.
– Steve
On Feb 13, 2020, at 7:30 PM, Kaylor, Andrew via cfe-dev <cfe-dev at lists.llvm.org> wrote:
Hi everyone,
Melanie Blower recently submitted a change that was intended to make the default set of floating point options in clang be consistent with the options that would be set by the -ffp-model=precise umbrella option. The only change needed was to make the default for fp-contract “on” instead of “off”. While not a trivial change, we thought this was reasonable, since fp-contract=on only allows contraction that is allowed by the language standard. Unfortunately, this change unleashed a surprising number of problems.
The most surprising problem, to me at least, was that this change caused FMA instructions to be generated at -O0.
There are a couple of things that need to be sorted out here, but I’d like to start with the -O0 behavior. Consider the following scenario, which was possible even before the recent change:
--------
test.c
--------
double f(double a, double b, double c) {
  return a * b + c;
}
--------
clang -c -O0 -ffp-contract=on test.c
--------
Since clang 5.0 this has produced a call to llvm.fmuladd, which for targets that support FMA will generally result in an FMA instruction. Arguably this is what the user asked for, since they explicitly enabled fp-contract. On the other hand, it is also an optimization, which they said they did not want. As a point of comparison, specifying -ffast-math will cause the front end to attach the “fast” flag to math operations (which also allows contraction), but will not lead to FMA formation.
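One way to check whether a given build actually contracted such an expression is to pick inputs where the fused and unfused results differ. The following is a minimal sketch and not part of the original report: the function name is hypothetical, and it assumes IEEE-754 binary64 doubles evaluated without excess precision (for example x86-64 with SSE2, not x87).

```c
/* Returns 1 if a * b + c was evaluated as a fused multiply-add in this
   build, and 0 if the product was rounded to double before the add. */
int mul_add_was_fused(void) {
    /* volatile sources keep the compiler from folding the arithmetic
       at compile time, whatever the optimization level */
    volatile double va = 1.0 + 0x1p-27;
    volatile double vb = 1.0 + 0x1p-27;
    volatile double vc = -(1.0 + 0x1p-26);
    double a = va, b = vb, c = vc;

    /* Exact product:     1 + 2^-26 + 2^-54
       Rounded to double: 1 + 2^-26 (2^-54 is a quarter ulp and is lost)
       Unfused result: 0.0      Fused result: 2^-54 */
    double r = a * b + c;
    return r != 0.0;
}
```

Calling this from a small main() and building it once with -ffp-contract=on and once with -ffp-contract=off, for a target that has FMA, should show whether the front end's fmuladd survived to an FMA instruction.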
What should we do with this? I see two possible solutions:
1. The driver should not pass the -ffp-contract=on flag by default at -O0 (still allows fmuladd formation if the user specifies -ffp-contract=on)
2. The front end should not form the llvm.fmuladd intrinsic at -O0
The second option seems preferable to me, but I don’t know how unnatural it might be for the front end to respond to optimization level.
Apart from the -O0 dilemma, the change in default fp-contract behavior seems to have led to other problems. It introduced some performance regressions in LNT on x86 and some accuracy-related test failures on PowerPC. There are likely other issues that I just haven’t heard about. So, I guess we should talk about whether we really want to enable this by default when optimizations are enabled. I don’t know anything about the PowerPC issues. I looked at the top x86 performance regression and it seems that the introduction of the fmuladd intrinsic changed the decision of the loop unroller (the key loop is unrolled by 8 instead of 4). I’m inclined to regard that as a fluke of the test case or possibly a problem in the loop unroller, but I wouldn’t see it as a reason to prefer to disable FP contraction.
FWIW, the fp-model option was intended to provide the same functionality as the /fp option in the MSVC compiler. The MSVC /fp:precise option enables FP contraction.
Input here would be appreciated.
Thanks,
Andy
_______________________________________________
cfe-dev mailing list
cfe-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev