[cfe-dev] fp-contract at -O0

Stephen Canon via cfe-dev cfe-dev at lists.llvm.org
Thu Feb 13 17:37:05 PST 2020


Why not? What situation are you trying to avoid?

I don’t see a problem with the godbolt link; is your concern simply that you think -ffp-contract=fast should fuse a superset of what is done by =on, or is there something else?

If anything, preserving FMA formation at -O0 _helps_ debuggability, because it means that numerical behavior is more likely to match what a user observed at -Os, allowing them to debug the problem.

> On Feb 13, 2020, at 8:32 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
> 
> That’s certainly a reasonable position, but it isn’t without problems. For instance: https://godbolt.org/z/9JtoPt
>  
> In this case, “-O0 -ffp-contract=on -march=haswell” results in an FMA instruction but “-O0 -ffp-contract=fast -march=haswell” does not.
>  
> I’m not opposed to allowing the explicit use of -ffp-contract=on to lead clang to generate a call to llvm.fmuladd, but I don’t think that should happen by default at -O0.
>  
> -Andy
>  
> From: scanon at apple.com
> Sent: Thursday, February 13, 2020 5:17 PM
> To: Kaylor, Andrew <andrew.kaylor at intel.com>
> Cc: cfe-dev at lists.llvm.org
> Subject: Re: [cfe-dev] fp-contract at -O0
>  
> -O0 does not mean “do not optimize”. It means “Reduce compilation time and make debugging produce the expected results” (quoting the GCC manual, but it applies equally to clang).
>  
> From my perspective, this is absolutely the expected behavior.
>  
> – Steve
> 
> 
> On Feb 13, 2020, at 7:30 PM, Kaylor, Andrew via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>  
> Hi everyone,
>  
> Melanie Blower recently submitted a change that was intended to make the default set of floating point options in clang be consistent with the options that would be set by the -ffp-model=precise umbrella option. The only change needed was to make the default for fp-contract “on” instead of “off”. While not a trivial change, we thought this was reasonable, since fp-contract=on only allows contraction that is allowed by the language standard. Unfortunately, this change unleashed a surprising number of problems.
>  
> The most surprising problem, to me at least, was that this change caused FMA instructions to be generated at -O0.
>  
> There are a couple of things that need to be sorted out here, but I’d like to start with the -O0 behavior. Consider the following scenario, which was possible even before the recent change:
>  
> --------
> test.c
> --------
> double f(double a, double b, double c) {
>   return a * b + c;
> }
> --------
> clang -c -O0 -ffp-contract=on test.c
> --------
>  
> Since clang 5.0 this has produced a call to llvm.fmuladd, which for targets that support FMA will generally result in an FMA instruction. Arguably this is what the user asked for, since they explicitly enabled fp-contract. On the other hand, it is also an optimization, which they said they did not want. As a point of comparison, specifying -ffast-math will cause the front end to attach the “fast” flag to math operations (which also allows contraction), but will not lead to FMA formation.
>  
> What should we do with this? I see two possible solutions:
>  
> 1. The driver should not pass the -ffp-contract=on flag by default at -O0 (still allows fmuladd formation if the user specifies -ffp-contract=on)
> 2. The front end should not form the llvm.fmuladd intrinsic at -O0
>  
> The second option seems preferable to me, but I don’t know how unnatural it might be for the front end to respond to optimization level.
>  
> Apart from the -O0 dilemma, the change in default fp-contract behavior seems to have led to other problems. It introduced some performance regressions in LNT on x86 and some accuracy-related test failures on PowerPC. There are likely other issues that I just haven’t heard about. So, I guess we should talk about whether we really want to enable this by default when optimizations are enabled. I don’t know anything about the PowerPC issues. I looked at the top x86 performance regression and it seems that the introduction of the fmuladd intrinsic changed the decision of the loop unroller (the key loop is unrolled by 8 instead of 4). I’m inclined to regard that as a fluke of the test case or possibly a problem in the loop unroller, but I wouldn’t see it as a reason to prefer to disable FP contraction.
>  
> FWIW, the fp-model option was intended to provide the same functionality as the /fp option in the MSVC compiler. The MSVC /fp:precise option enables FP contraction.
>  
> Input here would be appreciated.
>  
> Thanks,
> Andy
>  
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev