[cfe-dev] fp-contract at -O0

Thu Feb 13 16:30:06 PST 2020

Hi everyone,

Melanie Blower recently submitted a change intended to make clang's default set of floating point options consistent with the options set by the -ffp-model=precise umbrella option. The only change needed was to make the default for fp-contract "on" instead of "off". While not a trivial change, we thought this was reasonable, since fp-contract=on only allows contraction that is allowed by the language standard. Unfortunately, this change unleashed a surprising number of problems.

The most surprising problem, to me at least, was that this change caused FMA instructions to be generated at -O0.

There are a couple of things that need to be sorted out here, but I'd like to start with the -O0 behavior. Consider the following scenario, which was possible even before the recent change:

--------
test.c
--------
double f(double a, double b, double c) {
  return a * b + c;
}
--------
clang -c -O0 -ffp-contract=on test.c
--------

Since clang 5.0 this has produced a call to llvm.fmuladd, which for targets that support FMA will generally result in an FMA instruction. Arguably this is what the user asked for, since they explicitly enabled fp-contract. On the other hand, it is also an optimization, and by specifying -O0 they said they did not want optimizations. As a point of comparison, specifying -ffast-math will cause the front end to attach the "fast" flag to math operations (which also allows contraction), but will not lead to FMA formation at -O0.

What should we do with this? I see two possible solutions:

1. The driver should not pass the -ffp-contract=on flag by default at -O0 (this would still allow fmuladd formation if the user explicitly specifies -ffp-contract=on)
2. The front end should not form the llvm.fmuladd intrinsic at -O0

The second option seems preferable to me, but I don't know how unnatural it might be for the front end to respond to optimization level.

Apart from the -O0 dilemma, the change in default fp-contract behavior seems to have led to other problems. It introduced some performance regressions in LNT on x86 and some accuracy-related test failures on PowerPC. There are likely other issues that I just haven't heard about. So, I guess we should talk about whether we really want to enable this by default when optimizations are enabled. I don't know anything about the PowerPC issues. I looked at the top x86 performance regression and it seems that the introduction of the fmuladd intrinsic changed the decision of the loop unroller (the key loop is unrolled by 8 instead of 4). I'm inclined to regard that as a fluke of the test case or possibly a problem in the loop unroller, and I wouldn't see it as a reason to disable FP contraction.

FWIW, the fp-model option was intended to provide the same functionality as the /fp option in the MSVC compiler. The MSVC /fp:precise option enables FP contraction.

Input here would be appreciated.

Thanks,
Andy
