[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

Wed Oct 12 01:33:24 PDT 2016

On 12 October 2016 at 04:20, Hal Finkel <hfinkel at anl.gov> wrote:
> Obviously a lot of people have done an awful lot of thinking about this over many years, and contractions-by-default is the reality on many systems. If you have a program that is numerically unstable, simulating a chaotic system, etc. then any difference, often no matter how small, will lead to large-scale differences in the output. As a result, there will be some tests that don't have a useful tolerance; sometimes these are badly-implemented tests, but sometimes the sensitivity represents an underlying physical reality of a simulated system (there's a lot of very-interesting mathematical theory behind this, e.g. https://en.wikipedia.org/wiki/Chaos_theory#Sensitivity_to_initial_coonditions).

Hi Hal,

I think we're crossing the wires, here.

There are three sources of uncertainty in chaotic systems:

1. Initial conditions, not affected by the compiler and "part of the
problem, part of the solution".
2. Evolution, affected by the compiler, not limited to FP-reordering
passes (UB can also play a role here).
3. Expectations, affected by the evolution and the nature of the
problem and too high level to be of any consequence to the compiler.

Initial conditions change in real life, but they must be the same in
tests. Same for evolution and expectation. You can't use an external
random number generator, you can't rely on different RNGs (that's why
I added hand-coded ones to some tests).
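
To illustrate what "hand-coded" buys a test, here is a minimal sketch of
a self-contained generator. The names test_rng_t, test_rng_next and
test_rng_fill are hypothetical, not the ones used in the test-suite; the
constants are those of the sample rand() in the C standard. The only
point is that the sequence depends on the seed alone, not on the host
libc.

```c
#include <stdint.h>

/* Hypothetical helper: a tiny linear congruential generator whose
   output depends only on the seed, never on the host C library. */
typedef struct { uint32_t state; } test_rng_t;

/* Constants taken from the sample rand() in the C standard. */
static uint32_t test_rng_next(test_rng_t *rng)
{
    rng->state = rng->state * 1103515245u + 12345u;  /* wraps mod 2^32 */
    return (rng->state / 65536u) % 32768u;           /* 15-bit result */
}

/* Fill an array with reproducible values in [0, 1). */
static void test_rng_fill(test_rng_t *rng, double *out, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = test_rng_next(rng) / 32768.0;
}
```

Two generators seeded with the same value produce the same inputs on
every platform, so any difference in the output comes from (2), not (1).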

If the FP-contract pass affects (2), that's perfectly fine. But if it
affects (3), for example via changing the precision / errors / deltas,
then we have a problem.

From what I understand, FP-contraction actually makes calculations
*more* precise, by removing one of the two rounding operations in each
multiply-add. This means to me that the tolerance of a well designed
*test* must be kept as low as possible.

And this is the key: if the tolerance of a test needs to be
*increased* because of FP-contract, then the test is wrong. Either the
code, or the reference output, or how we get to the reference values
is wrong. Disabling the checks, or increasing the tolerance beyond
what's meaningful in this case will make an irrelevant test useless.
Right now, it may be irrelevant and non-representative, but it can
catch compiler FP errors. Adding a huge bracket or disabling
FP-contract will remove even that small benefit.

Right now, the tests have one value, which happens to be identical on
virtually all platforms. This means the compiler is pretty good at
keeping the semantics and lucky in keeping the same precision. But we
both know this is wrong.

And now we have a chance to make those tests better. Not more accurate
per se, but more accurately testing the compiler. There is a big
difference here.

If we change the semantics of the code (mul -> sub), we're killing the
original test. If we increase the tolerance without analysis or
disable default passes, we're killing any chance to spot compiler
problems in the future.

> From a user-experience perspective, this can be very unfortunate. It can be hard to understand why compiler optimizations, or different compilers, produce executables that produce different outputs for identical input configurations. It contributes to feelings that floating point is hard and confusing.

On the contrary, it serves as an education that FP is a finite
representation of real numbers, and people shouldn't be expecting to
get byte-exact values anyway. I have strong reservations against
scientific code that doesn't take into account rounding issues, error
calculations, and that takes the results at face value. It's like
running one single Monte Carlo simulation and taking international
politics decisions based on that result.

> However, not using the contractions also leads to equally-confusing performance discrepancies between our compiler and others (and between the observed and expected performance).

Let's not mix conformance and performance. Different compilers have
different flags and behave differently. Enabling FP-contract in LLVM
has *nothing* to do with what GCC does, but to do with "what's a
better strategy for LLVM". We have refrained from following GCC
blindly for a number of years and it would be really sad if we started
now.

If FP-contract=on is a good decision for LLVM, on merits of precision,
performance and overall quality, then let's do it. If not, then let's
put it under some flag and tell all people comparing with GCC to use
that flag.

But if we do go with it, we need to make sure our current tests don't
just break apart and get hidden in a corner. GCC compatibility
isn't *that* important.

I'm not advocating against turning it on, I'm advocating against the
easy path of hiding the tests. We might just as well remove them.

I'll reply to Sebastian in a more practical way, but I wanted to make
it clear that we're talking about the test and not the transformation
itself, which needs to be analysed on its own merits, not on what GCC
does.

cheers,
--renato