[cfe-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

Tue Oct 11 20:20:56 PDT 2016

----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Sebastian Pop" <sebpop.llvm at gmail.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
> "Matthias Braun" <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik"
> <a.skolnik at samsung.com>
> Sent: Tuesday, October 11, 2016 6:33:43 AM
> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
> 
> On 11 October 2016 at 12:15, Sebastian Pop <sebpop.llvm at gmail.com>
> wrote:
> >>  1. Only test the non-FP-contracted output
> >
> > Yes, this is what I'm doing.
> 
> If the whole test is about testing multiplications, what's the point
> of this?
> 
> 
> >>  2. Run the FP-contracted test only for a very small size (so that
> >>  we'll stay within some reasonable tolerance of the reference
> >>  output)
> >>  3. Change the matrix to something that will make the test
> >>  numerically stable (it does not look like the matrix itself
> >>  matters to the performance; where do the values come from?).
> 
> 3 is more sound, 2 may be more practical.
> 
> 
> > -      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i*j) / ni;
> > -      B[i][j] = ((DATA_TYPE) i*j) / ni;
> > +      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i-j) / ni;
> > +      B[i][j] = ((DATA_TYPE) i-j) / ni;
> >      }
> >    for (i = 0; i < nj; i++)
> >      for (j = 0; j < nj; j++)
> > -      A[i][j] = ((DATA_TYPE) i*j) / ni;
> > +      A[i][j] = ((DATA_TYPE) i-j) / ni;
> 
> Changing from multiplication to subtraction changes completely the
> nature of the test and goes towards "return 0;", ie, fiddling with
> the
> code so that the compiler "behaves" better. This is *not* a solution.
> 
> Hal,
> 
> For large scale numerical programs, if fp-contract can result in
> large
> scale differences, we need to think about this approach by default.

Obviously a lot of people have done an awful lot of thinking about this over many years, and contractions-by-default is the reality on many systems. If you have a program that is numerically unstable, simulating a chaotic system, etc., then any difference, often no matter how small, will lead to large-scale differences in the output. As a result, there will be some tests that don't have a useful tolerance; sometimes these are badly implemented tests, but sometimes the sensitivity represents an underlying physical reality of a simulated system (there's a lot of very interesting mathematical theory behind this, e.g. https://en.wikipedia.org/wiki/Chaos_theory#Sensitivity_to_initial_conditions).
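
To make the sensitivity point concrete, here is a minimal sketch that is not taken from the test suite. It iterates the logistic map x -> 4*x*(1-x), a standard textbook chaotic system, from two starting values that differ by 1e-15 and records the largest gap between the two trajectories. The function names and the choice of map are illustrative assumptions only.

```c
/* One step of the logistic map x -> r*x*(1-x) with r = 4 (chaotic regime). */
static double logistic_step(double x)
{
    return 4.0 * x * (1.0 - x);
}

/* Largest |a_n - b_n| seen over `steps` iterations,
   where a_0 = x0 and b_0 = x0 + delta. */
double max_divergence(double x0, double delta, int steps)
{
    double a = x0;
    double b = x0 + delta;
    double worst = 0.0;

    for (int n = 0; n < steps; n++) {
        a = logistic_step(a);
        b = logistic_step(b);
        double gap = (a > b) ? (a - b) : (b - a);
        if (gap > worst)
            worst = gap;
    }
    return worst;
}
```

Calling max_divergence(0.3, 1e-15, 1) gives a gap on the order of the initial perturbation, while max_divergence(0.3, 1e-15, 200) gives a gap comparable to the size of the values themselves. A perturbation the size of a single rounding difference is enough to lose every digit, which is why no fixed tolerance helps for such a computation.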

From a user-experience perspective, this can be very unfortunate. It can be hard to understand why compiler optimizations, or different compilers, produce executables that produce different outputs for identical input configurations. It contributes to feelings that floating point is hard and confusing. However, not using the contractions also leads to equally confusing performance discrepancies between our compiler and others (and between the observed and expected performance). We have a classic "Damned if you do, damned if you don't" situation. On balance, I lean toward enabling the contractions by default because other compilers do it (so users need to learn about what's going on anyway - we can't shield them from this regardless of what we do) and it gives users the performance they expect (which increases our user base and makes many users happier).

 -Hal

> 
> If the loop above cannot be contained in a 1e-8 range for double
> values over a large dataset, then I guess the transformation is going
> a bit too far.
> 
> If not, we should be able to come up with a reasonable tolerance that
> makes the test still be relevant.
> 
> cheers,
> --renato
> 

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory