[cfe-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

Wed Oct 12 07:05:10 PDT 2016

----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Sebastian Pop" <sebpop.llvm at gmail.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
> "Matthias Braun" <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik"
> <a.skolnik at samsung.com>
> Sent: Wednesday, October 12, 2016 8:35:16 AM
> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
> 
> On 12 October 2016 at 14:26, Sebastian Pop <sebpop.llvm at gmail.com>
> wrote:
> > Correct me if I misunderstood: you would be ok changing the
> > reference output to exactly match the output of "-O0
> > -ffp-contract=off".
> 
> No, that's not at all what I said.
> 
> Matching identical outputs to FP tests makes no sense because there's
> *always* an error bar.

This is something we need to understand. No, there's not always an error bar. With FMA formation and without non-IEEE-compliant optimizations (i.e. fast-math), the optimized answer should be identical to the non-optimized answer. If these don't match, then we should understand why. This used to be a large problem because of fp80-related issues on x86 processors, but even on x86, if we stick to SSE (etc.) FP instructions, this is no longer an issue. We still sometimes see cross-system discrepancies because of differences in denormal handling, but on the same system the results should be consistent (aside, perhaps, from compiler-level constant-folding issues).
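
To make concrete why the contraction setting matters when picking a reference output: a fused multiply-add rounds once, while a separate multiply and add round twice. The two can therefore differ, yet each is fully deterministic. Below is a minimal Python sketch that emulates the fused operation with exact rational arithmetic. The helper names are hypothetical and are not an LLVM or libm API, and the inputs are chosen by hand to expose the difference.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate an FMA: form a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # Fraction -> float is a single, correctly rounded conversion in CPython.
    return float(exact)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Unfused form: the product is rounded, then the sum is rounded again."""
    return a * b + c


# Hand-picked inputs: the exact product is 1 - 2**-54, which is not a double.
A = 1.0 + 2.0 ** -27
B = 1.0 - 2.0 ** -27
C = -1.0

if __name__ == "__main__":
    print("separate:", separate_multiply_add(A, B, C))
    print("fused:   ", fused_multiply_add(A, B, C))
```

With these inputs the unfused product rounds to exactly 1.0, so the sum is 0.0, while the fused form keeps the low-order term and returns -2**-54. Running either variant repeatedly gives the same bits every time, so the difference comes from the contraction setting and not from any run-to-run noise.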

 -Hal

> 
> The output of O0, O1, O2, O3, Ofast, Os, Oz should all be within the
> boundaries of an average and its associated error bar.
> 
> By understanding what's the *expected* output and its associated error
> range we can accurately predict what will be the correct
> reference_output and the tolerance for each individual test.
> 
> Your solution 2 "works" because you're doing the matching yourself, in
> the code, and for that, you pay the penalty of running it twice. But
> it's not easy to control the tolerance, nor is it stable for all
> platforms where we don't yet run the test suite.
> 
> My original proposal, and what I'm still proposing here, is to
> understand the tests and make them right, by giving them proper
> references and tolerances. If the output is too large, reduce/sample
> in a way that doesn't increase the error ranges too much, enough to
> keep the tolerance low, so we can still catch bugs in the FP
> transformations.
> 
> cheers,
> --renato
> 
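
For reference, the per-test reference-plus-tolerance check proposed in the quoted text could look roughly like the following Python sketch. This is an illustration only and not the test-suite's actual comparison tool. The function name `within_tolerance` and the sample values are made up for the example.

```python
import math
from typing import Sequence


def within_tolerance(actual: Sequence[float],
                     reference: Sequence[float],
                     rel_tol: float = 0.0,
                     abs_tol: float = 0.0) -> bool:
    """Return True if each value matches its reference within the tolerances.

    With both tolerances left at 0.0 this degenerates to an exact comparison.
    """
    if len(actual) != len(reference):
        return False
    return all(
        math.isclose(x, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, r in zip(actual, reference)
    )


# Made-up sample data: one element differs from the reference by about 1e-7.
REFERENCE = [1.0, 2.0, 3.0]
ACTUAL = [1.0, 2.0000001, 3.0]

if __name__ == "__main__":
    print("exact match:   ", within_tolerance(ACTUAL, REFERENCE))
    print("rel_tol = 1e-6:", within_tolerance(ACTUAL, REFERENCE, rel_tol=1e-6))
    print("rel_tol = 1e-9:", within_tolerance(ACTUAL, REFERENCE, rel_tol=1e-9))
```

Setting both tolerances to zero gives the exact-match behaviour argued for above, so the same check covers both positions in this thread. The tighter the tolerance, the more likely a genuine bug in an FP transformation is to show up as a failure.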

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory