[cfe-dev] [llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

Wed Oct 12 11:29:12 PDT 2016

> On Oct 12, 2016, at 7:05 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> ----- Original Message -----
>> From: "Renato Golin" <renato.golin at linaro.org>
>> To: "Sebastian Pop" <sebpop.llvm at gmail.com>
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
>> "Matthias Braun" <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik"
>> <a.skolnik at samsung.com>
>> Sent: Wednesday, October 12, 2016 8:35:16 AM
>> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>> 
>> On 12 October 2016 at 14:26, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
>>> Correct me if I misunderstood: you would be ok changing the
>>> reference output to exactly match the output of "-O0
>>> -ffp-contract=off".
>> 
>> No, that's not at all what I said.
>> 
>> Matching identical outputs to FP tests makes no sense because there's
>> *always* an error bar.
> 
> This is something we need to understand. No, there's not always an error bar. With FMA formation and without non-IEEE-compliant optimizations (i.e. fast-math), the optimized answer should be identical to the non-optimized answer.

Can you clarify? In my mind the F in FMA stands for “fused”, i.e. the product is not rounded before the addition, so the numerical result is in general not the same as a separate multiply followed by an add. But you seem to imply the opposite above?

-- 
Mehdi

> If these don't match, then we should understand why. This used to be a large problem because of fp80-related issues on x86 processors, but even on x86 if we stick to SSE (etc.) FP instructions, this is not an issue any more. We still do see cross-system discrepancies sometimes because of differences in denormal handling, but on the same system that should be consistent (aside, perhaps, from compiler-level constant-folding issues).
> 
> -Hal
> 
>> 
>> The output of O0, O1, O2, O3, Ofast, Os, Oz should all be within the
>> boundaries of an average and its associated error bar.
>> 
>> By understanding what the *expected* output and its associated error
>> range are, we can accurately predict what will be the correct
>> reference_output and the tolerance for each individual test.
>> 
>> Your solution 2 "works" because you're doing the matching yourself, in
>> the code, and for that, you pay the penalty of running it twice. But
>> it's not easy to control the tolerance, nor is it stable for all
>> platforms where we don't yet run the test suite.
>> 
>> My original proposal, and what I'm still proposing here, is to
>> understand the tests and make them right, by giving them proper
>> references and tolerances. If the output is too large, reduce/sample
>> in a way that doesn't increase the error ranges too much, enough to
>> keep the tolerance low, so we can still catch bugs in the FP
>> transformations.
>> 
>> cheers,
>> --renato
>> 
> 
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev