[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

Wed Oct 12 07:00:44 PDT 2016

On Wed, Oct 12, 2016 at 9:35 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 12 October 2016 at 14:26, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
>> Correct me if I misunderstood: you would be ok changing the
>> reference output to exactly match the output of "-O0 -ffp-contract=off".
>
> No, that's not at all what I said.

Thanks for clarifying your previous statement: I stand corrected.

>
> Matching identical outputs to FP tests makes no sense because there's
> *always* an error bar.

Agreed.

> The output of O0, O1, O2, O3, Ofast, Os, Oz should all be within the
> boundaries of an average and its associated error bar.

Agreed.

> By understanding what's the *expected* output and its associated error
> range we can accurately predict what will be the correct
> reference_output and the tolerance for each individual test.

Agreed.
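
To make "within the error bar" concrete, here is a minimal sketch of the
kind of check I have in mind. The helper name within_tolerance and the
split into an absolute and a relative tolerance are my own assumptions
for illustration; this is not the test-suite's comparison tool.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of the test-suite): returns true when
 * `value` lies within the absolute or the relative tolerance of
 * `reference`. A NaN input makes both comparisons false, so it is
 * rejected. */
static bool within_tolerance(double value, double reference,
                             double abs_tol, double rel_tol)
{
    double diff = value - reference;
    if (diff < 0.0)
        diff = -diff;

    /* The absolute tolerance covers references at or near zero. */
    if (diff <= abs_tol)
        return true;

    double magnitude = reference < 0.0 ? -reference : reference;

    /* The relative tolerance scales with the size of the reference. */
    return diff <= rel_tol * magnitude;
}
```

With a check of this shape, the per-test work is choosing the reference
value and the two tolerances, which is what you describe above.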

>
> Your solution 2 "works" because you're doing the matching yourself, in
> the code, and for that, you pay the penalty of running it twice. But
> it's not easy to control the tolerance, nor is it stable for all
> platforms where we don't yet run the test suite.
>
> My original proposal, and what I'm still proposing here, is to
> understand the tests and make them right, by giving them proper
> references and tolerances. If the output is too large, reduce/sample
> in a way that doesn't increase the error ranges too much, enough to
> keep the tolerance low, so we can still catch bugs in the FP
> transformations.

This goes in the same direction as what you said earlier in:

> To simplify the analysis, you can reduce the output into a single
> number, say, adding all the results up. This will generate more
> inaccuracies than comparing each value, and if that's too large an
> error, then you reduce the number of samples.
>
> For example, on cholesky, we sampled every 16th item of the array:
>
>  for (i = 0; i < n; i++) {
>    for (j = 0; j < n; j++)
>      print_element(A[i][j], j*16, printmat);
>    fputs(printmat, stderr);
>  }

Wrt "we sampled every 16th item of the array", not really in that test,
but I get your point:

 k = 0;
 for (i = 0; i < n; i++) {
   for (j = 0; j < n; j+=16) {
     print_element(A[i][j], k, printmat);
     k += 16;
   }
   fputs(printmat, stderr);
 }

Ok, let's do this for the 5 benchmarks that do not exactly match.
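
For the "reduce the output into a single number" variant quoted above,
a rough sketch could look like the following. The function name
sampled_sum, the flat row-major layout, and the stride parameter are
assumptions made for illustration, not code from the benchmarks.

```c
#include <stddef.h>

/* Hypothetical reduction (not from the benchmarks): adds up every
 * `stride`-th column of each row of a row-major n x n matrix, so that
 * a single number can be compared against the reference output.
 * A stride of 0 is treated as "no samples" and yields 0.0. */
static double sampled_sum(const double *a, size_t n, size_t stride)
{
    double sum = 0.0;

    if (stride == 0)
        return sum;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j += stride)
            sum += a[i * n + j];

    return sum;
}
```

As you noted, the accumulated sum carries more rounding error than a
per-element comparison, so a larger stride (fewer samples) is the knob
to turn if the tolerance would otherwise have to grow too much.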

Thanks,
Sebastian