[cfe-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

Wed Oct 12 02:01:35 PDT 2016

On 12 October 2016 at 05:35, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> polybench/linear-algebra/solvers/gramschmidt/ exposes the same problems as symm.
> It does not match the reference output at -O0 -ffp-contract=off,
> and it only passes all elements comparisons for FP_ABSTOLERANCE=1 for
> "-Ofast" vs. "-O0 -ffp-contract=off".

I think we're going about this in a completely wrong way.

The current reference output is specific to fp-contract=off, and
making it work for fp-contract=on makes no sense at all.

For all we know, fp-contract=on generates *more accurate* results, not
less. But its results may also be less predictable *across* different
targets, hence the need for a tolerance.

FP_TOLERANCE is *not* about making the new results match an old
reference, but about showing the *real* uncertainties of FP
transformation on *different* targets.
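
To make "tolerance" concrete, here is a minimal sketch of the kind of
check I have in mind. The helper names (fp_gap, within_tolerance) are
made up for illustration, and this is not the code of the test-suite's
comparison tool; it only assumes a value is accepted when it is within
either an absolute or a relative bound of the reference.

```c
#include <stdbool.h>

/* Hypothetical helper: absolute distance between two doubles. */
static double fp_gap(double a, double b)
{
  return a > b ? a - b : b - a;
}

/* Hypothetical check (an assumption, not the test-suite's tool):
   accept `value` if it is within abs_tol of `reference`, or within
   rel_tol times the magnitude of `reference`. A NaN on either side
   fails both comparisons and is therefore rejected. */
static bool within_tolerance(double reference, double value,
                             double abs_tol, double rel_tol)
{
  double diff = fp_gap(reference, value);
  double magnitude = reference < 0.0 ? -reference : reference;
  return diff <= abs_tol || diff <= rel_tol * magnitude;
}
```

With abs_tol = 1.0, a reference of 100.0 accepts 100.5 and rejects
102.0; with rel_tol = 0.05 instead, 102.0 is accepted.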

So, if you want to fix this test for good, here are the steps you need to take:

1. Check out the test-suite on different platforms: x86_64, ARM,
AArch64, PPC, MIPS. The more the merrier.
2. Enable fp-contract=on, run the tests on all platforms, record the
outputs, and ignore the differences from the reference.
3. Collate each platform's output for each test and see how different they are.
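
A rough sketch of what "see how different they are" could mean in
code, assuming the recorded outputs have already been parsed into one
flat array. The function name and the row-major layout (one row per
platform) are my assumptions, purely for illustration.

```c
#include <stddef.h>

/* Hypothetical layout: outputs[p * n + i] is element i of the test's
   output as recorded on platform p. Returns the largest max-minus-min
   spread seen for any single element across all platforms. */
static double max_platform_spread(const double *outputs,
                                  size_t platforms, size_t n)
{
  double worst = 0.0;
  if (platforms == 0)
    return worst;
  for (size_t i = 0; i < n; i++) {
    double lo = outputs[i]; /* start from platform 0 */
    double hi = outputs[i];
    for (size_t p = 1; p < platforms; p++) {
      double v = outputs[p * n + i];
      if (v < lo) lo = v;
      if (v > hi) hi = v;
    }
    if (hi - lo > worst)
      worst = hi - lo;
  }
  return worst;
}
```

The returned spread is a lower bound on any absolute tolerance that
would let every platform pass against a single reference.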

To make it easier to compare, in the past, I've used this trick:

1. Run on one platform, e.g. x86_64, ignoring the reference
2. Copy the output of those tests back to the reference_output
3. Run on a different platform, tweaking the tolerance until it "passes"
4. Run on yet another platform, making sure you don't need to tweak
the tolerance yet again
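
Instead of tweaking by hand, the smallest absolute tolerance that
"passes" can be computed directly. This is only a sketch under the
assumption that both outputs were parsed into arrays of doubles; the
function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: the largest element-wise absolute difference
   between the reference output and another platform's output. Any
   absolute tolerance at or above this value makes the run pass. */
static double smallest_passing_abs_tolerance(const double *reference,
                                             const double *output,
                                             size_t n)
{
  double worst = 0.0;
  for (size_t i = 0; i < n; i++) {
    double diff = reference[i] > output[i] ? reference[i] - output[i]
                                           : output[i] - reference[i];
    if (diff > worst)
      worst = diff;
  }
  return worst;
}
```

If the value computed on the third platform is no larger than the one
from the second, the tolerance does not need tweaking again.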

If the tolerance is "too high" for that test, we can further discuss
how to change it to make it better. If not, you found a solution.

If you want to make it even better, do some analysis on the
distribution of the results, per test, and pick the average as the
reference output and one or two standard deviations as the tolerance.
This should pass on most architectures.
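
As a sketch of that analysis for one output value sampled on several
platforms: compute the mean and the variance, then accept anything
within k standard deviations. The names are hypothetical, I'm assuming
the population variance (dividing by the count), and the check compares
squares so that no sqrt call is needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one output value across platforms. */
struct fp_stats {
  double mean;
  double variance; /* population variance: an assumption of this sketch */
};

static struct fp_stats platform_stats(const double *samples, size_t count)
{
  struct fp_stats s = {0.0, 0.0};
  if (count == 0)
    return s;
  double sum = 0.0;
  for (size_t i = 0; i < count; i++)
    sum += samples[i];
  s.mean = sum / (double)count;
  double squares = 0.0;
  for (size_t i = 0; i < count; i++) {
    double d = samples[i] - s.mean;
    squares += d * d;
  }
  s.variance = squares / (double)count;
  return s;
}

/* True if value lies within k standard deviations of the mean,
   tested as (value - mean)^2 <= k^2 * variance. */
static bool within_k_sigma(double value, struct fp_stats s, double k)
{
  double d = value - s.mean;
  return d * d <= k * k * s.variance;
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the variance
is 4, so with k = 2 the value 9 is accepted and 9.5 is not.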

To simplify the analysis, you can reduce the output into a single
number, say, by adding all the results up. This will generate more
inaccuracies than comparing each value, and if that's too large an
error, then you reduce the number of samples.
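
A minimal sketch of that reduction, with a stride parameter to cut the
number of samples. The function name and the flat array are assumptions
for illustration only, not what any particular test does.

```c
#include <stddef.h>

/* Hypothetical reduction: add up every stride-th element of data.
   stride == 1 sums everything; a larger stride uses fewer samples,
   so less rounding error accumulates in the single result. */
static double strided_checksum(const double *data, size_t n, size_t stride)
{
  double sum = 0.0;
  if (stride == 0)
    stride = 1; /* treat 0 as "no skipping" rather than looping forever */
  for (size_t i = 0; i < n; i += stride)
    sum += data[i];
  return sum;
}
```

Note that the result depends on the summation order, so the loop order
has to stay the same on every platform being compared.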

For example, on cholesky, we sampled every 16th item of the array:

  for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++)
      print_element(A[i][j], j*16, printmat);
    fputs(printmat, stderr);
  }

using "print_element" because calling printf sucks.

These modifications are ok, because they neither change the tests nor
hide them from compiler changes.

cheers,
--renato