[cfe-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

Tue Oct 11 04:15:42 PDT 2016

On Mon, Oct 10, 2016 at 5:02 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Sebastian Pop" <sebpop.llvm at gmail.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun"
>> <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik" <a.skolnik at samsung.com>,
>> "Renato Golin" <renato.golin at linaro.org>
>> Sent: Monday, October 10, 2016 9:10:01 AM
>> Subject: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>>
>> Hi,
>>
>> I would need some help to fix polybench/symm:
>>
>> void kernel_symm(int ni, int nj,
>> DATA_TYPE alpha,
>> DATA_TYPE beta,
>> DATA_TYPE POLYBENCH_2D(C,NI,NJ,ni,nj),
>> DATA_TYPE POLYBENCH_2D(A,NJ,NJ,nj,nj),
>> DATA_TYPE POLYBENCH_2D(B,NI,NJ,ni,nj))
>> {
>>   int i, j, k;
>>   DATA_TYPE acc;
>>
>>   /*  C := alpha*A*B + beta*C, A is symmetric */
>>   for (i = 0; i < _PB_NI; i++)
>>     for (j = 0; j < _PB_NJ; j++)
>>       {
>>         acc = 0;
>>         for (k = 0; k < j - 1; k++)
>>           {
>>              C[k][j] += alpha * A[k][i] * B[i][j];
>>              acc += B[k][j] * A[k][i];
>>           }
>>         C[i][j] = beta * C[i][j] + alpha * A[i][i] * B[i][j] + alpha * acc;
>>       }
>> }
>>
>> Compiling this kernel with __attribute__((optnone)) and printing the
>> contents of the C[][] array gives output that does not match the
>> reference output.
>
> Why is this? What compiler are you using? Are we not using IEEE FP @ -O0 (e.g. using x87 floating point)? IEEE FP, without FMA, should be completely deterministic. Sounds like a bug.

This is with clang top of tree on x86_64 Linux.
I created https://reviews.llvm.org/D25465 with my changes to the symm
benchmark.

>
>> Furthermore, compiling this kernel at -Ofast and comparing against -O0
>> only passes for FP_ABSTOLERANCE=10.
>> The 10 other polybench tests that I have transformed to check FP all
>> pass at FP_ABSTOLERANCE=1e-5 (and most likely they could pass at an
>> even tighter tolerance.)
>>
>> The symm benchmark seems to accumulate all the errors as it is a big
>> reduction from the first elements of the C[][] array into the last
>> elements.
>> I'm not sure we can rely on this benchmark to check FP correctness.
>>
>> One option is to completely specify which optimization flags have been
>> used to compute the reference output and only use that to compile this
>> benchmark.
>>
>> Please share your ideas on how to deal with this particular test.
>
> If the test is not numerically stable, we can:
>
>  1. Only test the non-FP-contracted output

Yes, this is what I'm doing.

>  2. Run the FP-contracted test only for a very small size (so that we'll stay within some reasonable tolerance of the reference output)
>  3. Change the matrix to something that will make the test numerically stable (it does not look like the matrix itself matters to the performance; where do the values come from?).
>

The values may be very large towards the end of the C array.
The test now passes with FP_ABSTOLERANCE=1e-5 when lowering the values
in the input arrays with this patch:

diff --git a/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.c b/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.c
index 0a1bdf3..7fc3cb1 100644
--- a/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.c
+++ b/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.c
@@ -35,12 +35,12 @@ void init_array(int ni, int nj,
   *beta = 2123;
   for (i = 0; i < ni; i++)
     for (j = 0; j < nj; j++) {
-      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i*j) / ni;
-      B[i][j] = ((DATA_TYPE) i*j) / ni;
+      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i-j) / ni;
+      B[i][j] = ((DATA_TYPE) i-j) / ni;
     }
   for (i = 0; i < nj; i++)
     for (j = 0; j < nj; j++)
-      A[i][j] = ((DATA_TYPE) i*j) / ni;
+      A[i][j] = ((DATA_TYPE) i-j) / ni;
 }
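
To give a rough idea of how much this shrinks the inputs, here is a small
standalone sketch (not part of D25465). It assumes a square problem size n
and uses double in place of DATA_TYPE; the helper names max_init_product and
max_init_difference are made up for illustration only.

```c
#include <assert.h>

/* Largest |value| written by the original initializer, (i*j)/n.
   Hypothetical helper; assumes ni == nj == n. */
double max_init_product(int n)
{
  assert(n > 0);
  double max = 0.0;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++) {
      /* The cast binds to i, so the product is computed in double. */
      double v = ((double) i * j) / n;
      if (v > max)
        max = v;
    }
  return max;
}

/* Largest |value| written by the patched initializer, (i-j)/n.
   Hypothetical helper; assumes ni == nj == n. */
double max_init_difference(int n)
{
  assert(n > 0);
  double max = 0.0;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++) {
      double v = ((double) i - j) / n;
      if (v < 0)
        v = -v;  /* compare magnitudes, the patched values can be negative */
      if (v > max)
        max = v;
    }
  return max;
}
```

For n = 4 the original initializer peaks at 2.25 and the patched one at 0.75.
In general the first grows roughly like n while the second stays below 1, so
the same relative rounding error translates into a much smaller absolute
error.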

Of course we need to update the reference output hash if we decide to
use this patch.
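
As an illustration of why the -Ofast result can drift away from the -O0
result in a reduction like the one in kernel_symm: fast-math allows the
compiler to reassociate floating-point additions, and FP addition is not
associative. The sketch below is a hypothetical standalone example
(sum_forward and sum_backward are not from the benchmark) that adds the same
values in two different orders.

```c
/* Sum x[0] + x[1] + ... + x[n-1], left to right. */
double sum_forward(const double *x, int n)
{
  double acc = 0.0;
  for (int k = 0; k < n; k++)
    acc += x[k];
  return acc;
}

/* Sum the same values right to left: x[n-1] + ... + x[1] + x[0]. */
double sum_backward(const double *x, int n)
{
  double acc = 0.0;
  for (int k = n - 1; k >= 0; k--)
    acc += x[k];
  return acc;
}
```

With x = {1.0, 0x1p53, -0x1p53}, sum_forward returns 0.0 and sum_backward
returns 1.0: going forward, the 1.0 is rounded away when it is added to 2^53,
while going backward the two large values cancel first. The larger the
intermediate values, the larger the absolute error that a reordering can
introduce.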

Sebastian