[PATCH] D25346: [test-suite] [Polybench] run tests twice with -ffp-contract=on/off

Fri Oct 7 05:17:49 PDT 2016

sebpop added a comment.

In https://reviews.llvm.org/D25346#564313, @rengolin wrote:

> In https://reviews.llvm.org/D25346#564284, @sebpop wrote:
>
> > As I was mentioning, this is intended only for the Polybench benchmarks, which follow a fairly regular pattern of testing loop kernels.
> >  The patch makes 2mm run correctly for CFLAGS="", "-Ofast", and "-O3 -ffp-contract=on".
> >  I have not looked at the other failing benchmarks to say whether this can apply to those.
> >  I know that oggenc would be impossible to modify like this.
>
>
> Right. Abe was trying to change the Make/CMake files, which could be a one-size-fits-all solution. If that works, I think we should go with that.
>
> > There are also these resource problems I was mentioning:
> > 
> > - compilation time will double: e.g., Polly will optimize both kernels,
> > - compute time on the device will more than double: running the kernel twice, plus an extra loop over both outputs to compare with FP_TOLERANCE.
>
> Those two are true for any solution we can come up with. 2mm takes 100s on ARM and 3mm takes 150s; doubling the run time of ~50 tests could make the whole run several minutes longer.
>
> You were proposing getting something running first and changing it later. With a tolerance of 0.0001 after all aggregations, I think we could use FP=on with that tolerance in fpcmp and just run the default as a first approach, then later add the FP=off comparison.
>
> Of course, that only works if the tolerance is low enough on all 50.

I think the tolerance can be even smaller with the way this patch checks the outputs in Polybench.
FP_TOLERANCE should be selected on a per-benchmark basis (computed as suggested by Stephen).
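The per-element comparison this approach relies on could look roughly like the sketch below. `check_fp_results` is a hypothetical name, not taken from the patch, and combining a relative tolerance (standing in for a per-benchmark FP_TOLERANCE) with an absolute floor for values near zero is an assumption.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: compare the output of the normally compiled kernel
 * (out) with the strict, contraction-free reference (ref), element by
 * element. rel_tol stands in for a per-benchmark FP_TOLERANCE; abs_tol is
 * an assumed floor for values close to zero.
 * Returns 0 if every element matches within tolerance, 1 otherwise. */
int check_fp_results(const double *out, const double *ref, size_t n,
                     double rel_tol, double abs_tol)
{
    for (size_t i = 0; i < n; i++) {
        double a = out[i], b = ref[i];
        if (a == b)                 /* exact match, including equal infinities */
            continue;
        int ok = 0;
        if (isfinite(a) && isfinite(b)) {   /* NaN or unequal Inf always fails */
            double diff = fabs(a - b);
            double scale = fabs(a) > fabs(b) ? fabs(a) : fabs(b);
            ok = diff <= abs_tol || diff <= rel_tol * scale;
        }
        if (!ok) {
            fprintf(stderr, "mismatch at index %zu: %.17g vs %.17g\n", i, a, b);
            return 1;
        }
    }
    return 0;
}
```

A benchmark would call this once after both kernel variants have run and return its result from `main`, so the harness sees a non-zero exit status on a mismatch.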

> 
> 
> > - memory requirements on the device will almost double: one extra output array is added; the input arrays are not modified, so there is no need to duplicate them,
> 
> Abe's solution wouldn't incur additional memory consumption, but it would take longer to prepare, compile, and run two completely different models of the same benchmark.

Also, don't forget that this patch avoids two drawbacks of Abe's solution:

- no modifications to the CMake files and Makefiles (Matthias objected to the added complexity in the build system)
- no extra space needed to store a second reference output (1GB of extra storage, plus transfer over the network when tests run on separate devices, e.g., ARM)
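To make the memory and storage argument concrete, here is a minimal sketch of an in-program dual run, assuming clang and a simplified 2mm-like matrix product. The names `kernel_default`, `kernel_strict`, and `run_both` are hypothetical and not taken from the patch. The inputs are shared and left unmodified, the only extra allocation is one reference output array, and no reference output file is needed.

```c
#include <stdlib.h>

/* Hypothetical 2mm-like kernel: C = A * B for n x n row-major matrices.
 * Whether `sum += ...` is fused into an FMA depends on the -ffp-contract
 * setting given on the command line. */
void kernel_default(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Identical kernel used as the strict reference: the pragma asks the
 * compiler not to contract within this function body. Assumes clang;
 * clang disregards the pragma under -ffp-contract=fast (implied by
 * -Ofast), and GCC does not implement it. */
void kernel_strict(int n, const double *A, const double *B, double *C)
{
#pragma STDC FP_CONTRACT OFF
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Runs both variants on the same, unmodified inputs A and B. The only
 * extra allocation is the reference output, which is returned to the
 * caller (who must free() it), or NULL if the allocation fails. */
double *run_both(int n, const double *A, const double *B, double *C)
{
    double *C_ref = malloc((size_t)n * (size_t)n * sizeof *C_ref);
    if (C_ref == NULL)
        return NULL;
    kernel_default(n, A, B, C);
    kernel_strict(n, A, B, C_ref);
    return C_ref;
}
```

The extra loop over both outputs (the FP_TOLERANCE comparison mentioned earlier in the thread) would then run over `C` and the returned array before it is freed.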

https://reviews.llvm.org/D25346