[PATCH] D25277: [test-suite] Run FP tests twice with -ffp-contract=off/on

Abe Skolnik via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 5 11:48:18 PDT 2016


[Matthias Braun wrote:]

> Ugh, I don't like the added complexity and disk space bloat.

Unfortunately, I think we are up against at least a little _essential_ complexity here: for 
tests where the contraction/fast-math build differs from the conservatively-FP-optimized build, 
we need to run _two_ tests each.  What we compare the output of the _first_ run against is 
another question [see the next quote-and-reply section].

In principle, the test harness could compare the two builds [aggressively-FP-optimized vs. 
conservatively-FP-optimized] and, if the executables are binary-equal, skip running the second 
test.  However, TTBOMK that is still only room for future improvement, i.e. not something the 
harness does today.
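To make the "skip the second run if the binaries are identical" idea concrete, here is a rough 
sketch in Python.  It is not part of the patch; the run_test() helper and the executable paths 
are purely hypothetical stand-ins for whatever the harness actually does:

#!/usr/bin/env python3
# Sketch only: skip the second run when the two builds are byte-identical.
import filecmp
import subprocess

def run_test(exe_path):
    # Hypothetical stand-in for however the harness runs and checks one test.
    return subprocess.run([exe_path], capture_output=True, text=True).stdout

def run_fp_variants(exe_contract_off, exe_contract_on):
    out_off = run_test(exe_contract_off)
    # shallow=False forces a byte-by-byte comparison of the two executables.
    if filecmp.cmp(exe_contract_off, exe_contract_on, shallow=False):
        return out_off, out_off   # identical binaries: the second run is redundant
    return out_off, run_test(exe_contract_on)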


> Is there no way to summarize/hash/compress a list of floating point numbers in a way that we can compare
> them against a reference output and at the same time allow some fluctuations in floating-point
> accuracy?

Yes, there is a way -- compressing the ref. outputs in the repo., decompressing at the test 
site -- but it has been viewed negatively because:

   * in some/many cases, this will significantly increase filesystem-storage space consumed at 
the test site

   * in some/many cases, this will significantly increase the size of the ref. output
     in the repo. [b/c the repo. currently has only a hash for those cases]

   * in some/many cases, this will make the ref. output in the repo. no longer human-readable

   * the probability of a hash collision causing a test to falsely pass is already very, very
     small, so compressed ref. outputs would buy little extra safety over the existing hashes


The above having been written, the compression idea was mine to start with AFAIK, and I'm still 
willing to implement it if the community arrives at a consensus/majority vote that this is the 
best idea, but I think I now agree with Hal that modern hashes are safe enough that we should 
not worry too much about collisions.  IOW, while compressed ref. outputs are "perfect", hashes 
are "good enough" _and_ human-readable _and_ small in the repo.

We can drastically reduce the probability of a hash collision by using a much-more-recent hash 
algo. than good old MD5 [my preference is SHA-512]; however, this would add another tool 
requirement to the set of base requirements for running the tests.  I _could_ write a new 
script that does at least more-or-less what the current "HashProgramOutputs" does for MD5, i.e. 
one that makes the dependency more portable [for the reader who finds that surprising: the 
short story is that the program is usually called "md5" on BSD-based OSes and "md5sum" on 
GNU/Linux-based ones], but at this time I don't know of any urgent need to move to a more 
collision-secure hash.
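For illustration, a rough sketch of what such a more portable hashing helper could look like, 
using Python's hashlib [which behaves the same on BSD and GNU/Linux, and where swapping MD5 for 
SHA-512 is a one-word change].  This is only a sketch of the idea, not a drop-in replacement for 
the existing script, and its command-line interface is made up:

#!/usr/bin/env python3
# Sketch only: hash a program-output file without depending on md5/md5sum.
import hashlib
import sys

def hash_output(path, algo="md5"):        # algo="sha512" for a stronger hash
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    algo = sys.argv[2] if len(sys.argv) > 2 else "md5"
    print(hash_output(sys.argv[1], algo))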

One advantage of {compressed ref. outputs} over {hashes as references} is that when a test 
fails, with a compressed ref. output one can manually decompress it if needed, then compare the 
decompressed ref. output against the observed output and see what's different.  With hashes, by 
necessity you only get "comparison failed".
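For the reader who wants to picture that workflow, a rough sketch of "decompress the ref. output 
and compare, tolerating small FP differences" is below.  The gzip container, the 
whitespace-separated-floats format, and the tolerance value are all assumptions made for 
illustration, not the harness's actual comparison logic:

#!/usr/bin/env python3
# Sketch only: decompress a reference output and fuzzily compare it against
# the observed output, reporting the first mismatch instead of just "failed".
import gzip
import math

def compare(ref_gz_path, observed_path, rel_tol=1e-6):
    with gzip.open(ref_gz_path, "rt") as f:
        ref_tokens = f.read().split()
    with open(observed_path, "rt") as f:
        obs_tokens = f.read().split()
    if len(ref_tokens) != len(obs_tokens):
        return "token counts differ: %d vs. %d" % (len(ref_tokens), len(obs_tokens))
    for i, (r, o) in enumerate(zip(ref_tokens, obs_tokens)):
        try:
            if not math.isclose(float(r), float(o), rel_tol=rel_tol):
                return "token %d differs beyond tolerance: %s vs. %s" % (i, r, o)
        except ValueError:
            if r != o:                     # non-numeric tokens must match exactly
                return "token %d differs: %s vs. %s" % (i, r, o)
    return None                            # no mismatch found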

Regards,

Abe

