<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Abe,
<div class=""><br class="">
</div>
<div class="">My 2 cents:</div>
<div class="">I have been using the test-suite mainly in benchmarking mode as a convenient way to track performance changes in top-of-trunk.</div>
<div class="">I've observed that some of the programs (IIRC, especially the ones in SingleSource/Benchmarks/Polybench/) produce a lot of output (megabytes).</div>
<div class="">This caused a lot of noise in performance measurements, as the execution time was dominated by printing out the data, rather than the actual useful computations. Renato removed the worst noise in <a href="http://reviews.llvm.org/D10991" class="">http://reviews.llvm.org/D10991</a>.</div>
<div class=""><br class="">
</div>
<div class="">That experience made me think that, ideally, the programs in the test-suite should print only a small amount of output to be checked.</div>
<div class="">For example, individual programs that output a lot of data could be adapted to print only a summary/aggregate of that data, chosen so that it is likely to change when a miscomputation happens.</div>
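<div class=""><br class="">
</div>
<div class="">To make that concrete, here is a minimal sketch of such a summary. It is in Python for brevity (the test-suite programs themselves are mostly C), and the name summarize and the choice of aggregates are only assumptions for illustration, not existing test-suite code:</div>
<pre class="">
```python
import math


def summarize(values):
    """Reduce a large result array to a few numbers worth printing."""
    total = math.fsum(values)
    # Position-weighted sum, so that swapped elements also change the summary.
    weighted = math.fsum((i + 1) * v for i, v in enumerate(values))
    return len(values), total, weighted


def print_summary(values):
    """Print one short line instead of every element of the array."""
    n, total, weighted = summarize(values)
    print(f"n={n} sum={total:.6e} weighted={weighted:.6e}")
```
</pre>
<div class="">A plain sum alone would miss elements that end up in the wrong position, which is why the sketch also prints a position-weighted sum. Whether these aggregates are sensitive enough would have to be judged per program.</div>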
<div class=""><br class="">
</div>
<div class="">If we could go in that direction, I don't see much need for storing hashes or even compressed output as reference data.</div>
<div class="">I think that needing compressed reference data may make the test-suite ever so slightly harder to set up: another dependency on an external tool. Not that I can imagine that having a dependency on e.g. gzip would be problematic on any platform.</div>
<div class=""><br class="">
</div>
<div class="">Anyway, I thought I'd just share my opinion that, ideally, the programs in the test-suite would produce only small outputs, to avoid noisy benchmark results. If that is a direction we could go in, there may not be much need for
storing hashes or compressed reference output.</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class=""><br class="">
</div>
<div class="">Kristof</div>
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 6 Oct 2016, at 00:29, Abe Skolnik via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">Dear all,<br class="">
<br class="">
Today I had an idea that might satisfy all the needs for improvement we currently have "on the plate" WRT the in-repo sizes of reference outputs and the issues surrounding FP optimizations: how to allow them while still allowing test programs in "test-suite"
whose output[s] depend upon FP computations [and for which relatively small changes in FP accuracy, whether up/more-accurate or down/less-accurate, change the actual observed output].<br class="">
<br class="">
<br class="">
<br class="">
For non-FP-dependent, fully-deterministic programs, we can choose the shortest [in # of bytes as reported by "ls"] of the following:<br class="">
<br class="">
* hash<br class="">
* compressed output<br class="">
* raw output<br class="">
<br class="">
[in increasing order of "likely" size]<br class="">
<br class="">
... or we can establish some minimum differentiating factors, e.g. "compressed output must be at least 2x smaller than raw output, otherwise stick to raw output" and "hash must be at least 10x smaller than compressed output, otherwise stick to compressed output".
If needed/{strongly desired}, the rules can even be a little more complicated than that, e.g. "compressed output must be at least 2x smaller than raw output OR at least 4096 bytes smaller than raw output, otherwise stick to raw output".<br class="">
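<br class="">
A minimal sketch of the first pair of rules above, in Python since "lit" is written in Python. The function name, the use of gzip and SHA-256, and the two constants are assumptions for illustration only; the thresholds are the example values quoted above:<br class="">
<pre class="">
```python
import gzip
import hashlib

COMPRESS_FACTOR = 2   # compressed must be at least 2x smaller than raw
HASH_FACTOR = 10      # hash must be at least 10x smaller than compressed


def choose_reference_form(raw):
    """Return (kind, data) for a deterministic program's reference output."""
    compressed = gzip.compress(raw)
    # Hex digest as it would be stored in a text file: 64 bytes for SHA-256.
    digest = hashlib.sha256(raw).hexdigest().encode("ascii")
    if len(raw) >= COMPRESS_FACTOR * len(compressed):
        if len(compressed) >= HASH_FACTOR * len(digest):
            return "hash", digest
        return "compressed", compressed
    return "raw", raw
```
</pre>
With these cascaded rules a hash is only considered once compression has already been judged worthwhile, so very small outputs always stay raw.<br class="">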
<br class="">
<br class="">
<br class="">
For programs that _are_ either FP-dependent, not-fully-deterministic, or both, I propose that we shall only choose from the set {compressed output, raw output} because:<br class="">
<br class="">
1) small-enough variation in the result is expected, normal, and tolerated<br class="">
<br class="">
and<br class="">
<br class="">
2) since this way the raw reference output will be available at the "lit"-running host [after decompression, if needed],<br class="">
the "fpcmp" program will be able to be told how much tolerance to allow for each run.<br class="">
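<br class="">
To illustrate why the raw reference output is needed for point 2, here is a sketch of a tolerance-based comparison. This is not the actual "fpcmp" algorithm; the function name and the whitespace-token approach are assumptions made for illustration:<br class="">
<pre class="">
```python
import math


def outputs_match(reference, actual, rel_tol=0.0, abs_tol=0.0):
    """Compare two outputs token by token, allowing numeric tolerance."""
    ref_tokens = reference.split()
    act_tokens = actual.split()
    if len(ref_tokens) != len(act_tokens):
        return False
    for ref, act in zip(ref_tokens, act_tokens):
        if ref == act:
            continue  # identical text always matches
        try:
            ref_val, act_val = float(ref), float(act)
        except ValueError:
            return False  # differing non-numeric tokens never match
        if not math.isclose(ref_val, act_val, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True
```
</pre>
A comparison like this needs the reference numbers themselves, which a hash of the reference output cannot provide.<br class="">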
<br class="">
If we only choose from the set {compressed ref. output, raw ref. output} for these tests, then it should be relatively easy to run some tests with output-changing FP optimizations enabled, since those runs won't depend on the {no-output-changing-FP-optimizations}
build having run first. Although Hal's suggestion to have the {no-output-changing-FP-optimizations} build produce the output that will be analyzed by the {output-changing FP optimizations enabled} builds is an excellent one, it seems that implementing
it in the context of "lit" is considerably more difficult than we had hoped. If anybody reading this knows how to make "lit" start one test only after another one has finished, please chime in.<br class="">
<br class="">
<br class="">
If compressed ref. outputs are accepted by the community, then please let me know which of the following formats it would be acceptable to depend on being able to decompress:<br class="">
<br class="">
bz2<br class="">
gzip<br class="">
xz<br class="">
<br class="">
I'm perfectly willing to write [a] wrapper[s] that will probe the system for programs that can decompress whatever it is and will choose the best one.<br class="">
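<br class="">
A rough sketch of what such a probe could look like. The suffix-to-tool table, the function name, and the preference order are all hypothetical; the "which" parameter exists only so the lookup can be replaced in tests:<br class="">
<pre class="">
```python
import shutil

# Hypothetical mapping from reference-file suffix to the command-line tool
# needed to decompress it.
TOOLS = {".xz": "xz", ".bz2": "bzip2", ".gz": "gzip"}


def pick_format(preference=(".xz", ".bz2", ".gz"), which=shutil.which):
    """Return the first suffix in `preference` whose tool is on PATH, else None."""
    for suffix in preference:
        if which(TOOLS[suffix]) is not None:
            return suffix
    return None
```
</pre>
Since "lit" runs under Python, the gzip, bz2, and lzma modules of the Python standard library might be an alternative to probing for external tools, assuming the Python build in use includes them.<br class="">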
<br class="">
<br class="">
Regards,<br class="">
<br class="">
Abe<br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>