[cfe-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
Sebastian Pop via cfe-dev
cfe-dev at lists.llvm.org
Thu Sep 29 14:33:24 PDT 2016
On Thu, Sep 29, 2016 at 4:25 PM, Renato Golin via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> On 29 September 2016 at 18:59, Abe Skolnik <a.skolnik at samsung.com> wrote:
>> As part of working on making test-suite less demanding of exact FP results
>> so my FP-contraction patch can go back into trunk and stay there, today I
>> analyzed "MultiSource/Benchmarks/VersaBench/beamformer". I found that the
>> raw output from that program is 2789780 bytes [i.e. ~2.7 _megabytes_] of
>> floating-point text, which IMO is too much to put into a patch -- or at
>> least a _civilized_ patch. ;-)
> Not to mention having to debug the whole thing every time it breaks. :S
> How I "fixed" it in the past was to do some heuristics like you did,
> while still trying to keep the meaning.
> I think the idea of having a "number of results" is good, but I also
> think you can separate the 300k values in logical groups, maybe adding
> them up.
> Of course, the more you do to the results, the higher will be the
> rounding errors, and the less meaning the results will have.
> I don't know much about this specific benchmark, but if it has some
> kind of internal aggregation values, you can dump those instead?
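To illustrate the grouping idea above, here is a minimal sketch (not part of the benchmark or the original script) of emitting one aggregate per logical group of values instead of every value; the group size and the use of math.fsum are assumptions for illustration:

```python
# Hypothetical sketch: instead of printing every FP value, group the
# values and print one compensated sum per group, shrinking the
# reference output by roughly a factor of group_size.
import math

def grouped_output(values, group_size=1000):
    """Yield one compensated sum per group of `group_size` values."""
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # math.fsum limits the rounding error accumulated by the
        # aggregation itself, so the reduction adds as little noise
        # as possible on top of the benchmark's own results.
        yield math.fsum(group)

values = [0.1 * i for i in range(5000)]
for total in grouped_output(values):
    print(f"{total:.6e}")
```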
>> As a result, I wrote the below Python program, which I think should deal
>> with the problem fairly well, or at least is a good first attempt at doing
>> so and can be improved later.
> The Python script is a good prototype for what you want, but I'd
> rather change the source code of the benchmark to print less, but
> more valuable, output.
> The more you print, the more your run will be tied to stdio and less
> to what you wanted to benchmark in the first place.
>> Vanilla compiler, i.e. without FP-contraction patch
>> Compiler WITH FP-contraction patch
> This looks like a small enough change to me, given the amount of
> precision you're losing. But it'd be better to make sure the result
> has at least some meaning related to the benchmark.
Summing up errors cancels them out: it makes no sense to do this.
You may be missing errors that are an order of magnitude off in both
directions and still end up with an overall small delta after the reduction.
And by the way, what does "a small enough change" mean?
Based on what metrics?
Is there a way to quantify what is an acceptable error margin?
There is a lot of information that is thrown away with the sum reduction.
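A minimal illustration (my own example, not from the benchmark) of how opposite-sign errors cancel under a sum reduction: every element below is off by 0.5, yet the summed delta is exactly zero.

```python
# Each candidate element deviates from the reference by 0.5, in
# alternating directions. A sum reduction cancels the deviations,
# while an element-wise comparison exposes them.
reference = [1.0, 2.0, 3.0, 4.0]
candidate = [1.5, 1.5, 3.5, 3.5]

sum_delta = abs(sum(candidate) - sum(reference))
max_abs_error = max(abs(c - r) for c, r in zip(candidate, reference))

print(sum_delta)      # 0.0 -- the reduction sees no difference at all
print(max_abs_error)  # 0.5 -- the per-element check reveals the error
```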