[llvm-dev] Floating point variance in the test suite

Hubert Tong via llvm-dev llvm-dev at lists.llvm.org
Wed Jun 23 14:36:51 PDT 2021


On Wed, Jun 23, 2021 at 1:57 PM Kaylor, Andrew via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi everyone,
>
>
>
> I’d like to restart an old discussion about how to handle floating point
> variance in the LLVM test suite. This was brought up years ago by Sebastian
> Pop (https://lists.llvm.org/pipermail/llvm-dev/2016-October/105730.html)
> and Sebastian did some work to fix things at that time, but I’ve discovered
> recently that a lot of things aren’t working the way they are supposed to,
> and some fundamental problems were never addressed.
>
>
>
> The basic issue is that at least some tests fail if the floating point
> calculations don’t exactly match the reference results. In 2016, Sebastian
> attempted to modify the Polybench tests to allow some tolerance, but that
> attempt was not entirely successful. Other tests don’t seem to have any way
> to handle variance at all; I don’t know how many are affected.
>
>
>
> Melanie Blower has been trying for some time now to commit a change that
> would make fp-contract=on the default setting for clang (as the
> documentation says it is), but this change caused failures in the test
> suite. Melanie recently committed a change (
> https://reviews.llvm.org/rT24550c3385e8e3703ed364e1ce20b06de97bbeee)
> which overrides the default and sets fp-contract=off for the failing tests,
> but this is not a good long-term solution, as it leaves FP contraction
> untested (though apparently it has been untested for quite some time).
>
>
>
> The test suite attempts to handle floating point tests in several
> different ways (depending on the test configuration):
>
>
>
> 1. Tests write floating point results to a text file, and the fpcmp utility
> is used to compare them against the reference output. This method allows
> absolute and relative tolerances to be specified, I believe (an example
> invocation follows this list).
>
>
>
> 2. Tests write floating point results to a text file which is then hashed
> and compared against a reference hash value.
>
>
>
> 3. (Sebastian’s 2016 change) Tests are run with two kernels: one compiled
> with FMA explicitly disabled and one compiled with the test suite’s
> configured options. The results of the two kernels are compared with a
> specified tolerance, then the FMA-disabled results are written to a text
> file, hashed, and compared to a reference output. This pattern is sketched
> below, after the list.
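>
> For concreteness, method 1’s comparison amounts to something like the
> following. The file names are made up, and I’m going from memory on the
> exact flag spellings (an absolute and a relative tolerance), so treat this
> as illustrative:
>
>   fpcmp -a 0.00001 -r 0.001 test-output.txt reference-output.txt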
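>
> And method 3’s two-kernel pattern is, in outline, the sketch below. All of
> the names, the data, and the tolerance are hypothetical stand-ins, not the
> actual Polybench code:
>
>   #include <math.h>
>   #include <stdio.h>
>
>   /* FMA formation disabled via the C standard pragma. Note that clang
>      currently ignores this pragma when fp-contract=fast is used (more
>      on that below). */
>   #pragma STDC FP_CONTRACT OFF
>   static void kernel_strict(int n, const double *a, const double *b,
>                             double *c) {
>     for (int i = 0; i < n; ++i)
>       c[i] += a[i] * b[i];      /* intended: separate multiply and add */
>   }
>
>   /* Built with whatever contraction setting the suite configured. */
>   #pragma STDC FP_CONTRACT DEFAULT
>   static void kernel_default(int n, const double *a, const double *b,
>                              double *c) {
>     for (int i = 0; i < n; ++i)
>       c[i] += a[i] * b[i];      /* may become an FMA */
>   }
>
>   int main(void) {
>     enum { N = 4 };
>     double a[N] = {1e8, 2.5, 3.25, 4.75}, b[N] = {1e-8, 5.5, 6.5, 7.5};
>     double cs[N] = {1, 1, 1, 1}, cd[N] = {1, 1, 1, 1};
>     kernel_strict(N, a, b, cs);
>     kernel_default(N, a, b, cd);
>     /* Step 1: compare the two kernels within a tolerance. */
>     for (int i = 0; i < N; ++i)
>       if (fabs(cs[i] - cd[i]) > 1e-5)  /* stand-in for FP_ABSTOLERANCE */
>         printf("mismatch at %d: %.17g vs %.17g\n", i, cs[i], cd[i]);
>     /* Step 2: the strict results (cs) are then printed, hashed, and
>        compared against the reference output. */
>     return 0;
>   }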
>
>
>
> I’ve discovered a few problems with this.
>
>
>
> First, many of the tests are producing hashed results but using fpcmp to
> compare the hash values. I created
> https://bugs.llvm.org/show_bug.cgi?id=50818 to track this problem. I
> don’t know the configuration well enough to fix it, but it seems like it
> should be a simple problem. When the hash values match, the comparison
> passes, but for the wrong reasons. It doesn’t allow any FP tolerance, but
> that seems to be expected (hashing and FP tolerance cannot be combined in
> the configuration files). Personally, I don’t like the use of hashing with
> floating point results at all, so I’d like to get rid of method 2.
>

We've hit at least one failure when porting because the hashing means that
minuscule differences in standard library implementations are not tolerated.


>
>
> Second, when the third method is used, FMA is disabled using the STDC
> FP_CONTRACT pragma. Currently, LLVM does not respect this pragma when
> fp-contract=fast is used. This seems to be accepted behavior, but in my
> opinion it’s obviously wrong. A new setting, fp-contract=fast-honor-pragmas,
> was added recently. I would very much like to see this work the other way
> around: by default, fp-contract=fast should honor the pragmas, and if
> someone needs a setting that doesn’t, that can be added. In any event, the
> relevant information here is that Sebastian’s FMA-disabling solution
> doesn’t work with fp-contract=fast. Both kernels are compiled with FMA
> enabled, so their results match each other, but the test fails the hash
> comparison because the “FMA disabled” kernel really had FMA enabled.
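>
> To spell out the interaction (this is my understanding of clang’s current
> behavior; “file.c” is just a placeholder):
>
>   clang -c -ffp-contract=on file.c    # pragma honored; contraction only
>                                       #   within expressions, per the standard
>   clang -c -ffp-contract=off file.c   # no contraction anywhere
>   clang -c -ffp-contract=fast file.c  # aggressive contraction; the
>                                       #   FP_CONTRACT pragma is IGNORED
>   clang -c -ffp-contract=fast-honor-pragmas file.c
>                                       # aggressive, but the pragma is respected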
>
>
>
> Third, when the third method is used, it’s checking the intermediate
> results using “FP_ABSTOLERANCE”, and in some cases the difference introduced
> by FMA exceeds the currently configured tolerance. For example, in the
> Polybench symm test I got this output:
>
>
>
> A[5][911] = 85644607039.746628 and B[5][911] = 85644607039.746643 differ
> more than FP_ABSTOLERANCE = 0.000010
>
>
>
> The difference there looks pretty reasonable to me, but because we’re
> checking against a tiny absolute tolerance, the test failed. Incidentally,
> the test failed with a message that said “fpcmp-target: FP Comparison
> failed, not a numeric difference between '0' and 'b'”, because the above
> output got hashed and compared to a reference hash value using a tool
> that expected both values to be floating point numbers. The LNT bots
> don’t even give you that much information; as far as I can tell, they
> just tell you the test failed. But I digress.
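>
> Coming back to the numbers for a moment: in relative terms, that
> difference is about one ULP, which a relative (or ULP-based) check would
> accept immediately. A quick computation, using the two values from the
> symm output above:
>
>   #include <math.h>
>   #include <stdio.h>
>
>   int main(void) {
>     double a = 85644607039.746628, b = 85644607039.746643;
>     double abs_diff = fabs(b - a);                       /* ~1.5e-5  */
>     double rel_diff = abs_diff / fmax(fabs(a), fabs(b)); /* ~1.8e-16 */
>     /* DBL_EPSILON is ~2.2e-16, so rel_diff is roughly one ULP. */
>     printf("abs = %g, rel = %g\n", abs_diff, rel_diff);
>     return 0;
>   }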
>
>
>
> Finally, a few of the Polybench tests intend to use the third method
> above but aren’t actually calling the “strict” kernel, so they fail
> with fp-contract=on.
>
>
>
> So, that’s what I know. There is an additional, never-addressed problem
> with running the test suite with fast-math enabled.
>
>
>
> Now I suppose I should say what kind of feedback I’m looking for.
>
>
>
> I guess the first thing I want is to find out who is interested in the
> results and behavior of these tests. I assume there are people who care,
> but the tests get touched so infrequently and are in such a bad state that
> I don’t know if anyone is paying attention beyond trying to keep the
> bots green. I don’t want to spend a lot of time cleaning up tests that
> aren’t useful to anyone just because they happen to be in the test suite.
>
>
>
> Beyond that, I’d like to get general feedback on the strategy we
> should be adopting in the test suite. In the earlier discussion of this
> issue, the consensus seemed to be that the problems should be addressed on
> a case-by-case basis, doing the “right thing” for each test. That approach
> implies some level of test ownership, so I would like input on each test
> from someone who cares about that test.
>
>
>
> Finally, I’d like to hear some general opinions on whether the tests we
> have now are useful and give sufficient coverage of floating point behavior.
>
>
>
> Thanks,
>
> Andy
>