[PATCH] D112760: Require 'contract' fast-math flag for FMA generation

Mon Nov 1 15:28:09 PDT 2021

tra added inline comments.

================
Comment at: clang/test/CodeGenCUDA/fp-contract.cu:3-7
+// FIXME: This test fails. The comment below describes broken behavior.
+//        The front end should generate IR for the semantics it expects and
+//        backends should respect the IR. Backends should never "disregard"
+//        elements of the IR.
+
----------------
andrew.w.kaylor wrote:
> tra wrote:
> > We do need to have a way to preserve current behavior for CUDA compilation. There are many existing users that implicitly assume it.
> > 
> > 
> When you say you want to preserve "the current behavior" do you mean using "fast" as the default "fp-contract" setting, or also ignoring fp-contract-related pragmas when fp-contract=fast is used?
> 
> I certainly understand wanting fp-contract=fast to be the default behavior, but I'm puzzled by the difference in behavior that was introduced between HIP and CUDA, wherein HIP respects the pragmas but CUDA doesn't.
The idea is not to disturb the status quo. Major CUDA users are sort of used to clang being reasonably close to what NVCC does by default. What that is, exactly, is not always clear. The current state of affairs has been working well enough. Changing how FP gets compiled will likely trigger a noticeable number of test failures due to both numerical differences and performance regressions. For the former, we have somewhat decent coverage in tensorflow. Performance regressions would be harder to spot.
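
To make the "numerical differences" concrete, here is a minimal host-side C++ sketch (not CUDA, and not taken from the patch or its tests) of how a contracted multiply-add can differ from the unfused sequence. The function names are hypothetical, and the volatile temporary is just one assumed way to keep the compiler from fusing the unfused path regardless of the -ffp-contract setting.

```cpp
#include <cmath>

// Unfused: the product is rounded to double before the add.
// The volatile temporary keeps the compiler from contracting the two
// operations into a single FMA.
double unfused_mul_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused: std::fma computes a * b + c with one rounding at the very end,
// which is what a contracted multiply-add does.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product 1 - 2^-60 rounds to 1.0 in double, so the unfused version returns 0.0 while the fused version returns -2^-60. A test that compares against a fixed expected value can flip on exactly this kind of difference.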

I can test the patch on our tensorflow tests and see how it fares.

If there are nontrivial failures, we will need to consider how to phase in the changes w/o causing unnecessary trouble for the users and/or give them an escape hatch option to keep things working until they can fix their code or tests.

>  puzzled by the difference in behavior that was introduced between HIP and CUDA

The details on HIP's need for fp-honor-pragma are in https://github.com/llvm/llvm-project/commit/cb08558caa3bad69213b08e6361586491232c745

For CUDA things were still working well enough with -ffp-contract=fast, so there was no need to change things.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112760/new/

https://reviews.llvm.org/D112760