[PATCH] D112760: Require 'contract' fast-math flag for FMA generation

Tue Nov 2 11:34:14 PDT 2021

andrew.w.kaylor added inline comments.

================
Comment at: clang/test/CodeGenCUDA/fp-contract.cu:3-7
+// FIXME: This test fails. The comment below describes broken behavior.
+//        The front end should generate IR for the semantics it expects and
+//        backends should respect the IR. Backends should never "disregard"
+//        elements of the IR.
+
----------------
tra wrote:
> andrew.w.kaylor wrote:
> > tra wrote:
> > > We do need to have a way to preserve current behavior for CUDA compilation. There are many existing users that implicitly assume it.
> > > 
> > > 
> > When you say you want to preserve "the current behavior" do you mean using "fast" as the default "fp-contract" setting, or also ignoring fp-contract-related pragmas when fp-contract=fast is used?
> > 
> > I certainly understand wanting fp-contract=fast to be the default behavior, but I'm puzzled by the difference in behavior that was introduced between HIP and CUDA, wherein HIP respects the pragmas but CUDA doesn't.
> The idea is not to disturb the status quo. Major CUDA users are used to clang being reasonably close to what NVCC does by default. What that is, exactly, is not always clear. The current state of affairs has been working well enough. Changing how FP gets compiled will likely trigger a noticeable number of test failures due to both numerical differences and performance regressions. We have somewhat decent coverage for the former in tensorflow. Performance regressions would be harder to spot.
> 
> I can test the patch on our tensorflow tests and see how it fares.
> 
> If there are nontrivial failures, we will need to consider how to phase in the changes w/o causing unnecessary trouble for the users and/or give them an escape hatch option to keep things working until they can fix their code or tests.
> 
> >  puzzled by the difference in behavior that was introduced between HIP and CUDA
> 
> The details on HIP's need for fp-honor-pragma are in https://github.com/llvm/llvm-project/commit/cb08558caa3bad69213b08e6361586491232c745
> 
> For CUDA things were still working well enough with -ffp-contract=fast, so there was no need to change things.
> 
What I'd like to understand is whether CUDA requires ignoring the pragma when fp-contract=fast is set or if it just needs to use fp-contract=fast by default and doesn't mind that the pragma is ignored. I understand why HIP would want to honor the pragma, and I'd like that to be the normal behavior of fp-contract=fast for all targets.

I see that CUDA does respect "#pragma clang fp contract(off)" as a way to disable contraction if the global setting is "fp-contract=on" (https://godbolt.org/z/4d7En36En), so I don't understand why we wouldn't want the pragma to also work with "fp-contract=fast".
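
To make the pragma question concrete, here is a minimal sketch in plain host C++ (not the code behind the godbolt link above). The function names and input values are hypothetical, chosen only for illustration. The first function asks for contraction to be disabled in its body. The second calls std::fma so the single-rounding result is available for comparison regardless of compiler flags.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: requests that a * b + c NOT be contracted into an FMA
// inside this function body. Compilers that do not recognize the clang pragma
// ignore it, so on those compilers contraction follows the global setting.
double mul_add_no_contract(double a, double b, double c) {
#pragma clang fp contract(off)
  return a * b + c;
}

// Hypothetical helper: explicit fused multiply-add, rounded once by definition.
double mul_add_fused(double a, double b, double c) {
  return std::fma(a, b, c);
}

// Inputs where fused and unfused evaluation can differ:
// a = 1 + 2^-30, so a * a = 1 + 2^-29 + 2^-60 exactly. The 2^-60 term does not
// fit in a double next to 1.0, so an unfused multiply rounds it away, while
// the fused form keeps it until the final rounding.
void self_check() {
  const double a = 1.0 + std::ldexp(1.0, -30);
  const double c = -(1.0 + std::ldexp(1.0, -29));

  // The fused result is exactly the low-order term of the product.
  assert(mul_add_fused(a, a, c) == std::ldexp(1.0, -60));

  // With exactly representable intermediates both forms agree.
  assert(mul_add_no_contract(2.0, 3.0, 4.0) == 10.0);
  assert(mul_add_fused(2.0, 3.0, 4.0) == 10.0);
}
```

If the pragma is honored, I would expect mul_add_no_contract(a, a, c) to return 0.0 for the inputs in self_check(), since the product is rounded before the add. Whether that holds under "fp-contract=fast" for a given target is the behavior under discussion here, so the sketch does not assert it.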

Also, Zahira Ammarguellat is working on a patch to align the clang behavior and documentation (https://reviews.llvm.org/D107994). We're trying not to break the CUDA behavior in the process. Could you take a look at that patch and provide feedback? Thanks!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112760/new/

https://reviews.llvm.org/D112760