[PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

Tue May 17 16:09:27 PDT 2016

hfinkel added a subscriber: hfinkel.
hfinkel added a comment.

In http://reviews.llvm.org/D20341#432461, @jlebar wrote:

> I am not sure we want this?  Although it matches nvcc, it does not match our floating-point behavior for C++ in general -- it makes us non-IEEE-whatever compliant by default.
>
> Although I agree that if we don't do this, lots of people are not going to pass -ffp-contract=fast and, as a result, will think that we're slower than nvcc.  There's no way to win.  :(

But people also don't expect IEEE compliance on GPUs, and the default for forming FMAs has long been system-specific. The default on IBM systems, for example, is generally the equivalent of -ffp-contract=fast (in both XLC and GCC).

That having been said, is this change the equivalent of -ffp-contract=fast or -ffp-contract=on? I think it is the latter and we want the former (i.e. where we let the backend be as aggressive as possible *after* inlining).
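
To make the numerical effect concrete, here is a small host-side C++ sketch. It is not taken from the patch; the function names and inputs are made up for illustration. As I understand Clang's semantics, -ffp-contract=on only fuses a multiply and an add written in the same expression, while -ffp-contract=fast also lets the backend fuse across statements, which is what matters once inlining exposes them. The sketch forces both outcomes by hand so the difference does not depend on compiler flags:

```cpp
#include <cmath>

// Unfused: the product is rounded to float before the add.
// The volatile store keeps the compiler from contracting the two
// operations into an FMA on its own.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused: a single rounding at the end, as an FMA instruction would do.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Hypothetical inputs chosen so the rounding of the product is visible:
// a * a = 1 + 2^-11 + 2^-24, which does not fit in a float.
struct Inputs { float a, b, c; };

Inputs example_inputs() {
    float a = 1.0f + std::ldexp(1.0f, -12);     // 1 + 2^-12
    float c = -(1.0f + std::ldexp(1.0f, -11));  // -(1 + 2^-11)
    return {a, a, c};
}
```

With these inputs, mul_then_add returns 0 because the 2^-24 term is lost when the product is rounded, while fused_mul_add returns 2^-24. Neither answer is "wrong", but code that compares against results from an unfused build will see the difference.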

http://reviews.llvm.org/D20341