[PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

Justin Lebar via cfe-commits cfe-commits at lists.llvm.org
Tue May 17 18:05:41 PDT 2016

jlebar added a comment.

> But people also don't expect IEEE compliance on GPUs

Is that true?  You have a lot more experience with this than I do, but my observation of nvidia's hardware is that it's moved to add *more* IEEE compliance as it's matured.  For example, older hardware didn't support denormals, but newer chips do.  Surely that's in response to some users.

One of our goals with CUDA in clang is to make device code as similar as possible to host code.  Throwing out IEEE compliance seems counter to that goal.

I also don't see the bright line here.  Like, if we can FMA to our heart's content, where do we draw the line wrt IEEE compliance?  Do we turn on flush-denormals-to-zero by default?  Do we use approximate transcendental functions instead of the more accurate ones?  Do we assume floating point arithmetic is associative?  What is the principle that leads us to do FMAs but not these other optimizations?

In addition, CUDA != GPUs.  Maybe this is something to turn on by default for NVPTX, although I'm still pretty uncomfortable with that.  Prior art in other compilers is interesting, but I think it's notable that clang doesn't do this for any other targets (afaict?) despite the fact that gcc does.

The main argument I see for this is "nvcc does it, and people will think clang is slow if we don't".  That's maybe not a bad argument, but it makes me sad.  :(

