[PATCH] D137154: Adding nvvm_reflect clang builtin

Tue Nov 1 10:27:59 PDT 2022

tra added a subscriber: yaxunl.
tra added a comment.

I don't think it's a good idea. `__nvvm_reflect` is a hack that we should not propagate. The only code that currently relies on it is NVIDIA's libdevice bitcode and I'd prefer to keep it that way.

Can you elaborate on what motivates this change?

We already have a way to do conditional compilation based on the GPU architecture.
If you need the code to know whether FTZ mode is enabled or not, that should be exposed on its own. I'm not convinced that it would be a great idea, either. LLVM has a lot of ways to control FP code generation (that I'm mostly unqualified to comment on), but those options are fine-grained. `__nvvm_reflect` would only affect things module-wide and would not always tell you whether FTZ instruction variants would be used. E.g. llvm/test/CodeGen/NVPTX/math-intrins.ll shows that `ftz` can be controlled per function via the `"denormal-fp-math-f32" = "preserve-sign"` attribute.
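
For illustration, architecture-based conditional compilation might look roughly like the sketch below. It assumes the mechanism meant here is the `__CUDA_ARCH__` macro, which is defined only during device-side compilation of CUDA code; the `arch_tier` helper and the `700` threshold are hypothetical names and values chosen for the example.

```cpp
// Sketch only: pick a code path at compile time from the target GPU
// architecture. __CUDA_ARCH__ is undefined in host compilation, so a plain
// C++ compiler takes the final branch.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
constexpr int kArchTier = 2;  // device code for sm_70 and newer (hypothetical cutoff)
#elif defined(__CUDA_ARCH__)
constexpr int kArchTier = 1;  // device code for older architectures
#else
constexpr int kArchTier = 0;  // host compilation
#endif

// Hypothetical helper that reports which branch was compiled in.
constexpr int arch_tier() { return kArchTier; }
```

Note that this selects on the architecture only. It says nothing about whether FTZ is in effect, which is the point made above.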

Another aspect of this is that the concept of `ftz` is not unique to NVPTX. I believe AMDGPU has `ftz` instruction variants, too. @yaxunl - FYI.

If you need to control which FP instruction variant we generate, you should probably use `#pragma clang fp ...` for that and *tell* the compiler what you need.
https://clang.llvm.org/docs/LanguageExtensions.html#extensions-to-specify-floating-point-flags
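
As a rough sketch of what "telling the compiler" looks like, the example below uses the `contract` option of `#pragma clang fp` to allow or forbid fusing a multiply and an add within a single function body. The function names are hypothetical. The pragma is shown only to illustrate the per-scope style of control; whether any of its options covers flush-to-zero behaviour is not claimed here.

```cpp
// Sketch only: per-function FP control with `#pragma clang fp`.
// The pragma must appear at file scope or at the start of a compound
// statement. Compilers other than clang will typically ignore it
// (possibly with an unknown-pragma warning).

// Hypothetical helper: the compiler may fuse a * b + c into a single FMA.
float mul_add_fused_ok(float a, float b, float c) {
#pragma clang fp contract(fast)
  return a * b + c;
}

// Hypothetical helper: fusing is disallowed, so the product is rounded
// before the addition.
float mul_add_unfused(float a, float b, float c) {
#pragma clang fp contract(off)
  return a * b + c;
}
```

For inputs whose product and sum are exactly representable, both functions return the same value; they can differ only in the last bits when rounding of the intermediate product matters.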

I do not think clang currently exposes fine-grained details about selected FP modes. We do have `__FAST_MATH__`, `__FINITE_MATH_ONLY__`, and `__FAST_RELAXED_MATH__`, but that seems to be about it.
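
To make the granularity concrete, here is a minimal sketch of how code can query two of these macros. The helper names are hypothetical. The macros reflect the command-line options for the whole translation unit, so they cannot see per-function attributes or pragmas.

```cpp
// Sketch only: coarse, translation-unit-wide FP mode queries.

// True when the translation unit was built with fast-math enabled.
constexpr bool fast_math_enabled() {
#ifdef __FAST_MATH__
  return true;
#else
  return false;
#endif
}

// True when the compiler was told to assume no NaNs or infinities.
// Some compilers do not define the macro at all, hence the defined() check.
constexpr bool finite_math_only() {
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
  return true;
#else
  return false;
#endif
}
```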

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137154/new/

https://reviews.llvm.org/D137154