[PATCH] D55897: Add constrained fptrunc and fpext intrinsics

Mon Jan 28 07:31:29 PST 2019

cwabbott added inline comments.

================
Comment at: docs/LangRef.rst:14529
+      @llvm.experimental.constrained.fptrunc(<type> <value>,
+                                          metadata <exception behavior>)
+
----------------
kpn wrote:
> cwabbott wrote:
> > Hi,
> > 
> > I've been working on implementing a Vulkan extension which allows the user to specifiy different rounding modes for the AMDGPU backend. I'm not sure how this works in C/C++, but we're required to support floating-point truncation with non-standard rounding modes. Is there a reason the rounding mode isn't an argument here?
> Going back and rereading D43515, I don't see an explicit reason given back then. And I can't find anything in the C99 or IEEE 754 standards, or in the LLVM documentation, that would mandate any particular rounding mode. So I'm open to adding a rounding mode argument to the constrained fptrunc.
> 
> Andrew? What do you think of making constrained fptrunc go back to taking a rounding mode argument? 
> 
> Having said that, the constrained FP intrinsics are to avoid optimizations that change program behavior taking traps into account. Is this the behavior you need for Vulcan?
Vulkan doesn't support trapping floating-point exceptions, so we don't have to worry about that. However, my understanding is that we still need to communicate the rounding mode to LLVM, to prevent it from constant folding floating-point operations with the wrong rounding mode, so we still need to use the intrinsics.

There's also the complication that we may need to emit some "internal" floating-point operations which need to have some defined rounding mode different from what the user specified. The backend has total control over the control register that specifies the rounding mode, and in some cases (e.g. fp32 -> fp16 truncation) the rounding mode is actually specified statically by the instruction itself rather than the control register, so my thought was that we could make AMDGPU just always use the rounding specified by the argument to the constrained intrinsic, emitting changes to the control register if necessary. I'm not sure if the CodeGen infrastructure is set up to do that.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55897/new/

https://reviews.llvm.org/D55897