[PATCH] D71742: Added intrinsics for access to FP environment

Fri Jul 17 12:55:08 PDT 2020

arsenm added inline comments.

================
Comment at: llvm/include/llvm/IR/Intrinsics.td:649-651
+  def int_get_fpenv     : Intrinsic<[], [llvm_ptr_ty]>;
+  def int_set_fpenv     : Intrinsic<[], [llvm_ptr_ty]>;
+  def int_reset_fpenv   : Intrinsic<[], []>;
----------------
sepavloff wrote:
> arsenm wrote:
> > sepavloff wrote:
> > > arsenm wrote:
> > > > sepavloff wrote:
> > > > > arsenm wrote:
> > > > > > Using a pointer for this is problematic, and one with a hardcoded 0 address space doubly so
> > > > > Changed type to `llvm_anyptr_ty`, which must support arbitrary address spaces.
> > > > Accepting any address space is only a half-fix. Why not make this return llvm_anyint_ty, and define it as zext or truncated to the expected target size in the backend? This wouldn't require lowering to introduce stack usage for example for something that's usually read directly out of a register
> > > > Why not make this return llvm_anyint_ty, and define it as zext or truncated to the expected target size in the backend?
> > > 
> > > In this case X86 represents FP environment as `i256`. It is not clear how to legalize this scalar type. Passing FPEnv through memory allows to avoid issues in legalization. For targets that get/set environment by simple register moves the intermediate stack slot is eliminated (see D81843). Well, almost eliminated, only write to stack slot remains, but it is DCE deficiency.
> > > 
> > I'm not sure what you mean. You could legalize i256 by storing to the stack and reloading parts for all operations? This probably already works.
> I tried to implement a solution in which `get.fpenv` returns integer value. The solution seems to work, however it has its own drawbacks:
> 
> * In the case of x86 `get.fpenv` returns `i256`. This is illegal type, so we cannot set custom lowering through `setOperationAction`. `TLI.getOperationAction` always returns `Expand` as the type is extended. To cope with this difficulty, `DAGTypeLegalizer::ExpandIntegerResult` has to call `TLI.ReplaceNodeResults` directly.
> 
> * It is necessary to use special node class to represent `GET_FPENV`, otherwise `TLI.ReplaceNodeResults` has no access to the chain token argument.
> 
> * The code created for the source:
> 	```
> 	define void @func_01(i256* %fpenv) {
> 	entry:
> 	  %fpe = call i256 @llvm.get.fpenv.i256()
> 	  store i256 %fpe, i256* %fpenv
> 	  ret void
> 	}
> 	```
> uses stack variable, although it should use pointer argument directly. Optimization that would eliminate it is now absent.
> 
> This way is possible, but it requires more efforts to implement. There must be serious reasons why we prefer this way over using pointer argument.
> 
> Could you please explain the concern of using pointer in the intrinsic? What is wrong with the implementation used in D81843? Why it cannot be used by targets where FP environment is simply a content of a register?
> 
It could be, but it forces the lowering to introduce stack usage. Now an alloca has to stick around for the scratch pad until codegen, and SROA can't eliminate it. For AMDGPU for example, we change ABI lowering and other optimization strategies based on whether there is stack usage in the incoming IR. We don't really have other intrinsics that force you into this situation

I also noticed we have llvm.flt.rounds already, so I'm confused why this is different

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71742/new/

https://reviews.llvm.org/D71742