[PATCH] D69878: Consoldiate internal denormal flushing controls

Fri Jan 10 11:09:14 PST 2020

andrew.w.kaylor added inline comments.

================
Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+
----------------
cameron.mcinally wrote:
> andrew.w.kaylor wrote:
> > arsenm wrote:
> > > andrew.w.kaylor wrote:
> > > > arsenm wrote:
> > > > > andrew.w.kaylor wrote:
> > > > > > Can you document which targets do support the option? What happens if I try to use the option on a target where it is not supported?
> > > > > I'm not sure where to document this, or if/how/where to diagnose it. I don't think the high level LangRef description is the right place to discuss specific target handling.
> > > > > 
> > > > > Currently it won't error or anything. Code checking the denorm mode will see the f32 specific mode, even if the target in the end isn't really going to respect this.
> > > > > 
> > > > > One problem is this potentially does require coordination with other toolchain components. For AMDGPU, the compiler can directly tell the driver what FP mode to set on each entry point, but for x86 it requires linking in crtfastmath to set the default mode bits. If another target had a similar runtime environment requirement, I don't think we can be sure the attribute is correct or not.
> > > > There is precedent for describing target-specific behavior in LangRef. It just doesn't seem useful to say that not all targets support the attribute without saying which ones do. We should also say what is expected if a target doesn't support the attribute. It seems reasonable for the function attribute to be silently ignored.
> > > > 
> > > > > One problem is this potentially does require coordination with other toolchain components. For AMDGPU, the compiler can directly tell the driver what FP mode to set on each entry point, but for x86 it requires linking in crtfastmath to set the default mode bits.
> > > > 
> > > > This is a point I'm interested in. I don't like the current crtfastmath.o handling. It feels almost accidental when FTZ works as expected. My understanding is we link crtfastmath.o if we find it but if not everything just goes about its business. The Intel compiler injects code into main() to explicitly set the FTZ/DAZ control modes. That obviously has problems too, but it's at least consistent and predictable. As I understand it, crtfastmath.o sets these modes from a static initializer, but I'm not sure anything is done to determine the order of that initializer relative to others.
> > > > 
> > > > How does the compiler identify entry points for AMDGPU? And does it emit code to set FTZ based on the function attribute here?
> > > The entry points are a specific calling convention. There's no real concept of main. Each kernel has an associated blob of metadata the driver uses to set up various config registers on dispatch.
> > > 
> > > I don't think specially recognizing main in the compiler is fundamentally different than having it done in a static constructor. It's still a construct not associated with any particular function or anything.
> > The problem with having it done in a static constructor is that you have no certainty of when it will be done relative to other static constructors. If it's in main you can at least say that it's after all the static constructors (assuming main is your entry point).
> Yes and no. The linker should honor static constructor priorities. But, yeah, there's no guarantee that this constructor will run before other priority 101 constructors.
> 
> The performance penalty for setting denormal flushing in main could be significant (think C++). Also, there's precedent for using static constructors, like GCC's crtfastmath.o.
Fair enough. I don't necessarily like how icc handles this. I don't have a problem with how gcc handles it. I just really don't like how LLVM does it. If we want to take the static constructor approach we should define our own, not depend on whether or not the GNU object file happens to be around.

Static initialization doesn't help for AMDGPU, and I suppose that's likely to be the case for any offload execution model. Since this patch is moving us toward a more consistent implementation I'm wondering if we can define some general rules for how this is supposed to work. Like when the function attribute will result in injected instructions setting the control flags and when it won't.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878