[PATCH] D84056: [FPEnv] Don't transform FSUB(-0, X) -> FNEG(X) in SelectionDAGBuilder.

Thu Jul 23 07:58:15 PDT 2020

cameron.mcinally added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9274-9275
+  case ISD::FSUB: {
+    // FSUB(-0.0,X) can be lowered or combined as a bit operation.
+    // Need to check its input recursively to handle.
+    ConstantFPSDNode *N0C = isConstOrConstSplatFP(Op.getOperand(0), true);
----------------
arsenm wrote:
> cameron.mcinally wrote:
> > cameron.mcinally wrote:
> > > arsenm wrote:
> > > > cameron.mcinally wrote:
> > > > > cameron.mcinally wrote:
> > > > > > arsenm wrote:
> > > > > > > I thought the point was this cannot be lowered this way?
> > > > > > This isn't as bad as it seems, I think. This is `isCanonicalized(...)`, so we're just saying the the FSUB(-0,X) *might* not flush denormals (if DAGCombine wants to change it to an FNEG(X) later). 
> > > > > > 
> > > > > > We could check the DenormalMode here though. That would let us know what will happen to the FSUB(-0,X) in DAGCombine. Assuming it's not a can of worms, I could do that. If it is a can-of-worms, might be better left for separate patch. It should be functional with this patch, just not optimal.
> > > > > > 
> > > > > > 
> > > > > > We could check the DenormalMode here though. 
> > > > > 
> > > > > Guarding this code based on denormal mode is easy enough, but it affects the assembly in a number of tests where denormals are enabled. That will make this patch harder to review. It might be better to optimize in a stand-alone Diff.
> > > > > 
> > > > > Thoughts on this?
> > > > I think checking for denormals isn't quite correct; canonicalized doesn't just mean denormals are flushed. It also means at minimum signaling nans are quieted
> > > That's true, but there is more nuance here...
> > > 
> > > For the general FSUB case, the result will be canonicalized. No problem there.
> > > 
> > > For the FSUB(-0,X) case, the result may or may not be canonicalized, depending on if the FSUB is replaced with a FNEG or not. And DAGCombine will only do that FNEG transform if in a particular denormal mode.
> > > 
> > > So, as long as both this check and the decision to transform FSUB(-0,X)->FNEG(X) are in lockstep, it's not really a problem. But, clearly, this is somewhat brittle and error prone.
> > > 
> > > Being a little pessimistic in order to more forward isn't the worst thing, assuming we don't regress too much. Figuring out a good solution to this subproblem will be easier (for me at least) if we can isolate it from the larger project.
> > This is the test that's tripping:
> > 
> > ```
> > ; GCN-LABEL: {{^}}v_test_canonicalize_fneg_fabs_var_f32:
> > ; GFX678: v_mul_f32_e64 [[REG:v[0-9]+]], -1.0, |{{v[0-9]+}}|
> > ; GFX9: v_max_f32_e64 [[REG:v[0-9]+]], -|{{v[0-9]+}}|, -|{{v[0-9]+}}|
> > ; GCN: {{flat|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[REG]]
> > define amdgpu_kernel void @v_test_canonicalize_fneg_fabs_var_f32(float addrspace(1)* %out) #1 {
> >   %val = load float, float addrspace(1)* %out
> >   %val.fabs = call float @llvm.fabs.f32(float %val)
> >   %val.fabs.fneg = fsub float -0.0, %val.fabs
> >   %canonicalized = call float @llvm.canonicalize.f32(float %val.fabs.fneg)
> >   store float %canonicalized, float addrspace(1)* %out
> >   ret void
> > }
> > 
> > attributes #1 = { nounwind "denormal-fp-math-f32"="preserve-sign,preserve-sign" }
> > ```
> > 
> > We need a way for `isCanonicalized(...)` to return that an FSUB(-0,X) might not canonicalize if it is combined into an FNEG. Otherwise, the `@llvm.canonicalize.f32` will be removed and we end up with:
> > 
> > ```
> >         flat_load_dword v2, v[0:1]
> >         s_waitcnt vmcnt(0) lgkmcnt(0)
> >         v_or_b32_e32 v2, 0x80000000, v2
> >         flat_store_dword v[0:1], v2
> > ```
> > 
> > Maybe we need a `isFSUBtoFNEGLegal(...)` helper function? That way the check is uniform across all uses?
> Basically none of these tests you're running into are intended to use fsub. They're all fnegs that weren't ported since that's what the source modifiers are intended to directly match. Trying to do anything smarter with folding fsub's is an optimization beyond what this intended to check. I think we should just start from the position that it is not legal to lower fsub -0, x to fneg and work from there
> Basically none of these tests you're running into are intended to use fsub. They're all fnegs that weren't ported since that's what the source modifiers are intended to directly match. Trying to do anything smarter with folding fsub's is an optimization beyond what this intended to check.

Ok, that's fair. To reiterate, the tests should be updated to use FNEG. I'll prepare that patch for you to check out. I apologize for putting that burden on you, but I have very little intuition about the AMDGPU backend, so I want to change as little as possible.

> I think we should just start from the position that it is not legal to lower fsub -0, x to fneg and work from there

I think this is the right thing to do, assuming we don't impact out-of-tree targets that might still generate the old FSUB(-0,X) idiom. 

Thinking some more, I'm fairly sure there are bugs in opt that mistakenly generate the old FSUB(-0,X) pattern. So those will need to be sorted out too. And I'm not sure we can get away with bulk updating all the llc tests for other targets to use FNEG over FSUB(-0,X).

Threading a needle....

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84056/new/

https://reviews.llvm.org/D84056