[PATCH] D29958: AMDGPU : Replace FMAD with FMA when denormals are enabled.

Wed Feb 15 11:43:26 PST 2017

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:1296-1298
+  SDValue fr = Subtarget->hasFP32Denormals() ? 
+               DAG.getNode(AMDGPUISD::FMAD_FTZ, DL, FltVT, fqneg, fb, fa) : 
+               DAG.getNode(ISD::FMAD, DL, FltVT, fqneg, fb, fa);	
----------------
You only need to select between the opcode. You don't need 2 getNode calls

================
Comment at: lib/Target/AMDGPU/AMDGPUInstrInfo.td:198
+def AMDGPUfmad_ftz : SDNode<"AMDGPUISD::FMAD_FTZ", SDTFPTernaryOp>;
+
+
----------------
Extra newline

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:365
     setOperationAction(ISD::FMINNUM, MVT::f16, Legal);
-    setOperationAction(ISD::FDIV, MVT::f16, Custom);

----------------
Not sure why this appears in the diff

================
Comment at: lib/Target/AMDGPU/SIInstructions.td:500-505
+multiclass FMACPat <ValueType vt, Instruction inst> {
+  def : Pat <
+    (vt (AMDGPUfmad_ftz vt:$src0, vt:$src1, vt:$src2)),
+    (inst $src0, $src1, $src2)
+  >;
+}
----------------
You should be using the exact fmad pattern above. You should refactor the class to take the input node

================
Comment at: test/CodeGen/AMDGPU/udiv.ll:4-5
 ; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -mattr=+fp32-denormals < %s | FileCheck -check-prefix=GCN -check-prefix=DENORM %s
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -mattr=-fp32-denormals < %s | FileCheck -check-prefix=GCN -check-prefix=NODENORM %s

----------------
You should only need one new run line, since the default is already tested. You should change the existing VI runline to explicitly set off. The r600 run line should also be last

================
Comment at: test/CodeGen/AMDGPU/udiv.ll:192-194
+}
+
+attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
----------------
You don't need both functions since you are using the global -mattr to do this. These also should have the same result with and without denormals

================
Comment at: test/CodeGen/AMDGPU/udiv.ll:194
+
+attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
----------------
You should minimize this list

Repository:
  rL LLVM

https://reviews.llvm.org/D29958