[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 9 14:09:54 PST 2020


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:568-569
+    // is enabled.
+    if (!HasCorrectlyRoundedDivideSqrt)
+      return false;
+    SafeFast = false;
----------------
The attribute should not be considered at all. Only the fpmath metadata matters. If -cl-fp32-correctly-rounded-divide-sqrt is specified, the regular fdiv instruction should behave correctly.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:571-575
+  } else {
+    float ULP = FPOp->getFPAccuracy();
+    if (ULP < 2.5f)
+      return false;
+  }
----------------
An intrinsic should only be introduced when the fdiv differs from the default FP environment. Here you are doing the opposite, and not even considering the denormal mode. You should be inhibiting the insertion of the fdiv.fast if denormals are enabled, not introducing a new intrinsic. You can also consider the afn fast flag and use that to ignore the denormal mode


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:628-631
+static bool hasCorrectlyRoundedDivideSqrt(const Function &F) {
+  Attribute Attr = F.getFnAttribute("correctly-rounded-divide-sqrt-fp-math");
+  return (Attr.getValueAsString() == "true");
+}
----------------
There's no need to check the attribute


================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll:245
 attributes #2 = { nounwind "target-features"="+fp32-denormals" }
+attributes #3 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="true" }
 
----------------
The attribute should be removed


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71293/new/

https://reviews.llvm.org/D71293





More information about the llvm-commits mailing list