[llvm] [InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL (PR #87474)

Mon Jan 13 15:50:55 PST 2025

================
@@ -0,0 +1,631 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes='instcombine<no-verify-fixpoint>' < %s | FileCheck %s
+
+@x = global double 0.000000e+00
+@r1 = global double 0.000000e+00
+@r2 = global double 0.000000e+00
+@r3 = global double 0.000000e+00
+@v = global [2 x double] zeroinitializer
+@v1 = global [2 x double] zeroinitializer
+@v2 = global [2 x double] zeroinitializer
+
+; div/mul/div1 in the same block.
+define void @bb_constraint_case1(double %a) {
+; CHECK-LABEL: define void @bb_constraint_case1(
+; CHECK-SAME: double [[A:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[SQRT1:%.*]] = call reassoc double @llvm.sqrt.f64(double [[A]])
+; CHECK-NEXT:    [[TMP0:%.*]] = fdiv reassoc double 1.000000e+00, [[A]]
+; CHECK-NEXT:    [[DIV:%.*]] = fmul reassoc ninf arcp double [[TMP0]], [[SQRT1]]
----------------
andykaylor wrote:

I see what you are saying, and I was probably oversimplifying this in my mind, but I think it works out as I suggested. Because the incoming values of x, r1, and r2 are being mutually transformed to create the outgoing x and r1 values, the rewrite fast-math flags must be intersected, but the value flags require further analysis. You are checking for the ninf flag on the incoming x value (1/sqrt(a)). I think we can conclude that since 1/sqrt(a) has neither infinite inputs nor an infinite result, 1/a doesn't either, so the ninf flag can be applied to the outgoing r1 value. The nnan flag can be transferred by similar logic. You've also checked for the nnan and ninf flags on the incoming sqrt(a) instruction, so r2 after the transformation must meet the conditions for nnan and ninf as well.

Since outgoing r1 and r2 both meet the conditions for nnan and ninf, the outgoing x value can also have these flags applied. So the nnan and ninf flags can be set using the unionValue() function as I suggested. I don't think the reasoning is quite as clear for nsz, since it means that the sign of a zero can be ignored, not that -0.0 will never be seen. I don't think it matters for this transformation, though: the existing checks force a to be positive, so any zero that appears would be positive zero.
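
To make the flag bookkeeping concrete, here is a minimal standalone sketch of the combination described above: rewrite flags are intersected and value flags are unioned. It is a toy bitmask model, not LLVM's FastMathFlags class. The `fmf` namespace, the bit assignments, the `combine` helper, and the example flag sets `DivFlags` and `SqrtFlags` are all assumptions made for illustration. The union step is only justified by the per-value reasoning above, not in general.

```cpp
#include <cstdint>

// Toy model of fast-math flags as a bitmask. The names mirror LLVM's flag
// names, but the bit layout and helpers here are illustrative assumptions.
namespace fmf {
constexpr std::uint32_t Reassoc         = 1u << 0;
constexpr std::uint32_t NoNaNs          = 1u << 1;
constexpr std::uint32_t NoInfs          = 1u << 2;
constexpr std::uint32_t NoSignedZeros   = 1u << 3;
constexpr std::uint32_t AllowReciprocal = 1u << 4;
constexpr std::uint32_t AllowContract   = 1u << 5;
constexpr std::uint32_t ApproxFunc      = 1u << 6;

// Flags that license a rewrite versus flags that describe the values.
constexpr std::uint32_t RewriteMask =
    Reassoc | AllowReciprocal | AllowContract | ApproxFunc;
constexpr std::uint32_t ValueMask = NoNaNs | NoInfs | NoSignedZeros;

// A rewrite flag survives only if both inputs carry it.
constexpr std::uint32_t intersectRewrite(std::uint32_t lhs, std::uint32_t rhs) {
  return RewriteMask & lhs & rhs;
}

// A value flag survives if either input carries it.
constexpr std::uint32_t unionValue(std::uint32_t lhs, std::uint32_t rhs) {
  return ValueMask & (lhs | rhs);
}

// Hypothetical helper: flags for an instruction built from both inputs.
constexpr std::uint32_t combine(std::uint32_t lhs, std::uint32_t rhs) {
  return intersectRewrite(lhs, rhs) | unionValue(lhs, rhs);
}
} // namespace fmf

// Hypothetical incoming flags, chosen only for this example:
// the 1/sqrt(a) division and the sqrt(a) call.
constexpr std::uint32_t DivFlags =
    fmf::Reassoc | fmf::AllowReciprocal | fmf::NoInfs;
constexpr std::uint32_t SqrtFlags = fmf::Reassoc | fmf::NoNaNs | fmf::NoInfs;

constexpr std::uint32_t Combined = fmf::combine(DivFlags, SqrtFlags);

// arcp is dropped (only one input has it); nnan is kept (union).
static_assert(Combined == (fmf::Reassoc | fmf::NoNaNs | fmf::NoInfs),
              "rewrite flags intersect, value flags union");
```

In this model nsz is treated like the other value flags, so it would also be unioned. Whether that is appropriate for this transformation is the separate question raised above.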

https://github.com/llvm/llvm-project/pull/87474