[llvm] [InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL (PR #87474)

Sun Jan 12 22:37:26 PST 2025

================
@@ -0,0 +1,631 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes='instcombine<no-verify-fixpoint>' < %s | FileCheck %s
+
+@x = global double 0.000000e+00
+@r1 = global double 0.000000e+00
+@r2 = global double 0.000000e+00
+@r3 = global double 0.000000e+00
+@v = global [2 x double] zeroinitializer
+@v1 = global [2 x double] zeroinitializer
+@v2 = global [2 x double] zeroinitializer
+
+; div/mul/div1 in the same block.
+define void @bb_constraint_case1(double %a) {
+; CHECK-LABEL: define void @bb_constraint_case1(
+; CHECK-SAME: double [[A:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[SQRT1:%.*]] = call reassoc double @llvm.sqrt.f64(double [[A]])
+; CHECK-NEXT:    [[TMP0:%.*]] = fdiv reassoc double 1.000000e+00, [[A]]
+; CHECK-NEXT:    [[DIV:%.*]] = fmul reassoc ninf arcp double [[TMP0]], [[SQRT1]]
----------------
andykaylor wrote:

I hate to drag this out further, but this is not quite right. Because this transformation depends on rearranging multiple instructions, the fast-math flags kept should be the intersection of the fast-math flags present on the incoming instructions. You can make a case that r2 is just a simplification of the original r2 instruction, but x and r1 in the output are both formed by combining the original x and r1 instructions, so the arcp flag here should be dropped. The nnan and ninf flags are slightly different because they can be inferred from their presence on other instructions, so you can take the union of those. Helper functions were recently introduced that let you do it like this:

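```cpp
// Rewrite flags (e.g. reassoc, arcp) survive only if both inputs carry them;
// value flags (e.g. nnan, ninf) survive if either input carries them.
FastMathFlags NewFMF = FastMathFlags::intersectRewrite(Op1FMF, Op2FMF) |
                       FastMathFlags::unionValue(Op1FMF, Op2FMF);
```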
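To see why arcp is dropped while ninf can be kept, here is a minimal, self-contained sketch that models the two helpers on a plain bitmask. It does not use LLVM: the `FMFBit` enum, its bit values, and the `combinedFlags` helper are hypothetical, and the split into rewrite flags (reassoc, arcp, contract, afn) and value flags (nnan, ninf, nsz) is an assumption about how the real `FastMathFlags` helpers partition the flags.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::FastMathFlags: one bit per flag.
// The bit positions are illustrative and need not match LLVM's encoding.
enum FMFBit : unsigned {
  Reassoc  = 1u << 0, // reassoc
  NoNaNs   = 1u << 1, // nnan
  NoInfs   = 1u << 2, // ninf
  NoSZeros = 1u << 3, // nsz
  Arcp     = 1u << 4, // arcp
  Contract = 1u << 5, // contract
  Afn      = 1u << 6, // afn
};

// Assumed grouping: flags that license rewriting the computation...
constexpr unsigned RewriteMask = Reassoc | Arcp | Contract | Afn;
// ...and flags that state facts about the values involved.
constexpr unsigned ValueMask = NoNaNs | NoInfs | NoSZeros;

// Keep a rewrite flag only if both inputs carry it.
constexpr unsigned intersectRewrite(unsigned A, unsigned B) {
  return RewriteMask & A & B;
}

// Keep a value flag if either input carries it.
constexpr unsigned unionValue(unsigned A, unsigned B) {
  return ValueMask & (A | B);
}

// Flags for a new instruction formed by combining two original ones.
constexpr unsigned combinedFlags(unsigned Op1, unsigned Op2) {
  return intersectRewrite(Op1, Op2) | unionValue(Op1, Op2);
}

// Hypothetical example: arcp is on only one input, so it is dropped;
// ninf is on only one input, but it survives through the union.
static_assert(combinedFlags(Reassoc | Arcp | NoInfs, Reassoc) ==
                  (Reassoc | NoInfs),
              "arcp dropped, ninf kept");
```

With these illustrative inputs, arcp appears on only one of the combined instructions and is removed by the intersection, while ninf appears on only one instruction and is kept by the union, matching the rule described above.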

https://github.com/llvm/llvm-project/pull/87474