<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/54652>54652</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] Avoid dependent FSQRT and FDIV where possible
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            good first issue,
            backend:AArch64,
            beginner
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ilinpv
      </td>
    </tr>
</table>

<pre>
With -freciprocal-math (and -funsafe-math-optimizations), the compiler can try harder to avoid dependent FSQRT and FDIV operations. For example:
```
double res, res2, tmp;
void
foo (double a, double b, int c, int d)
{
  tmp = 1.0 / __builtin_sqrt (a);
  res = tmp * tmp;

  if (d)
    res2 = a * tmp;
}
```
With -Ofast, LLVM generates the following for AArch64:
```
foo(double, double, int, int):                             // @foo(double, double, int, int)
        fsqrt   d1, d0
        fmov    d2, #1.00000000
        adrp    x8, tmp
        fdiv    d2, d2, d1
        str     d2, [x8, :lo12:tmp]
        fmul    d2, d2, d2
        adrp    x8, res
        str     d2, [x8, :lo12:res]
        cbz     w1, .LBB0_2
        fdiv    d0, d0, d1
        adrp    x8, res2
        str     d0, [x8, :lo12:res2]
.LBB0_2:
        ret
```
GCC at -Ofast generates:
```
foo(double, double, int, int):
        fmov    d1, 1.0e+0
        adrp    x0, .LANCHOR0
        fsqrt   d2, d0
        add     x2, x0, :lo12:.LANCHOR0
        fdiv    d0, d1, d0
        fmul    d1, d2, d0
        str     d0, [x2, 8]
        str     d1, [x0, #:lo12:.LANCHOR0]
        cbz     w1, .L1
        str     d2, [x2, 16]
.L1:
        ret
```
https://godbolt.org/z/jb8a14K16
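Notice that in GCC's output the expensive FSQRT (sqrt(a)) and FDIV (1.0 / a) depend only on `a`, so they are independent and can execute in parallel. In LLVM's output, both FDIV instructions must wait for the FSQRT result in d1.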
A write-up of the transformation can be found in the GCC commit:
http://gcc.gnu.org/g:24c49431499bcb462aeee41e027a3dac25e934b3
</pre>