<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/60816>60816</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] Avoid dependent FSQRT and FDIV where possible
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AArch64,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          SamTebbs33
      </td>
    </tr>
</table>

<pre>
    

With -freciprocal-math (and -funsafe-math-optimizations), the compiler can try harder to avoid dependent FSQRT and FDIV operations. For example:

```
double res, res2, tmp;
void foo (double a, double b, int c, int d) {
  tmp = 1.0 / __builtin_sqrt (a);
  res = tmp * tmp;

  if (d)
    res2 = a * tmp;
}
```

With -Ofast, LLVM for AArch64 generates:

```
foo(double, double, int, int): // @foo(double, double, int, int)
        fsqrt   d1, d0
        fmov    d2, #1.00000000
        adrp    x8, tmp
        fdiv    d2, d2, d1
        str     d2, [x8, :lo12:tmp]
        fmul    d2, d2, d2
        adrp    x8, res
        str     d2, [x8, :lo12:res]
        cbz     w1, .LBB0_2
        fdiv    d0, d0, d1
        adrp    x8, res2
        str     d0, [x8, :lo12:res2]
.LBB0_2:
        ret
```

GCC at -Ofast instead generates:

```
foo(double, double, int, int):
        fmov    d1, 1.0e+0
        adrp    x0, .LANCHOR0
        fsqrt   d2, d0
        add     x2, x0, :lo12:.LANCHOR0
        fdiv    d0, d1, d0
        fmul    d1, d2, d0
        str     d0, [x2, 8]
        str     d1, [x0, #:lo12:.LANCHOR0]
        cbz     w1, .L1
        str     d2, [x2, 16]
.L1:
        ret
```

Notice how the expensive FSQRT and FDIV are now independent and can execute in parallel.
A write-up of the transformation can be found in the GCC commit:
http://gcc.gnu.org/g:24c49431499bcb462aeee41e027a3dac25e934b3

</pre>