<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/122081>122081</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] `QNaN` check after `fsqrt` instruction is slow
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
kasuga-fj
</td>
</tr>
</table>
<pre>
It looks like Clang is about 100% behind GCC (roughly twice the runtime) on Neoverse V2 for the following function, called with `n = 10000`.
Compilation options: `-O3 -mcpu=neoverse-v2`
```
#include <math.h>
void f(int n, double *arr, double m) {
    for (int i = 0; i < n; i++) {
        arr[i] = sqrt(arr[i] * m);
    }
}
```
godbolt: https://godbolt.org/z/57Yqj15KP
I tried to analyze the root cause and found that the `fcmp` instruction after `fsqrt` takes a lot of time. This `fcmp` checks whether the result of `fsqrt` is a `QNaN` and, if it is, branches to a call to the library `sqrt` function. The slowdown occurs even when all elements of `arr` are positive, i.e., when the branch to the library call is never taken. Disabling this check with options such as `-fno-honor-nans` closed the performance gap between GCC and Clang.

I think we should insert the comparison before the `fsqrt` instruction, as GCC does, so that it tests the input instead of waiting for the `fsqrt` result.
</pre>