<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/105654>105654</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] FMAX regression
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AArch64
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
sjoerdmeijer
</td>
</tr>
</table>
<pre>
We have picked up a regression for this TSVC kernel:
```
float s3111(float *a)
{
  float sum = 0.;
  for (int i = 0; i < 32000; i++) {
    if (a[i] > (float)0.) {
      sum += a[i];
    }
  }
  return sum;
}
```
For this loop, clang 18 generated FMAXNM instructions:
```
.LBB0_1:
ldp q3, q4, [x8, #-16]
add x8, x8, #32
subs x9, x9, #8
fmaxnm v4.4s, v4.4s, v1.4s
fmaxnm v3.4s, v3.4s, v1.4s
fadd v2.4s, v4.4s, v2.4s
fadd v0.4s, v3.4s, v0.4s
b.ne .LBB0_1
```
Trunk no longer generates this; the loop is now expanded to FCMGT + BIT:
```
.LBB0_1:
ldp q2, q3, [x8, #-16]
add x8, x8, #32
subs x9, x9, #8
fcmgt v5.4s, v3.4s, #0.0
fadd v3.4s, v3.4s, v1.4s
fcmgt v4.4s, v2.4s, #0.0
fadd v2.4s, v2.4s, v0.4s
bit v1.16b, v3.16b, v5.16b
bit v0.16b, v2.16b, v4.16b
b.ne .LBB0_1
```
Here's a link to Compiler Explorer: https://godbolt.org/z/TEae3T9h7
It shows that the IR is now different. The problem, I think, is that we used to generate an easy-to-recognise FMAX IR sequence of FCMP -> select:
```
%10 = fcmp fast ole <4 x float> %8, <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>
%11 = fcmp fast ole <4 x float> %9, <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>
%12 = select <4 x i1> %10, <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, <4 x float> %8
%13 = select <4 x i1> %11, <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, <4 x float> %9
%14 = fadd fast <4 x float> %12, %4
%15 = fadd fast <4 x float> %13, %5
```
With trunk, this is no longer so easy, as we would need to look through a phi node:
```
%4 = phi <4 x float> [ zeroinitializer, %1 ], [ %14, %2 ]
..
%10 = fcmp fast ogt <4 x float> %8, zeroinitializer
%12 = fadd fast <4 x float> %8, %4
%14 = select <4 x i1> %10, <4 x float> %12, <4 x float> %4
..
```
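To make the difference between the two IR shapes easier to see, here is a rough scalar C sketch of each. The function names `sum_clamp_then_add` and `sum_add_then_select` are hypothetical and only for illustration; this is my reading of the IR above, not something the compiler emits. The first mirrors the older sequence (fcmp ole + select against -0.0, then fadd), the second mirrors trunk (fadd, then a select between the new sum and the phi value):
```c
#include <stddef.h>

/* Older IR shape: clamp each element first, then add unconditionally.
   -0.0f works as the clamp value because x + (-0.0f) == x for every x. */
float sum_clamp_then_add(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* fcmp ole + select: this pair is the max-like pattern */
        float clamped = (a[i] <= -0.0f) ? -0.0f : a[i];
        sum += clamped; /* fadd */
    }
    return sum;
}

/* Trunk IR shape: add first, then select between the new and the old sum. */
float sum_add_then_select(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float added = sum + a[i];          /* fadd using the phi value */
        sum = (a[i] > 0.0f) ? added : sum; /* fcmp ogt + select */
    }
    return sum;
}
```
In the first form the compare and select only involve the loaded element, so they can be matched as a max on their own. In the second form the select chooses between two versions of the accumulator, so the max is only visible after looking through the reduction. Note the two forms only agree when no element is NaN (the first would propagate a NaN into the sum, the second would skip it), which I assume is covered by the `fast` flags in the IR above.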
This change in IR was introduced by commit 09eb9f1136c8572c4c3f2ec46be50899c32fc881:
```
[InstCombine] Fix for folding `select` into floating point binary operators. (#83200)
Folding a `select` into a floating point binary operators can only be
done if the result is preserved for both case. In particular, if the
other operand of the `select` can be a NaN, then the transformation
won't preserve the result value.
```
</pre>