<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/55202>55202</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Missed Optimization - Replacement of rint/lrint with X87/SSE specific instructions
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Hendiadyoin1
</td>
</tr>
</table>
<pre>
X87 and SSE provide simple rounding and round-and-convert store instructions, which are essentially equivalent to `l{0,2}rint[fl]?`.
Clang/LLVM does not seem to replace calls to `rint` with these, nor does it vectorise them in all cases when they are used to round/convert vectors.
(Truncation is replaced properly.)
Some examples follow below.
GCC is listed as well.
The main difference is that GCC schedules its `fldcw` for truncation earlier, replaces `rintl`, and uses some bit magic for `rintf`.
Note: Using `f32x4` for `float __vector(4)` and `i32x4` for `int __vector(4)`
Note: `cvtss2si` != `cvttss2si`
Note: Assuming overflows etc. are UB, so the hardware's behaviour is acceptable
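
To illustrate the `cvtss2si` vs `cvttss2si` note, here is a minimal sketch using the SSE intrinsics `_mm_cvtss_si32` and `_mm_cvttss_si32` from `xmmintrin.h`. The helper names `convert_rounding` and `convert_truncating` are made up for illustration, and the default round-to-nearest-even MXCSR mode on an x86 target is assumed.

```c
#include "xmmintrin.h" /* SSE intrinsics; x86 targets only */

/* Hypothetical helper: converts using the current MXCSR rounding mode,
   which is what `cvtss2si` does (round-to-nearest-even by default). */
int convert_rounding(float x) {
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Hypothetical helper: converts with truncation toward zero,
   which is what `cvttss2si` does regardless of the rounding mode. */
int convert_truncating(float x) {
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

With the default rounding mode, `convert_rounding(2.7f)` yields 3 while `convert_truncating(2.7f)` yields 2, which is why the truncating form alone cannot stand in for `lrintf`.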
Scenario |LLVM |GCC |Effective instruction(s)
---------------------------|------------------------------- |------------------------------- |-------------------
`rintl` | `call rintl@PLT` | `frndint` | `frndint`
`(int)rintl` | `call rintl@PLT` +truncation| `call rintl@PLT` +truncation| `fistp m16/m32/m64`
`lrintl` | `call lrintl` | `call lrintl` | `fistp m16/m32/m64`
|||
`lrint` | `call lrintl` | `call lrintl` | `cvtss2si r32/r64, xmmX`
`(int)rintf` | `call rintf@PLT;cvttss2si` | Bit magic+`cvttss2si` | `cvtss2si r32, xmmX`
`(int)rintf (SSE4.2)` | `roundss + cvttss2si` | `roundss + cvttss2si` | `cvtss2si r32, xmmX`
`4x lrintf (f32x4->i32x4)` | 4x (shuffle+`call lrintl`) | 4x (shuffle+`call lrintl`) | `cvtps2dq xmmY, xmmX`
Tested using godbolt with `x86_64 Clang 14.0.0` and `x86_64 GCC 11.2` at O2 and O3.
</pre>