<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/137983>137983</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Missed combining shr and shrx in collatz_f1()
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
BreadTom
</td>
</tr>
</table>
<pre>
See [godbolt](https://godbolt.org/z/5Wh8sG958) and [GCC bug](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120038).
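```c
#include <stdint.h>
#include <stdbool.h>

/* Odd Collatz step without the halving: 3n + 1. */
uint64_t
collatz_onlyoddstep (uint64_t oddnum)
{
  return (3 * oddnum + 1);
}

/* Odd Collatz step with the halving folded in: (3n + 1) / 2. */
uint64_t
collatz_oddstep (uint64_t oddnum)
{
  return (3 * oddnum + 1) / 2;
}

/* Divide by 2 until the value is odd.
   __builtin_ctzg (num) is undefined for num == 0. */
uint64_t
collatz_div2tillodd (uint64_t num)
{
  num >>= __builtin_ctzg (num);
  return num;
}

/* 3n + 1, then strip all factors of 2. */
uint64_t
collatz_f0 (uint64_t oddnum)
{
  oddnum = collatz_onlyoddstep (oddnum);
  return collatz_div2tillodd (oddnum);
}

/* (3n + 1) / 2, then strip the remaining factors of 2.
   For odd oddnum this yields the same value as collatz_f0 (). */
uint64_t
collatz_f1 (uint64_t oddnum)
{
  oddnum = collatz_oddstep (oddnum);
  return collatz_div2tillodd (oddnum);
}
```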
collatz_f1() compiles to shr, then tzcnt, then shrx.
collatz_f0() needs only tzcnt and shrx.
In my testing, collatz_f0() ran about 10% faster than collatz_f1().
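
A minimal sketch of the fold the summary refers to, assuming the argument to collatz_f1() really is odd (as its name suggests; the C code does not enforce this). For odd n, 3n + 1 is even, so halving it and then shifting out its trailing zeros gives the same result as shifting out the trailing zeros of 3n + 1 directly, which makes the separate shr redundant. The helper names below (strip_twos, f1_ref, f1_folded, fold_holds_for_odd) are hypothetical, and the sketch uses __builtin_ctzll instead of __builtin_ctzg so that it also builds on compilers that lack the newer builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shift out all trailing zero bits (hypothetical helper).
   Precondition: x != 0, since __builtin_ctzll (0) is undefined. */
uint64_t
strip_twos (uint64_t x)
{
  assert (x != 0);
  return x >> __builtin_ctzll (x);
}

/* What collatz_f1 () computes: halve first (the shr), then strip the rest. */
uint64_t
f1_ref (uint64_t n)
{
  return strip_twos ((3 * n + 1) / 2);
}

/* Folded form: strip the twos of 3n + 1 in one step, with no separate shr. */
uint64_t
f1_folded (uint64_t n)
{
  return strip_twos (3 * n + 1);
}

/* Check f1_ref (n) == f1_folded (n) for every odd n in [1, limit].
   Skips the one odd n for which 3n + 1 wraps to 0, where both are undefined.
   Keep limit well below UINT64_MAX so that n += 2 cannot wrap. */
bool
fold_holds_for_odd (uint64_t limit)
{
  for (uint64_t n = 1; n <= limit; n += 2)
    if (3 * n + 1 != 0 && f1_ref (n) != f1_folded (n))
      return false;
  return true;
}
```

For an even input the two forms can differ: with n = 2, f1_ref () returns 3 while f1_folded () returns 7, so this particular rewrite is only valid when the input is known to be odd.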
</pre>