<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/60756>60756</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Loop rotation copies the loop body, making things slower in some cases
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
archo5
</td>
</tr>
</table>
<pre>
Code:
```cpp
int KeyCompare_PtrPostInc(const char* kp, const char* str)
{
    for (;;)
    {
        char sc = *str;
        char kc = *kp;
        int diff = sc - kc;
        if (diff)
            return diff;
        if (sc == 0)
            return 0;
        str++;
        kp++;
    }
}
```
Generated x64 machine code (in one case):
```asm
KeyCompare_PtrPostInc(char const*, char const*): # @KeyCompare_PtrPostInc(char const*, char const*)
        movzx   eax, byte ptr [rsi]
        movzx   ecx, byte ptr [rdi]
        test    al, al
        je      .LBB1_5
        cmp     al, cl
        jne     .LBB1_5
        mov     edx, 1
.LBB1_3: # =>This Inner Loop Header: Depth=1
        movzx   eax, byte ptr [rsi + rdx]
        movzx   ecx, byte ptr [rdi + rdx]
        test    al, al
        je      .LBB1_5
        inc     rdx
        cmp     al, cl
        je      .LBB1_3
.LBB1_5:
        movsx   ecx, cl
        movsx   eax, al
        sub     eax, ecx
        ret
```
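Expected: `rdx` initialized to 0 so that the loop handles the first character too, instead of a separate copy of the first iteration being emitted before the loop.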
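
To make the difference concrete, here is a rough source-level sketch of the two shapes. It is a hand translation for illustration only, not compiler output; the names `KeyCompare_AsEmitted` and `KeyCompare_Expected` are made up, and `i` stands in for `rdx`.

```cpp
#include <cstddef>

// Roughly what the assembly above does: the first iteration is copied
// out in front of the loop, and the loop itself starts at index 1.
int KeyCompare_AsEmitted(const char* kp, const char* str)
{
    char sc = str[0];
    char kc = kp[0];
    if (sc != 0 && sc == kc)
    {
        std::size_t i = 1; // plays the role of rdx
        do
        {
            sc = str[i];
            kc = kp[i];
            if (sc == 0)
                break;
            ++i;
        } while (sc == kc);
    }
    return sc - kc;
}

// The shape asked for here: the index starts at 0 and the single loop
// body also handles the first character.
int KeyCompare_Expected(const char* kp, const char* str)
{
    char sc;
    char kc;
    std::size_t i = 0;
    do
    {
        sc = str[i];
        kc = kp[i];
        if (sc == 0)
            break;
        ++i;
    } while (sc == kc);
    return sc - kc;
}
```

Both functions should return the same values as `KeyCompare_PtrPostInc`; the only intended difference is whether the first pair of loads and compares is duplicated ahead of the loop.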
* [Godbolt](https://godbolt.org/z/4T5P8h8r1) (3 basic `strcmp` implementations - also shows that other compilers have similar issues)
* [Benchmark](https://quick-bench.com/q/HL7RgVzc__cwZQJnEn2VMBtrhfQ) (see `KeyCompare_PtrPostInc`)
I'm not entirely sure what's going on with the benchmark, though: it can get slower even when the loop body doesn't appear to be copied.
Possibly the additional branches make prediction more difficult?
Also, there seems to be a redundant `mov` at the beginning of `KeyCompare_I` (if using `rcx` as the offset)?
In any case, it seems odd that a hand-written `strcmp` (one of the most basic functions in C) can't be compiled without the result getting worse in some way. The whole reason I started looking into this is that it didn't make much sense to call an external function when most compared strings were very short and most comparisons ended quickly.
Also, is it possible to disable such loop changes to get the compiler to output a specific loop structure/block order, while keeping all other optimizations (like constant folding and mem2reg)? These changes are applied even to `goto`-based loops so that wasn't enough to make them go away.
</pre>