<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/60756>60756</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Loop rotation copies the loop body, making things slower in some cases
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          archo5
      </td>
    </tr>
</table>

<pre>
    Code:
```cpp
int KeyCompare_PtrPostInc(const char* kp, const char* str)
{
        for (;;)
        {
                char sc = *str;
                char kc = *kp;
                int diff = sc - kc;
                if (diff)
                        return diff;
                if (sc == 0)
                        return 0;
                str++;
                kp++;
        }
}
```

Generated x64 machine code (in one case):
```asm
KeyCompare_PtrPostInc(char const*, char const*):        # @KeyCompare_PtrPostInc(char const*, char const*)
        movzx   eax, byte ptr [rsi]
        movzx   ecx, byte ptr [rdi]
        test    al, al
        je      .LBB1_5
        cmp     al, cl
        jne     .LBB1_5
        mov     edx, 1
.LBB1_3: # =>This Inner Loop Header: Depth=1
        movzx   eax, byte ptr [rsi + rdx]
        movzx   ecx, byte ptr [rdi + rdx]
        test    al, al
        je      .LBB1_5
        inc     rdx
        cmp     al, cl
        je      .LBB1_3
.LBB1_5:
        movsx   ecx, cl
        movsx   eax, al
        sub     eax, ecx
        ret
```

Expected: `rdx` initialized to 0 and execution entering the loop directly, instead of a copy of the first iteration being emitted ahead of the loop.
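
To make the difference concrete, here is a rough source-level sketch of the two shapes. `KeyCompare_Peeled` approximates the assembly above (first iteration duplicated before the loop, index starting at 1), and `KeyCompare_Expected` approximates the shape I was hoping for (index starting at 0, one copy of the body). Both names are made up for illustration, and this is only my reading of the listing, not actual compiler output.

```cpp
// Hypothetical model of the generated code: the first comparison is
// duplicated ahead of the loop, and the loop itself starts at index 1.
int KeyCompare_Peeled(const char* kp, const char* str)
{
        char sc = str[0];
        char kc = kp[0];
        if (sc != 0 && sc == kc)
        {
                for (unsigned long i = 1;; ++i)
                {
                        sc = str[i];
                        kc = kp[i];
                        if (sc == 0 || sc != kc)
                                break;
                }
        }
        return sc - kc;
}

// Hypothetical model of the expected code: the index starts at 0 and
// the loop body exists only once.
int KeyCompare_Expected(const char* kp, const char* str)
{
        char sc;
        char kc;
        unsigned long i = 0;
        do
        {
                sc = str[i];
                kc = kp[i];
                ++i;
        } while (sc != 0 && sc == kc);
        return sc - kc;
}
```

Both return the same result as `KeyCompare_PtrPostInc`; the only difference is whether the first iteration lives outside the loop.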

* [Godbolt](https://godbolt.org/z/4T5P8h8r1) (3 basic `strcmp` implementations - also shows that other compilers have similar issues)
* [Benchmark](https://quick-bench.com/q/HL7RgVzc__cwZQJnEn2VMBtrhfQ) (see `KeyCompare_PtrPostInc`)

I'm not entirely sure what's going on with the benchmark, though: a variant can get slower even when its loop body doesn't appear to be copied.
Possibly the extra branches make prediction more difficult?
Also, there seems to be a redundant `mov` at the beginning of `KeyCompare_I` (if using `rcx` as the offset)?

In any case, it seems weird that it is this hard to write `strcmp` (one of the most basic functions in C) by hand without the compiler making it worse somehow. The whole reason I started looking into this is that it didn't make much sense to call an external function when most compared strings were very short and most comparisons ended quickly.

Also, is it possible to disable such loop transformations so that the compiler emits a specific loop structure/block order, while keeping all other optimizations (like constant folding and mem2reg)? These changes are applied even to `goto`-based loops, so rewriting the loop with `goto` wasn't enough to make them go away.
</pre>