<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/55153>55153</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Bad codegen for crc32c
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          MatzeB
      </td>
    </tr>
</table>

<pre>
    crc32c (https://github.com/htot/crc32c) provides optimized CRC32 computation (with a focus on X86 / SSE 4.2). Variants of the code are used in projects like RocksDB, LevelDB, or Folly. The code produced by clang-12 is 2x slower than the code produced by GCC. I am filing this to document my work on the issue.

The critical part of the code is a "Duff's device" that looks roughly like this:
```
#define CRCtriplet(crc, buf, offset) \
    crc ## 0 = __builtin_ia32_crc32di(crc ## 0, *(buf ## 0 + offset)); \
    crc ## 1 = __builtin_ia32_crc32di(crc ## 1, *(buf ## 1 + offset)); \
    crc ## 2 = __builtin_ia32_crc32di(crc ## 2, *(buf ## 2 + offset));
...
    switch ( block_size ) {
        case 128:
            do {
                CRCtriplet ( crc, next, -128 ); // jumps here for a full block of len 128
        case 127:
                CRCtriplet ( crc, next, -127 ); // jumps here or below if the first block is smaller
        case 126:
                CRCtriplet ( crc, next, -126 );
                ...
        case 1:
                ...
                if (...) { block_size = 128; }
        default:
                ;
            } while ( n > 0 );
    }
 ```
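
To experiment with the control-flow shape without the full library, the following is a minimal sketch of the same pattern, not the crc32c code itself. It assumes a block size of 4 instead of 128, and it replaces `__builtin_ia32_crc32di` with a portable bitwise stand-in so that it builds without SSE 4.2. The names `u64`, `crc_step`, `crc_plain` and `crc3_duff` are hypothetical and exist only for this illustration.

```c
/* Sketch only: all names and the block size of 4 are illustrative assumptions. */
typedef unsigned long long u64;

/* Portable stand-in for __builtin_ia32_crc32di: bitwise CRC32C (reflected
   polynomial 0x82F63B78) applied to one 64-bit word. */
static u64 crc_step(u64 crc, u64 data) {
    u64 s = (crc & 0xFFFFFFFFULL) ^ data;
    for (int i = 0; i < 64; ++i)
        s = (s >> 1) ^ (0x82F63B78ULL & (0ULL - (s & 1)));
    return s;
}

#define CRCtriplet(crc, buf, offset) \
    crc ## 0 = crc_step(crc ## 0, *(buf ## 0 + (offset))); \
    crc ## 1 = crc_step(crc ## 1, *(buf ## 1 + (offset))); \
    crc ## 2 = crc_step(crc ## 2, *(buf ## 2 + (offset)));

/* Reference: a plain loop over a single stream. */
static u64 crc_plain(const u64 *p, unsigned len) {
    u64 crc = 0;
    for (unsigned i = 0; i < len; ++i)
        crc = crc_step(crc, p[i]);
    return crc;
}

/* Three independent streams of `len` words each, processed interleaved.
   The switch jumps into the middle of the loop body for a short first block. */
static void crc3_duff(const u64 *a, const u64 *b, const u64 *c,
                      unsigned len, u64 out[3]) {
    u64 crc0 = 0, crc1 = 0, crc2 = 0;
    if (len > 0) {
        unsigned block_size = len % 4 ? len % 4 : 4; /* first block may be short */
        unsigned n = (len + 3) / 4;                  /* number of blocks */
        const u64 *next0 = a + block_size;           /* one past the current block */
        const u64 *next1 = b + block_size;
        const u64 *next2 = c + block_size;
        switch (block_size) {
        case 4:
            do {
                CRCtriplet(crc, next, -4);
        case 3:
                CRCtriplet(crc, next, -3);
        case 2:
                CRCtriplet(crc, next, -2);
        case 1:
                CRCtriplet(crc, next, -1);
                /* advance to the end of the next (full) block, if any */
                if (--n > 0) { next0 += 4; next1 += 4; next2 += 4; }
            } while (n > 0);
        }
    }
    out[0] = crc0; out[1] = crc1; out[2] = crc2;
}
```

Because the three streams are independent, `crc3_duff` should return the same values as calling `crc_plain` on each stream separately, which gives a simple check when comparing compilers.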
 
 Code produced by clang-12 looks like this:
 
```
 # switch-bb:
        movl    $1, %eax
        movq    %rax, 2016(%rsp)                # 8-byte Spill
        movl    $2, %eax
        movq    %rax, 2048(%rsp)                # 8-byte Spill
        movl    $3, %eax
        movq    %rax, 2040(%rsp)                # 8-byte Spill
        ...
        xorl    %eax, %eax
        movq    %rax, -112(%rsp)                # 8-byte Spill
        xorl    %eax, %eax
        movq    %rax, 1040(%rsp)                # 8-byte Spill
        xorl    %eax, %eax
        movq    %rax, 1032(%rsp)                # 8-byte Spill
      ...
# and for each switch-case we get this pattern:
.LBB2_11:
        movq    -128(%rsp), %rax                # 8-byte Reload
        crc32q  -1016(%rax), %rcx
        movq    -112(%rsp), %rax                # 8-byte Reload
        crc32q  -1016(%r15), %rax
        crc32q  -1016(%r10), %rdi
        movq    %rax, 1040(%rsp)                # 8-byte Spill
        movq    -104(%rsp), %rax                # 8-byte Reload
        movq    %rax, 1048(%rsp)                # 8-byte Spill
```

The code should be generated without any of those spills and reloads.
</pre>