<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/135486>135486</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [LLVM] APInt::tcAdd has quite poor codegen.
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Ralender
      </td>
    </tr>
</table>

<pre>
    Here is an example of one iteration of the loop,
compiled with clang trunk at -O3:
```asm
; start of iteration
      89: 4c 8b 14 cf                   movq    (%rdi,%rcx,8), %r10 ; load dst[i]
      8d: 4c 8b 0c ce                   movq    (%rsi,%rcx,8), %r9 ; load rhs[i]
      91: 48 85 d2                      testq   %rdx, %rdx ; test if we have a carry
      94: 74 1a                         je      0xb0 <_ZN4llvm5APInt5tcAddEPmPKmmj+0xb0>

      96: 4f 8d 4c 0a 01                leaq    0x1(%r10,%r9), %r9 ; add with a carry
      9b: 4d 39 d1                      cmpq    %r10, %r9
      9e: 41 0f 96 c2                   setbe   %r10b ; put carry of next iteration in r10b
      a2: eb 13                         jmp     0xb7 <_ZN4llvm5APInt5tcAddEPmPKmmj+0xb7>

      b0: 4d 01 d1                      addq    %r10, %r9 ; add without carry
      b3: 41 0f 92 c2                   setb    %r10b ; put carry of next iteration in r10b

      b7: 4c 89 0c cf                   movq    %r9, (%rdi,%rcx,8) ; write result back to memory
; next iteration unrolled
      bb: 4c 8b 4c cf 08                movq    0x8(%rdi,%rcx,8), %r9
      c0: 48 8b 54 ce 08                movq    0x8(%rsi,%rcx,8), %rdx
      c5: 45 84 d2                      testb   %r10b, %r10b
; ...
```
For reference, here is what I think optimal x86-64 code looks like:
```
; start of iteration
; carry is in CF
    12a1: 4a 8b 4c ce f0                movq    -0x10(%rsi,%r9,8), %rcx ; load rhs[i]
    12a6: 4a 11 4c cf f0                adcq    %rcx, -0x10(%rdi,%r9,8) ; dst[i] = dst[i] + rhs[i] + carry
; next iteration unrolled
    12ab: 4a 8b 4c ce e8                movq    -0x18(%rsi,%r9,8), %rcx
    12b0: 4a 11 4c cf e8                adcq    %rcx, -0x18(%rdi,%r9,8)
;...
```

Things that could be done better:
 - One side of the branch detects overflow with a cmp instead of reading CF.
 - The branch doesn't get eliminated.
 - adc is not used.
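
To make the points above concrete, here is a rough C++ sketch. It is not copied from the LLVM sources: the function names and the `Word` typedef are made up for illustration, and the first function is only my reading of the loop shape that the disassembly suggests. The second function computes the same result without a data-dependent branch, which is the form one would hope maps onto an adc chain. Whether a given compiler actually emits adc for it is not guaranteed.

```cpp
// Hypothetical names; a sketch under assumptions, not the LLVM implementation.
typedef unsigned long long Word; // assumed to be a 64-bit word

// Branchy shape suggested by the disassembly: one path per carry state.
// carry must be 0 or 1 on entry.
Word add_words_branchy(Word *dst, const Word *rhs, Word carry, unsigned parts) {
  for (unsigned i = 0; i < parts; ++i) {
    Word old = dst[i];
    if (carry) {
      dst[i] = old + rhs[i] + 1;
      carry = dst[i] <= old; // with +1, wrap-around means result is not larger
    } else {
      dst[i] = old + rhs[i];
      carry = dst[i] < old; // plain unsigned overflow check
    }
  }
  return carry;
}

// Branch-free variant: two overflow checks per word; at most one can fire.
// carry must be 0 or 1 on entry.
Word add_words_branchless(Word *dst, const Word *rhs, Word carry, unsigned parts) {
  for (unsigned i = 0; i < parts; ++i) {
    Word sum = dst[i] + rhs[i];
    Word c1 = sum < rhs[i]; // overflow from dst[i] + rhs[i]
    sum += carry;
    Word c2 = sum < carry;  // overflow from adding the incoming carry
    dst[i] = sum;
    carry = c1 | c2;
  }
  return carry;
}
```

Both functions return the carry out of the top word, so they can be checked against each other on the same inputs. On Clang, the multiprecision builtin `__builtin_addcll` is another way to express the same carry chain.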

I wrote a benchmark to get an idea of how big the impact is:
 APInt::tcAdd   : 1728.4 ns for 4096 iterations on average
 optimized code : 1045.6 ns for 4096 iterations on average
That is roughly a 1.65x speedup.

</pre>