<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=http://email.email.llvm.org/c/eJzNV8ly4zYQ_RrqgpKKi9aDDpY9rpnKTHKYueTkAommCBsEWAAoL1-fblCiJVryOJlUJSwtFBto9PK6-yk34nn91ZgHxj3zlXTMybpRwAojIEqvmfSs4KpoFffgcAUw3dY5WMat5c8MFNSgvWM5-EcAzaJ5DPhmXAu6z_EdZVdRfBPFh098El5F0-yfpJnUhWoFsCi7LpwXAsoo-3S8y3nbFp7dGsOixaZ7xvAqKm7ZSzTbZNHsJsr2kmhxdL_fL8iO7KrxVsiyvPPo6Qu6uCyMdnvF6VVOPp88wSWr0yPPqLJo-A0DNmZ5fy4tjdJbfMkSb5a05prFpO7uLm-l8lLftdoCRx9yRaag7GS7Bd9azeyxX2wQxWMnrxXXW7YFDTakq5S6yyQrje3Si6e0TuIq3Dr-IyUFeCg6zZkzqvXSaJZzB4LhTW0EJt4yqXdgHVxKJHf1PshdRClyIYYYPlJ9-htdDM6duRAILJrGH1TThykkpc3Dt3WS1mJiTuW12fHcoZw_kXw8T6bLVTKdr5I0m8-SRRrPTjfIulVBYbeB9J7IMTfvZOKzeQSMWaigkn1hrS5MTZUSSmh_j0FWmCFa9CVKF4LBUwMFLcFixMITBpxGgSeU6y2EvRjskNIJ27T9qpBArLhiAADOMH1bqlYUurZpDGZWPTMFzjHTeFlz1Zf6OQRoo8cHFGBAvGyULLiXOziAghA1wAlWiFSol6MiYC2qCuq79iHkTjo6w4JDhcxVplWCVRxVvoA1-LzmUgsK3ur_jDhEFHsPID0i9_KLiCw-iMg9IMOGdy0RA7mrbC__Gxj-LmtJWS0QDKQXHTqGDMEM09Y0oN1rfwlgyv7TVr8zUiCCET_Lt636uMUeLS-N-ZdGgTszCl43UJzO2sUkbsNRuaGba9fdpJvwGhxKV-eeHA4MCkYflXOZ_VFB3x5El0bpQubeJq0vsi44_7jGDkX2MTUnbjatqwJQk-kFQf50QTDE-s8GBLZLfywflvM9kiK6Jl83m_gu-9Va_knzuFjrfS0Pi7yom16AGtML5yXT0IyT4Wazw_kS4nyugzyZ7lTIg3I4RL0LRvpO7g9XwECGpfrpB7WJLxoxyJB4NuwzcGr2qOIGGl_houSC7aLLzDDlyE9DsLqaUEZv3-CIC9Gp6Ox_431zCM7ZA-71cerTY9ezvmzeHOUaUrYcwNM0vSHnBUOkv9-jfzdeFh0xcLwObTrMdihLWUjiG3u6bspXrn4g9DiFJ8fKrpAkoCYbegInltJwKx3O6kfpK2KMHV_suAanBwQc_JowbCw0-AFq5DoaCjSC2-fQ64zFG2RBYT7QPwxTA3vAEU9GeSuLh47xEAei3fyRowmmpIMQd4E_1dK1xAmc552RyDU8uUcqwtFSd3MDnUI6tDpxbGtEbpBrINt6IKBV3jeh4QV6vpdOjEXg3L7Q-xv8mXz7rVqMxDoTq2zFR7z1lbHrLbzwMYao5nrUWrUeaMIwtfkEA4c_lNodvsaNNffI7Oi_gHMt8bXbWZYmi1G1TtPFKhF5miNw5rMSZllRprNiCcVMzJfTfKR4Dsqtce7h2Bv9-pFyncZpGifJLInTRbqcJNkiWSVlli3i2SLOObZqomBqQnooKiO7DirzdutQqKTz7lXInZNbDXCw0EuvYP39DFmg4dcYSpylduwtD-kinvm0nI_n01EwdR3s_AtlYhd7>53217</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Suboptimal code gen for pointer subtraction on x86-64
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          geza-herman
      </td>
    </tr>
</table>

<pre>
    Consider this simple code, which calculates the number of array elements between `b` and `e`:

```cpp
#include <cstddef>

struct Foo {
    char z[3];
};

std::ptrdiff_t size(const Foo *b, const Foo *e) {
    std::ptrdiff_t r = e - b;
    //if (r < 0) __builtin_unreachable();
    return r;
}

```

With `-O2`, Clang generates good code for this, a solution based on multiplication by the modular inverse of `sizeof(Foo)`:

```asm
size(Foo const*, Foo const*):                       # @size(Foo const*, Foo const*)
        sub     rsi, rdi
        movabs  rax, -6148914691236517205
        imul    rax, rsi
        ret
```
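
To illustrate why a single multiply is enough here, below is a minimal sketch (not Clang's actual lowering). It assumes the byte difference is an exact multiple of 3, which holds when `b` and `e` point into the same array. The names `kInv3` and `exact_div3` are hypothetical; the constant is the `movabs` immediate above reinterpreted as an unsigned 64-bit value.

```cpp
#include <cstdint>

// Hypothetical name: the inverse of 3 modulo 2^64.
// As a signed value this is -6148914691236517205, the movabs immediate.
constexpr std::uint64_t kInv3 = 0xAAAAAAAAAAAAAAABull;

// Unsigned arithmetic wraps modulo 2^64, so this checks 3 * kInv3 == 1 (mod 2^64).
static_assert(kInv3 * 3 == 1, "kInv3 must be the modular inverse of 3");

// Exact division by 3. The result is only meaningful when byte_diff is a
// multiple of 3 (assumption); otherwise it is unrelated to byte_diff / 3.
// The conversion back to a signed type is modular on GCC and Clang
// (and guaranteed since C++20).
constexpr std::int64_t exact_div3(std::int64_t byte_diff) {
    return static_cast<std::int64_t>(
        static_cast<std::uint64_t>(byte_diff) * kInv3);
}
```

For example, `exact_div3(9)` yields `3` and `exact_div3(-9)` yields `-3`, so the same multiply handles both signs without any fix-up.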

However, if I uncomment the commented-out line, I would expect the generated asm to stay the same. Instead, Clang generates larger and presumably less optimal code, a solution based on the non-modular multiplicative inverse (the modular inverse could still be used, since the division must leave a zero remainder):

```asm
size(Foo const*, Foo const*):                       # @size(Foo const*, Foo const*)
        mov     rax, rsi
        sub     rax, rdi
        movabs  rcx, -6148914691236517205
        mul     rcx
        mov     rax, rdx
        shr     rax
        ret
```
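
For comparison, the `mul`/`shr` sequence above looks like the usual "magic number" unsigned division by 3: take the high 64 bits of the 128-bit product and shift right by one. A minimal sketch, assuming a compiler that supports the `unsigned __int128` extension (GCC and Clang do on x86-64); `kMagic` and `udiv3` are hypothetical names:

```cpp
#include <cstdint>

// Hypothetical name: ceil(2^65 / 3), the same bit pattern as the movabs immediate.
constexpr std::uint64_t kMagic = 0xAAAAAAAAAAAAAAABull;

// Unsigned division by 3 via a widening multiply.
// `unsigned __int128` is a compiler extension (assumption: GCC or Clang).
inline std::uint64_t udiv3(std::uint64_t n) {
    unsigned __int128 product = static_cast<unsigned __int128>(n) * kMagic;
    // High half of the product: what `mul rcx` leaves in rdx.
    std::uint64_t high = static_cast<std::uint64_t>(product >> 64);
    // Corresponds to the final `shr rax`.
    return high >> 1;
}
```

Unlike the modular-inverse version, this form also rounds down correctly when `n` is not a multiple of 3, which is more than the pointer subtraction needs.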

A similar case: suboptimal code gen also happens for this code:
```cpp
#include <cstddef>

struct Foo {
    char z[3];
};

void bar(std::ptrdiff_t);

void foo(const Foo *b, const Foo *e) {
    std::ptrdiff_t s = e - b;

    for (std::ptrdiff_t i=0; i<s; i++) {
        bar(i);
    }
}
```
The generated code is this:
```asm
foo(Foo const*, Foo const*):                        # @foo(Foo const*, Foo const*)
        push    r14
        push    rbx
        push    rax
        sub     rsi, rdi
        test    rsi, rsi
        jle     .LBB0_3
        movabs  rcx, -6148914691236517205
        mov     rax, rsi
        mul     rcx
        shr     rdx
        cmp     rdx, 2
        mov     r14d, 1
        cmovge  r14, rdx
        xor     ebx, ebx
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        mov     rdi, rbx
        call    bar(long)
        add     rbx, 1
        cmp     r14, rbx
        jne     .LBB0_2
.LBB0_3:
        add     rsp, 8
        pop     rbx
        pop     r14
        ret
```

Notice the same, less efficient calculation of the number of elements.

And there is a comparison with `2`, and a `cmov`. These seem unnecessary (sorry if this is some kind of trick that I'm unaware of, or if I misunderstand the intent of these instructions).
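
Reading the assembly literally (this is only a sketch of what the instructions compute, not a statement about why LLVM emits them): `cmp rdx, 2` followed by `mov r14d, 1` and `cmovge r14, rdx` selects `rdx` when it is at least 2 and `1` otherwise. The function name below is hypothetical:

```cpp
#include <cstdint>

// Mirrors the cmp/mov/cmovge triple: r14 = (rdx >= 2) ? rdx : 1,
// which is the signed maximum of the element count and 1.
constexpr std::int64_t clamped_trip_count(std::int64_t count) {
    return count >= 2 ? count : 1;
}

static_assert(clamped_trip_count(1) == 1, "a count of 1 is unchanged");
static_assert(clamped_trip_count(5) == 5, "larger counts are unchanged");
```

If the division is known to be exact and the byte difference is positive (the `jle` already skips the loop otherwise), the count is at least 1, so the clamp would not change the result.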

godbolt link: https://godbolt.org/z/zMeY1MKh7
</pre>