<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=http://email.email.llvm.org/c/eJylVU1v4zgM_TXKhdjA8XcOPrTJTrHAzLnHQrZoW7OyZEh0uu2vH8puN5tiDgvEUGJ9UI9PFPncOvXWPI_aBALNLQD5BYFGSfAa3xgQtCUHEoK2g0EwTiqRnuKychAcXNC_gZGEInsQyVkkDyPRHOIo_cZtcKp1hvbODzx659-lvfz8YZ9_PG3mokw-2jpc2F-ZvxB8Z09lLtK6c5b5XZxWINKHmbxIjyCqx80e-Nks4s76hdjmY6Jd-h49iOwMHnkR_eyRXjoZSGSnr5tE9ic72-BF9oF-9bGdBk5umrVBjtR_gjK5C68F8uAsu-rQEnRG2gGkVTB03f6KwwwWbxmuDiRJd590Ps-90dioi-IxEcU5ElqPfLrCfD7_F-ew4XAwTtygvhswvQU8lHcjZreIaX43Yn6LmKV3Ixa3iHlyN2L5BfH-m6luEYvyJqGr82_Lbvs_O07orf7JubWso1tt4fzwxFU1tdpixJ5Q2hAFgicDBZA-CkXnPGc_7x29W4bRLbSaTFqhVXt4jpphRVpx2VnvjInYzsPsiCtGS2PeWE46cl6_r14GfUH7r5fJKTRcYWwVkCuwhrpl1YqKFKIk1fCO_9DarSCMuo-8uP4qcD7s4S_iSbcYBS1Lm0c-mwIZrlVc5hFusdLowfJaBF5Jd9KCNKx1Ik1b3vg3LDN3V2KzJNYVy4dnMZyQsWjhO2JtiTxapjobTRTD6vp-s3H9unUlziJFYB2B4ymmuVNNpo7ZUe7kQqPzjZIXnAaPaHeLN80XcdU0Lu2eL4YHxlw-X3_M3v3kSPJQh7Agk_lWZGlZ7MamLfJOYdfW-aHFvOjLSiF2mZT1MemTXu2MbNGEhjOJD2nxFVYI7nNe7e5noJs0SdPkcKgPaV4mxb6TqPJjeqy6rsuSKhF5gpPUZh9x4ndj55sVsl2GwItGc8ZdF2UI8b5wJcwMSZPBJn49PhI2xj7mD6wJZHi4Wyk1K59fOz0ffA>53265</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Load combining cost modelling
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          davemgreen
      </td>
    </tr>
</table>

<pre>
    Whilst it is true that we combine these into a single load, we do so very late:
https://godbolt.org/z/vbvjMnWMG
```
#include <cstdint>

uint64_t Load64(const void *ptr) {
    const uint8_t* const buffer = reinterpret_cast<const uint8_t*>(ptr);

    // Compiles to a single mov/str on recent clang and gcc.
    return (static_cast<uint64_t>(buffer[0])) |
            (static_cast<uint64_t>(buffer[1]) << 8) |
            (static_cast<uint64_t>(buffer[2]) << 16) |
            (static_cast<uint64_t>(buffer[3]) << 24) |
            (static_cast<uint64_t>(buffer[4]) << 32) |
            (static_cast<uint64_t>(buffer[5]) << 40) |
            (static_cast<uint64_t>(buffer[6]) << 48) |
            (static_cast<uint64_t>(buffer[7]) << 56);
}
```

Doing that so late (in DAG combine) means the costs are incorrect throughout the mid-end. We don't unroll (or potentially vectorize) because the cost model only sees eight 8-bit loads, 8 zexts, 7 shifts and 7 ors. It should be treated as a single unaligned 64-bit load. We can also "break up" the pattern in some situations, by splitting off some of the loads but not others.
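
For illustration only, here is a rough sketch of the three shapes discussed above: the byte-wise pattern, a "broken up" variant where the loads are grouped into two 32-bit halves, and the single unaligned 64-bit load that the pattern is equivalent to. The names `Load64Bytewise`, `Load32Bytewise`, `Load64SplitHalves`, `Load64Combined` and `CheckLoadsAgree` are hypothetical and not part of the original report, and the `memcpy` form only matches the byte-wise result on a little-endian host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same computation as the byte-wise pattern in the report, written as a loop:
// eight byte loads, each widened, shifted into place and or'd together.
uint64_t Load64Bytewise(const void *ptr) {
    const uint8_t *b = reinterpret_cast<const uint8_t *>(ptr);
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v |= static_cast<uint64_t>(b[i]) << (8 * i);
    return v;
}

// Hypothetical helper: assemble four bytes into a little-endian 32-bit value.
static uint32_t Load32Bytewise(const uint8_t *b) {
    return static_cast<uint32_t>(b[0]) |
           (static_cast<uint32_t>(b[1]) << 8) |
           (static_cast<uint32_t>(b[2]) << 16) |
           (static_cast<uint32_t>(b[3]) << 24);
}

// "Broken up" shape: the eight loads are split into two groups of four,
// and the two 32-bit halves are joined with one extra shift and or.
uint64_t Load64SplitHalves(const void *ptr) {
    const uint8_t *b = reinterpret_cast<const uint8_t *>(ptr);
    uint64_t lo = Load32Bytewise(b);
    uint64_t hi = Load32Bytewise(b + 4);
    return lo | (hi << 32);
}

// Combined shape: one unaligned 64-bit load.
// Assumption: little-endian host; on big-endian the bytes would be reversed.
uint64_t Load64Combined(const void *ptr) {
    uint64_t v;
    std::memcpy(&v, ptr, sizeof v);
    return v;
}

// Sanity check that all three shapes agree (little-endian hosts only,
// because of Load64Combined).
void CheckLoadsAgree(const void *ptr) {
    assert(Load64Bytewise(ptr) == Load64SplitHalves(ptr));
    assert(Load64Bytewise(ptr) == Load64Combined(ptr));
}
```

If the mid-end cost model saw the first two shapes as the third, the per-iteration cost of a loop calling them would presumably be much closer to that of a single load.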
</pre>