<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/85522>85522</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Aarch64 codegen is unreasonably bad when creating vector from u16 values in memory
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          JennifierBurnett
      </td>
    </tr>
</table>

<pre>
I have attached [bad_codegen.zip](https://github.com/llvm/llvm-project/files/14623542/bad_codegen.zip), which contains a minimal C++ example program showing the issue, alongside the assembly and LLVM IR output produced when compiling with `-std=c++20 -Ofast`.

In addition to the specific error case, I have also provided versions of the same function using u8 and u32 values for comparison. The issue only occurs with u16-sized values. The IR output for the u16 and u32 cases is almost identical, except that the loads and inserted vector elements are of different sizes, which is expected. The issue also does not reproduce when compiling for x86, which suggests that this is a codegen issue rather than an optimisation issue.
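
For readers without the archive, the pattern being compared looks roughly like the sketch below. This is not the attached code: the type names `u16x8` and `u32x4`, the function names, and the `stride` parameter are assumptions made for illustration, and the vector types use the GCC/Clang `vector_size` extension.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit vector types (GCC/Clang vector extension).
// The attached test case may define its vector type differently.
typedef std::uint16_t u16x8 __attribute__((vector_size(16)));
typedef std::uint32_t u32x4 __attribute__((vector_size(16)));

// Builds a vector from eight u16 values read from memory.
// `stride` (an assumption of this sketch) spaces the reads apart so they
// cannot be merged into one contiguous vector load.
u16x8 gather_u16(const std::uint16_t* from, std::size_t stride)
{
    u16x8 result = {};
    for (int lane = 0; lane < 8; ++lane)
        result[lane] = from[lane * stride];
    return result;
}

// Same shape with u32 elements, for comparison.
u32x4 gather_u32(const std::uint32_t* from, std::size_t stride)
{
    u32x4 result = {};
    for (int lane = 0; lane < 4; ++lane)
        result[lane] = from[lane * stride];
    return result;
}
```

Comparing the AArch64 assembly for the two functions at the same optimisation level is one way to see whether the u16 variant is lowered differently from the u32 one.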

This appears to be related to a more general issue: the AArch64 backend does not load into the half-precision floating-point registers unless explicitly told to. Consider appending the following to the attached C++ test case:

```
// Copies two bytes from memory into the low bytes of the vector.
vec_type load_2_bytes(const uint8_t* from)
{
    std::array<uint8_t, sizeof(vec_type)> result;
    std::copy_n(from, 2, result.begin());
    return std::bit_cast<vec_type>(result);
}

// Same operation, but the two bytes are read as a single _Float16 value.
vec_type load_2_bytes(const _Float16* from)
{
    std::array<uint8_t, sizeof(vec_type)> result;
    std::array<uint8_t, 2> temp = std::bit_cast<std::array<uint8_t, 2>>(*from);
    std::copy_n(temp.begin(), 2, result.begin());
    return std::bit_cast<vec_type>(result);
}
```

With these, the first version is implemented as two byte loads and a vector element move (notably, one of the byte loads goes into the correct vector register, but at offset 4, and is later moved to the correct offset), while the second is implemented as a single load into a half-precision register.

The issue was observed on clang 16.0.6 obtained from NetBSD pkgsrc, and it can also be observed on the trunk version available via Compiler Explorer: https://godbolt.org/z/75d8v9z43
</pre>