<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=http://email.email.llvm.org/c/eJy9Vllv4zYQ_jXyCxFBonxID36I4xQJEPRlW3TfDEoaW9ylSZWHj_76DinJ8REnWBSoYVA8vm9mODMcslT1cf4oBCmdJbYBslUGO5rvOBPEGTBErUk0TXZbJ8Sq4Ztm5XIcEwOwJVaRVqvaVVxuiJMSKjCGaS6OpBSKWahJpWogTNYnwpb99BNn8COC2qNX5C3wOkjLtPUTjEhQkmjYcGNBR9ljlCyjZGinSf_vhjTjshIOFUaUMr1deXLc4OAEqGHNJZDn3x8Xb8-r12_f_nwmabfKpfBLrRLHdHrIV5Z0mxZqv2rziOZ-JT-kU1xhEX0iZ-MyogWJZotOEsGfBuu07EUE-m4D9l0YSii8kMvZMswWUdZLimbL8w0Pth0o6mwRXymJ8QqG4ExEcROts14ul5YIkFd2Hq7G32_sZs4H1WoSZUsSpKHYRRCVXcPeAmhXu_bvlVy16RQtQhseHpDfb-8lQN7eN4QxWl-4_13mvuHChy7v1D-TfjPFiQqyRvY748Lyk1mmM8v9g2Z1nteA7gDdYlD8VG_q2xCCj5dfrmJxoYN1Oi4TxMQ7JqLJIokmSy_4cJdentHDofpF_uNn6tOe_v0uffGp-q_5T18E_prUJQqyJs_ec9g-3mBezjBlaBcXmNNRODtdGH8M4tOLX7s9MlfVIbR_NSBDmeHGON8SkKwUUPsN-_m1EuhQX89C5ULABiRoX8s-rz7DPuI4vpyAgw3fXRan0zIk3PS2h9md3-HNTmh627vltT6snjeN86ZXknesxHc-BtMBTAdweh-cDeBsAI_vg2cDeDaAJzfgk9M-DNorFjlgIUQKK_ReOVGje1qorL9RGraDr0LS9_5XT9ELV02_cBW98BW99dWHrvnhsP4Ljlcq7_Ja4JWKqbttBRxIxQy6i9tG9de7UKrtiH80mNrdIdgzgweqAmnx3sZrWhImjAo8f_M6wTTBS902W7C86uY_ehJ4bQYDorzh_kHRoODSy9NglNjhawCNrATDw2W1kz_jPh6jep7VRVawEVaXRum5w2scmNmOnBbzxtrW-LNHf8P_BrW7MsYd4kCI3fB5wIfID0wIHIZdGexMaD7NR82cFrOEFuW4GMMsK_Isq2drxpJ1NqnzIp2kI8FKEGaOtQ_fChL2nWP8u2GyHP13C_icJjRNKR2nYzqhaQxFWhZpUa1ZndcsSaJxAlvGRezlxEpvRnoeRJZuY3BR4OvHvC8yY_hGAgSD0ULLrYD5q4T1mlcc49jVrlPhwqKmL6t9CESfJD4nzCiYPA_2_gsbzN8Y>52868</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Inefficient code generated for vmull_high_p8 in complex loops
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          uncleasm
      </td>
    </tr>
</table>

<pre>
    All but the most trivial uses of `vmull_high_p8` seem to produce unnecessarily bloated code that makes a redundant copy of the high half of a NEON register:

```
#include "arm_neon.h"
#define ENABLE_ISSUE 1
inline poly16x8_t vmull_low_p8(poly8x16_t a, poly8x16_t b) {
    return vmull_p8(vget_low_p8(a), vget_low_p8(b));
}

poly16x8x2_t p(const poly8_t *input, int len, poly8x16_t x, poly8x16_t X) {
    auto ptr = input + len;
    auto L = vdupq_n_p16(*--ptr), H = L;
#if ENABLE_ISSUE
    while (ptr > input)
#endif 
    {
        auto s = vuzpq_p8(vreinterpretq_p8_p16(L), vreinterpretq_p8_p16(H));
        auto a = vmull_low_p8(s.val[0], x);
        auto b = vmull_high_p8(s.val[0], x);
        auto A = vmull_low_p8(s.val[1], X);
        auto B = vmull_high_p8(s.val[1], X);
        auto C = vdupq_n_p16(*--ptr);
        L = C ^ a ^ A;
        H = C ^ b ^ B;
    }
    return {L,H};
}
```

When the issue is enabled (`ENABLE_ISSUE` set to 1, so the loop is present), the following code is generated:

```
        ...
        ext     v3.16b, v6.16b, v6.16b, #8
        ext     v7.16b, v2.16b, v2.16b, #8
        pmull   v6.8h, v6.8b, v0.8b
        pmull   v2.8h, v2.8b, v1.8b
        pmull   v3.8h, v3.8b, v4.8b
        pmull   v7.8h, v7.8b, v5.8b
        ...
```

Instead, one would expect to have
```
        ...       
        pmull   v6.8h, v6.8b, v0.8b
        pmull   v2.8h, v2.8b, v1.8b
        pmull2  v3.8h, v6.16b, v4.16b
        pmull2  v7.8h, v2.16b, v5.16b
```

just like in the simpler case without the loop.
This issue was also seen recently with regular arithmetic using `vmull_high_u8`, but that case has been resolved in clang trunk.
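
For readers less familiar with the instructions involved, here is a minimal portable sketch of what `pmull` and `pmull2` compute, written as a scalar model rather than with the real intrinsics. The names `clmul8`, `Vec16x8`, `Vec8x16`, `pmull_low_model` and `pmull_high_model` are hypothetical and exist only for this illustration. The point is that the high-half form reads lanes 8..15 of its sources directly, so no `ext` to move the high half down should be needed first.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for poly8x16_t and poly16x8_t (illustration only).
using Vec16x8 = std::array<std::uint8_t, 16>;
using Vec8x16 = std::array<std::uint16_t, 8>;

// Carry-less (polynomial over GF(2)) multiply of two 8-bit values.
inline std::uint16_t clmul8(std::uint8_t a, std::uint8_t b) {
    std::uint16_t r = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & (1u << i)) {
            r ^= static_cast<std::uint16_t>(static_cast<std::uint16_t>(a) << i);
        }
    }
    return r;
}

// Models pmull (vmull_p8 on the low halves): multiplies lanes 0..7.
inline Vec8x16 pmull_low_model(const Vec16x8& a, const Vec16x8& b) {
    Vec8x16 r{};
    for (std::size_t i = 0; i < 8; ++i) r[i] = clmul8(a[i], b[i]);
    return r;
}

// Models pmull2 (vmull_high_p8): multiplies lanes 8..15 in place,
// without first copying the high half into the low half.
inline Vec8x16 pmull_high_model(const Vec16x8& a, const Vec16x8& b) {
    Vec8x16 r{};
    for (std::size_t i = 0; i < 8; ++i) r[i] = clmul8(a[i + 8], b[i + 8]);
    return r;
}
```

In this model the `ext` + `pmull` pair in the generated code above corresponds to copying lanes 8..15 into a fresh vector and then calling `pmull_low_model`, which gives the same result as `pmull_high_model` at the cost of one extra instruction per multiply.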

</pre>