<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=http://email.email.llvm.org/c/eJytlEtz2jAQxz-Nfdkp4wcYfPCBvNrMNJkeMtPeGD0WrEaWiCSH0E_flW1K6OTWMsbsytq_fvsw3Mpj8wW1tklxDffAhFASTWBaH2GP-IwSWADmuo1Ba2YtSNwqo4KyxoMyIDQzO2BGgrFBCdoeWgp4fXGy06x92fi8AqVMcMp4JUB5ELbbW087t8524FnoHQsqqkg5KL1bcrY3MhrS9lxHo-t1UHutBIsMM2Im6XishYA-gAoxkyijAnjEzo9ExBpaBJQ7BME8QlKspoMo-lIV7BbuH5_yavNw_0gb1--8vbOyF5FkWlv_SIp60HboSSYGXyY1VCtmfnCWVtQWHr6u4WB7PVTttBuHSmtNOSXFsgNrqAcamTNRJx7AGZXQD8kJZsYNY9IGkFJgwbqYvbdUlo4dgSN0yvsY722HoSWL5MdS7KzklnjxjXV7Tb4QVGb0AxM1iSuDsyS7SbL1eG9D2PukXCfFHV1T-My6HXm_6Htbf3_99sS_f56Cqmy8rpPiKl7jalEqI3QvqQPl9XmwkvL2_Vk0MXn1ttoEwrSSenVeYDHFs8svXUHdmISW05FAH4dUY3M5l6QqhmGJNx7jyhPk8uZjGM7-JwvNxgXIQNfrP3RnsA_ZTvVNZVPKuqxZyvrQWtc84rMN-KwMS3unm7_6pkLb8xk1mBytX08_n2iyf6Kg1-eORqZHT8aizIssbZuiQokiq-YZClZntayLxUJyUWVLwclONeOofZMsqNGFwQMMEmQni5v03wlUU2RFkeV5ni0WVV7MVvMl4_M6r7eruZjPVwmRdUzpWdSJI5m6ZpDk_c7TQ6188OeHjN6JnUEcgIkwqKCxWTMn2mpOnHBoj9Nf2_QiePCxXfRncEXW2KbYaTt6NFJJeZcO2M3A_BvpSbjr>53120</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            AArch64: why does clang combine sqadd + sqrdmulh into sqrdmlah?
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Nekotekina
      </td>
    </tr>
</table>

<pre>
    Hello, I accidentally peeked at the arm_neon.h definitions in clang and noticed that the vqrdmlahq_s16 intrinsic is composed from a saturating add and a saturating rounding doubling multiply. I tried to test it, and it seems that in the edge case (saturated multiplication of INT16_MIN * INT16_MIN producing INT16_MAX) the result of the separate saturating addition is wrong whenever the fused MLA would not saturate at all. I'm only learning the basics and can only test in an emulator, so I may be missing something. In the godbolt example, gcc does not combine the two instructions.

https://godbolt.org/z/E9WvPTbWG
```C++
#include <arm_neon.h>

int16x8_t good(int16x8_t a, int16x8_t b, int16x8_t c)
{
    return vqrdmlahq_s16(c, a, b);
}

int16x8_t bad(int16x8_t a, int16x8_t b, int16x8_t c)
{
    return vqaddq_s16(c, vqrdmulhq_s16(a, b));
}
```
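
The edge case can be sketched per lane in scalar code. This is only a model, assuming that sqrdmlah adds the accumulator (shifted left by 16) to the doubled product and the rounding constant and saturates once at the end, while the two-instruction form saturates after the multiply and again after the add. The helpers sat16, sqrdmulh, sqadd and sqrdmlah are hypothetical scalar stand-ins named after the instructions, not real intrinsics.

```cpp
#include <cstdint>

// Clamp a wide intermediate value to the int16_t range.
constexpr std::int16_t sat16(std::int64_t v)
{
    return static_cast<std::int16_t>(
        v > INT16_MAX ? INT16_MAX : (v < INT16_MIN ? INT16_MIN : v));
}

// Assumed model of one sqrdmulh lane: doubling, rounding, high half, saturated.
constexpr std::int16_t sqrdmulh(std::int16_t a, std::int16_t b)
{
    return sat16((2 * static_cast<std::int64_t>(a) * b + 0x8000) >> 16);
}

// Assumed model of one sqadd lane: saturating addition.
constexpr std::int16_t sqadd(std::int16_t a, std::int16_t b)
{
    return sat16(static_cast<std::int64_t>(a) + b);
}

// Assumed model of one sqrdmlah lane: the accumulator joins the sum before
// the single final saturation (multiplying by 65536 instead of shifting left
// avoids shifting a negative value; the right shift is assumed arithmetic).
constexpr std::int16_t sqrdmlah(std::int16_t acc, std::int16_t a, std::int16_t b)
{
    return sat16((static_cast<std::int64_t>(acc) * 65536
                  + 2 * static_cast<std::int64_t>(a) * b + 0x8000) >> 16);
}
```

Under this model, with a = b = INT16_MIN and c = -1, sqadd(c, sqrdmulh(a, b)) gives 32766 but sqrdmlah(c, a, b) gives 32767; with c = INT16_MIN the results are -1 and 0. For inputs where the multiply does not saturate (for example a = 100, b = 200, c = 5) both forms give 6.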
</pre>