<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/58177>58177</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Optimized code quality for horizontal add worse after ab9a81f
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          dyung
      </td>
    </tr>
</table>

<pre>
We have some internal tests that check that the compiler generates horizontal adds for various code constructs. After commit ab9a81f736acfb927b0e0b4f0de8710fc2379f70, we noticed that one such pattern no longer generates the code we expect. Consider the following code:
```c++
#include <x86intrin.h>

__attribute__((noinline))
__m256d add_pd_001(__m256d a, __m256d b) {
  __m256d r = (__m256d){ a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
  return __builtin_shufflevector(r, a, -1, 1, 2, 3);
}
```
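Because the shuffle index for lane 0 is -1, that lane of the result is undefined. This means `a[0] + a[1]` is dead, and only three pairwise sums matter. Below is a minimal scalar model of the intended result. It uses `std::optional` to mark the undefined lane, and `add_pd_001_ref` and `matches` are hypothetical names, not part of the original test:

```c++
#include <array>
#include <cassert>
#include <optional>

// One ymm register of doubles; element 0 is the lowest lane.
using V4 = std::array<double, 4>;
// A lane that may be undefined (shuffle index -1).
using Lane = std::optional<double>;

// Hypothetical scalar model of add_pd_001: only lanes 1-3 are defined,
// so a[0] + a[1] never has to be computed.
std::array<Lane, 4> add_pd_001_ref(const V4& a, const V4& b) {
  return {std::nullopt,   // index -1: undefined
          a[2] + a[3],    // index 1 -> r[1]
          b[0] + b[1],    // index 2 -> r[2]
          b[2] + b[3]};   // index 3 -> r[3]
}

// A lowering is acceptable if it agrees with the model on every defined
// lane; the undefined lane may hold any value.
bool matches(const V4& got, const std::array<Lane, 4>& want) {
  for (int i = 0; i < 4; ++i)
    if (want[i] && *want[i] != got[i]) return false;
  return true;
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, the defined lanes are 7, 30 and 70.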
If we compile this with optimizations targeting btver2 (`clang -O2 -S -march=btver2`) using a compiler without ab9a81f736acfb927b0e0b4f0de8710fc2379f70, the compiler generates the following assembly for the function:
```asm
        vperm2f128      ymm0, ymm0, ymm1, 49
        vinsertf128     ymm1, ymm1, xmm1, 1
        vhaddpd ymm0, ymm1, ymm0
        ret
```
Note that a single `vhaddpd` instruction operating on the full 256-bit ymm registers does the work.
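The old sequence works because the 256-bit `vhaddpd` adds adjacent pairs independently within each 128-bit lane. One `vperm2f128` and one `vinsertf128` arrange the operands so that a single `vhaddpd` produces lanes 1 to 3. Below is a minimal scalar emulation. It assumes the Intel (destination-first) operand order shown in the listing, and that `ymm0 = a` and `ymm1 = b` on entry, as in the x86-64 SysV calling convention. The helper names are hypothetical, and the helpers model only the operand forms used here:

```c++
#include <array>
#include <cassert>

using V4 = std::array<double, 4>;  // one ymm register, element 0 first

// VPERM2F128 dst, src1, src2, imm: each 128-bit half of dst is picked by a
// 2-bit field of imm (0/1 = low/high half of src1, 2/3 = low/high of src2).
// The zeroing bits (3 and 7) are not modelled.
V4 vperm2f128(const V4& src1, const V4& src2, int imm) {
  assert((imm & 0x88) == 0 && "zeroing bits not modelled");
  auto half = [&](int sel, int i) {
    const V4& s = (sel & 2) ? src2 : src1;
    return s[(sel & 1) * 2 + i];
  };
  int lo = imm & 3, hi = (imm >> 4) & 3;
  return {half(lo, 0), half(lo, 1), half(hi, 0), half(hi, 1)};
}

// VINSERTF128 dst, src1, xmm, 1: copy src1 and replace its upper half with
// the low two elements of x (the xmm view of a ymm register).
V4 vinsertf128_hi(const V4& src1, const V4& x) {
  return {src1[0], src1[1], x[0], x[1]};
}

// 256-bit VHADDPD: adds adjacent pairs, separately in each 128-bit lane.
V4 vhaddpd256(const V4& s1, const V4& s2) {
  return {s1[0] + s1[1], s2[0] + s2[1], s1[2] + s1[3], s2[2] + s2[3]};
}

// The pre-ab9a81f sequence, with ymm0 = a and ymm1 = b on entry.
V4 old_sequence(const V4& a, const V4& b) {
  V4 ymm0 = vperm2f128(a, b, 49);   // {a2, a3, b2, b3}
  V4 ymm1 = vinsertf128_hi(b, b);   // {b0, b1, b0, b1}
  return vhaddpd256(ymm1, ymm0);    // {b0+b1, a2+a3, b0+b1, b2+b3}
}
```

Lane 0 comes out as `b[0] + b[1]`. This is permitted because that lane is undefined.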

However, if we compile the same function with the same options using a compiler that includes ab9a81f736acfb927b0e0b4f0de8710fc2379f70, the compiler now generates this:
```asm
        vextractf128    xmm0, ymm0, 1
        vhaddpd xmm0, xmm1, xmm0
        vextractf128    xmm1, ymm1, 1
        vhaddpd xmm1, xmm1, xmm1
        vinsertf128     ymm0, ymm0, xmm0, 1
        vextractf128    xmm2, ymm0, 1
        vblendpd        xmm1, xmm1, xmm2, 1
        vinsertf128     ymm0, ymm0, xmm1, 1
        ret
```
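The new sequence appears to compute the same defined lanes, which suggests the regression is one of code quality rather than correctness. Eight instructions (three `vextractf128`, two 128-bit `vhaddpd`, two `vinsertf128` and one `vblendpd`) replace three. Below is a scalar trace of the listing under the same assumptions as above. The helper names are hypothetical. VEX-encoded 128-bit writes zero the upper half of the ymm register; the model sidesteps this by keeping xmm values as two-element arrays:

```c++
#include <array>
#include <cassert>

using V2 = std::array<double, 2>;  // one xmm register
using V4 = std::array<double, 4>;  // one ymm register

V2 extract_hi(const V4& y) { return {y[2], y[3]}; }  // vextractf128 ..., 1
V2 lo(const V4& y) { return {y[0], y[1]}; }          // xmm view of a ymm
V4 insert_hi(const V2& low, const V2& x) {           // vinsertf128 ..., 1
  return {low[0], low[1], x[0], x[1]};
}
V2 vhaddpd128(const V2& s1, const V2& s2) {
  return {s1[0] + s1[1], s2[0] + s2[1]};
}
// VBLENDPD (xmm): bit i of imm selects element i from s2 instead of s1.
V2 vblendpd128(const V2& s1, const V2& s2, int imm) {
  assert(imm >= 0 && imm < 4);
  return {(imm & 1) ? s2[0] : s1[0], (imm & 2) ? s2[1] : s1[1]};
}

// The post-ab9a81f sequence, with ymm0 = a and ymm1 = b on entry.
V4 new_sequence(const V4& a, const V4& b) {
  V2 xmm0 = extract_hi(a);            // vextractf128 xmm0, ymm0, 1 -> {a2, a3}
  xmm0 = vhaddpd128(lo(b), xmm0);     // vhaddpd xmm0, xmm1, xmm0  -> {b0+b1, a2+a3}
  V2 xmm1 = extract_hi(b);            // vextractf128 xmm1, ymm1, 1 -> {b2, b3}
  xmm1 = vhaddpd128(xmm1, xmm1);      // vhaddpd xmm1, xmm1, xmm1  -> {b2+b3, b2+b3}
  V4 ymm0 = insert_hi(xmm0, xmm0);    // vinsertf128 ymm0, ymm0, xmm0, 1
  V2 xmm2 = extract_hi(ymm0);         // vextractf128 xmm2, ymm0, 1 -> {b0+b1, a2+a3}
  xmm1 = vblendpd128(xmm1, xmm2, 1);  // vblendpd xmm1, xmm1, xmm2, 1 -> {b0+b1, b2+b3}
  return insert_hi(lo(ymm0), xmm1);   // vinsertf128 ymm0, ymm0, xmm1, 1
}
```

For `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, this trace yields `{30, 7, 30, 70}`, the same value the three-instruction sequence produces.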
</pre>