<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/58177">58177</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Optimized code quality for horizontal add worse after ab9a81f
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dyung
</td>
</tr>
</table>
<pre>
We have some internal tests which verify that the compiler generates horizontal adds for various code constructs. After commit ab9a81f736acfb927b0e0b4f0de8710fc2379f70, we noticed that one such pattern no longer generates the code we expect. Consider the following code:
```c++
#include <x86intrin.h>
__attribute__((noinline))
__m256d add_pd_001(__m256d a, __m256d b) {
__m256d r = (__m256d){ a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
return __builtin_shufflevector(r, a, -1, 1, 2, 3);
}
```
If we compile this with optimizations targeting btver2 (`clang -O2 -S -march=btver2`) using a compiler without ab9a81f736acfb927b0e0b4f0de8710fc2379f70, the compiler generates the following assembly for the function:
```asm
vperm2f128 ymm0, ymm0, ymm1, 49
vinsertf128 ymm1, ymm1, xmm1, 1
vhaddpd ymm0, ymm1, ymm0
ret
```
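To see why this three-instruction sequence is sufficient, here is a minimal scalar sketch of what each instruction does to the four double lanes. The type `V4` and the helper names are hypothetical, introduced only for illustration; the lane semantics follow the documented behaviour of `vperm2f128` (immediate 49), `vinsertf128` (immediate 1) and the 256-bit `vhaddpd`. Lane 0 of the result is left unchecked because the shuffle mask marks it as undefined (`-1`).

```c++
#include <array>
#include <cassert>

// Hypothetical scalar stand-in for a 256-bit vector of doubles (lanes 0..3).
using V4 = std::array<double, 4>;

// Models 256-bit "vhaddpd dst, x, y": pairwise adds within each 128-bit half.
V4 hadd_pd256(const V4 &x, const V4 &y) {
  return {x[0] + x[1], y[0] + y[1], x[2] + x[3], y[2] + y[3]};
}

// Models "vperm2f128 dst, x, y, 49": high half of x, then high half of y.
V4 perm2f128_0x31(const V4 &x, const V4 &y) {
  return {x[2], x[3], y[2], y[3]};
}

// Models "vinsertf128 dst, x, xmm(x), 1": low half of x copied into the high half.
V4 dup_low_half(const V4 &x) {
  return {x[0], x[1], x[0], x[1]};
}

// The sequence emitted before the commit, with a in ymm0 and b in ymm1.
V4 old_sequence(const V4 &a, const V4 &b) {
  V4 ymm0 = perm2f128_0x31(a, b);  // {a2, a3, b2, b3}
  V4 ymm1 = dup_low_half(b);       // {b0, b1, b0, b1}
  return hadd_pd256(ymm1, ymm0);   // {b0+b1, a2+a3, b0+b1, b2+b3}
}

// Lanes 1..3 must match the source function; lane 0 is undefined there.
bool matches_source(const V4 &a, const V4 &b) {
  V4 r = old_sequence(a, b);
  return r[1] == a[2] + a[3] && r[2] == b[0] + b[1] && r[3] == b[2] + b[3];
}
```

Calling `matches_source` with any pair of inputs whose sums are exactly representable should return `true`, which is consistent with a single 256-bit horizontal add covering every defined lane of the result.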
Note the use of a single `vhaddpd` instruction operating on the full 256-bit ymm registers.
However, if we compile the same function with the same options using a compiler that includes ab9a81f736acfb927b0e0b4f0de8710fc2379f70, the compiler now generates a longer sequence that uses two 128-bit `vhaddpd` instructions plus additional extracts, a blend, and inserts:
```asm
vextractf128 xmm0, ymm0, 1
vhaddpd xmm0, xmm1, xmm0
vextractf128 xmm1, ymm1, 1
vhaddpd xmm1, xmm1, xmm1
vinsertf128 ymm0, ymm0, xmm0, 1
vextractf128 xmm2, ymm0, 1
vblendpd xmm1, xmm1, xmm2, 1
vinsertf128 ymm0, ymm0, xmm1, 1
ret
```
</pre>