<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/58323>58323</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Suboptimal lowering of `vget_high_XXX` on AArch64
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Maratyszcza
</td>
</tr>
</table>
<pre>
Clang/LLVM lowers the `a = vget_high_XXX(b)` NEON intrinsics to `EXT Va.16B, Vb.16B, Vb.16B, #8` instead of the `DUP Va.1D, Vb.D[1]` suggested by the [ARM NEON intrinsics guide](https://developer.arm.com/architectures/instruction-sets/intrinsics/). This lowering has two drawbacks:
- On AArch64 cores that split NEON instructions into 64-bit micro-operations, `EXT Va.16B, Vb.16B, Vb.16B, #8` generates two 64-bit micro-operations instead of one.
- On all AArch64 cores, `EXT Va.16B, Vb.16B, Vb.16B, #8` leaves the upper 64 bits of the destination register initialized even though they are never used, which prevents that half of the register from being power-gated.
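
The difference in what the two instructions write can be sketched with a small portable C model. This is only an illustration, not compiler output: the `vreg128` type and the `model_ext8`, `model_dup_high` and `check_same_low_half` helpers are hypothetical names, and a 128-bit register is assumed to be representable as two 64-bit lanes.

```c
#include "assert.h"
#include "stdint.h"

/* Hypothetical model of a 128-bit NEON register as two 64-bit lanes. */
typedef struct {
    uint64_t d0; /* lower 64 bits, lane D[0] */
    uint64_t d1; /* upper 64 bits, lane D[1] */
} vreg128;

/* Models EXT Va.16B, Vb.16B, Vb.16B, #8: bytes 8..23 of the pair Vb:Vb. */
static inline vreg128 model_ext8(vreg128 b) {
    vreg128 a;
    a.d0 = b.d1; /* bytes 8..15 of the first source operand */
    a.d1 = b.d0; /* bytes 0..7 of the second source operand */
    return a;
}

/* Models DUP Va.1D, Vb.D[1]: a 64-bit write, so the upper half is zeroed. */
static inline vreg128 model_dup_high(vreg128 b) {
    vreg128 a;
    a.d0 = b.d1;
    a.d1 = 0;
    return a;
}

/* Both models agree on the 64 bits that vget_high_XXX actually returns. */
static inline void check_same_low_half(vreg128 b) {
    assert(model_ext8(b).d0 == model_dup_high(b).d0);
}
```

In this model both forms produce the same lower 64 bits, which is all `vget_high_XXX` returns; only the `EXT` form also fills the upper 64 bits with live data.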
</pre>