<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/103481">103481</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] On Neoverse V2, transform ld4 into ld2 + uzp* sequences
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AArch64
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
minglotus-6
</td>
</tr>
</table>
<pre>
https://godbolt.org/z/K17nh31oG shows that an `ld4` instruction can be transformed into an `ld2 + uzp*` sequence that gives equivalent program output (at least on little-endian systems).
According to section 4.16 of the Neoverse V2 Software Optimization Guide (SOG) (https://developer.arm.com/documentation/109898/latest/): _The bandwidth of the following ASIMD and SVE instructions is limited by decode constraints and it is advisable to avoid them when high performing code is desired._
Would it make sense to do this transformation in the compiler (say, in the instruction selection phase) for Neoverse V2? A microbenchmark shows a measurable throughput increase. On the other hand, the longer sequence may increase instruction cache pressure.
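
For reference, here is a lane-level sketch of why the two sequences can agree. It is a plain Python model, not the code behind the godbolt link, and it assumes the 4-lane, 4-register form (`ld4 {v0.4s-v3.4s}`) split into two `ld2` loads of 8 elements each. The `model_*` names are hypothetical helpers for illustration only, and the model works on element indices, so it says nothing about endianness.

```python
def model_ld4(mem, base=0):
    """ld4 into four 4-lane registers: register r gets elements r, r+4, r+8, r+12."""
    return [[mem[base + 4 * lane + reg] for lane in range(4)] for reg in range(4)]


def model_ld2(mem, base=0):
    """ld2 into two 4-lane registers: register r gets elements r, r+2, r+4, r+6."""
    return [[mem[base + 2 * lane + reg] for lane in range(4)] for reg in range(2)]


def model_uzp1(a, b):
    """uzp1: even-indexed elements of the concatenation of a and b."""
    return (a + b)[0::2]


def model_uzp2(a, b):
    """uzp2: odd-indexed elements of the concatenation of a and b."""
    return (a + b)[1::2]


def model_ld2_uzp(mem, base=0):
    """Assumed replacement sequence: two ld2 loads followed by four uzp shuffles."""
    a0, b0 = model_ld2(mem, base)      # elements 0..7, de-interleaved by 2
    a1, b1 = model_ld2(mem, base + 8)  # elements 8..15, de-interleaved by 2
    return [
        model_uzp1(a0, a1),  # elements 0, 4, 8, 12  (ld4 register 0)
        model_uzp1(b0, b1),  # elements 1, 5, 9, 13  (ld4 register 1)
        model_uzp2(a0, a1),  # elements 2, 6, 10, 14 (ld4 register 2)
        model_uzp2(b0, b1),  # elements 3, 7, 11, 15 (ld4 register 3)
    ]
```

Calling `model_ld4(mem)` and `model_ld2_uzp(mem)` on any 16-element list should return the same four registers. Under this assumed decomposition, one `ld4` becomes six instructions (two `ld2`, four `uzp*`), which is where the instruction cache concern comes from.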
</pre>