<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/107345>107345</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] gather struct load should be reused similar to normal struct load
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
vfdff
</td>
</tr>
</table>
<pre>
* normal struct load case: https://godbolt.org/z/fGYKoM8W3
```
for (int i = 0; i < eulers_per_block; i++) {
    #pragma clang loop vectorize(enable)
    #pragma GCC ivdep
    for (int tid = 0; tid < block_sz; tid++) {
        int index = tid;
        s_ref_real[i][tid] = mdlComplex[index].real();
        s_ref_imag[i][tid] = mdlComplex[index].imag();
    }
}
```
* Assembly generated by LLVM: this works **fine**, since a single `ld2w` loads both the **real** and **imaginary** parts at once
```
.LBB0_2:
ld2w { z0.s, z1.s }, p0/z, [x22]
add x22, x22, x13
st1w { z0.s }, p0, [x10, x21, lsl #2] # r0, r1, ... r7 (assume vscale=2)
st1w { z1.s }, p0, [x11, x21, lsl #2] # i0, i1, ... i7
add x21, x21, x12
cmp x21, #256
b.ne .LBB0_2
```
* gather struct load: https://godbolt.org/z/b5GoT4qqv
```
for (int i = 0; i < eulers_per_block; i++) {
    #pragma clang loop vectorize(enable)
    #pragma GCC ivdep
    for (int tid = 0; tid < block_sz; tid++) {
        int index = indexarr[tid];
        s_ref_real[i][tid] = mdlComplex[index].real();
        s_ref_imag[i][tid] = mdlComplex[index].imag();
    }
}
```
* Assembly generated by LLVM: the real and imaginary parts are **loaded twice**, with a separate pair of gathers for each
```
.LBB0_2:
add x22, x9, x21, lsl #2
ld1sw { z0.d }, p0/z, [x9, x21, lsl #2] # index.0, index.1, ... index.3 (assume vscale=2, one index per 64-bit lane)
ld1sw { z2.d }, p0/z, [x22, #1, mul vl] # index.4, index.5, ... index.7
add x22, x10, #4
lsl z0.d, z0.d, #3
lsl z2.d, z2.d, #3
ld1w { z1.d }, p0/z, [x10, z0.d] # r0, r1, r2, r3 (one 32-bit word per 64-bit lane)
ld1w { z3.d }, p0/z, [x10, z2.d] # r4, r5, r6, r7
uzp1 z1.s, z1.s, z3.s # r0, r1, r2, ... r7
st1w { z1.s }, p1, [x11, x21, lsl #2]
ld1w { z0.d }, p0/z, [x22, z0.d] # i0, i1, i2, i3; could this be merged with the ld1w of the real parts above?
ld1w { z1.d }, p0/z, [x22, z2.d] # i4, i5, i6, i7
uzp1 z0.s, z0.s, z1.s # i0, i1, i2, ... i7
st1w { z0.s }, p1, [x12, x21, lsl #2]
add x21, x21, x13
cmp x21, #256
b.ne .LBB0_2
```
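* A scalar sketch of the reuse being asked for. This is only an illustration, not what LLVM emits today: `gather_split` is a hypothetical helper, and it assumes `mdlComplex` holds `std::complex<float>`, which the snippets above do not show. Each indexed element is fetched once as a single 8-byte unit and then split. In vector terms that might correspond to one gather of 64-bit elements per index vector followed by a `uzp1`/`uzp2` pair, instead of two 32-bit gathers per index vector.
```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstring>

// The single 8-byte read below relies on complex<float> being two packed floats.
static_assert(sizeof(std::complex<float>) == 2 * sizeof(float),
              "expected {real, imag} stored back to back");

// Hypothetical helper (not from the issue): scalar model of the hoped-for codegen.
void gather_split(const std::complex<float>* mdl, const int* indexarr,
                  std::size_t n, float* re, float* im) {
    for (std::size_t tid = 0; tid < n; ++tid) {
        assert(indexarr[tid] >= 0);
        float pair[2];
        // One 8-byte read per index: what a single 64-bit gather lane would fetch.
        std::memcpy(pair, &mdl[indexarr[tid]], sizeof pair);
        re[tid] = pair[0];  // low half: real part
        im[tid] = pair[1];  // high half: imaginary part
    }
}
```
The results are the same as calling `.real()` and `.imag()` separately, but each indexed element is read once rather than twice.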
</pre>