<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/65262">65262</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[vectorization] Improve the vectorization capability for sha_compress
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
vfdff
</td>
</tr>
</table>
<pre>
* test: https://godbolt.org/z/7c9hTz97d
```
void sha_compress(sha512_state *md, const unsigned char *buf)
{
    u64 S[8], W[80], t0, t1;
    for(int i = 0; i < 8; i++)
        S[i] = md->state[i];
    for(int i = 0; i < 16; i++)
        W[i] = load64(buf + (8*i));
    for(int i = 16; i < 80; i++)
        W[i] = Gamma1(W[i - 2]) + W[i - 7] + Gamma0(W[i - 15]) + W[i - 16];
    for(int i = 0; i < 80; i += 8)
    {
        { t0 = S[7] + Sigma1(S[4]) + Ch(S[4], S[5], S[6]) + K[i+0] + W[i+0]; t1 = Sigma0(S[0]) + Maj(S[0], S[1], S[2]); S[3] += t0; S[7] = t0 + t1; };
        { t0 = S[6] + Sigma1(S[3]) + Ch(S[3], S[4], S[5]) + K[i+1] + W[i+1]; t1 = Sigma0(S[7]) + Maj(S[7], S[0], S[1]); S[2] += t0; S[6] = t0 + t1; };
        { t0 = S[5] + Sigma1(S[2]) + Ch(S[2], S[3], S[4]) + K[i+2] + W[i+2]; t1 = Sigma0(S[6]) + Maj(S[6], S[7], S[0]); S[1] += t0; S[5] = t0 + t1; };
        { t0 = S[4] + Sigma1(S[1]) + Ch(S[1], S[2], S[3]) + K[i+3] + W[i+3]; t1 = Sigma0(S[5]) + Maj(S[5], S[6], S[7]); S[0] += t0; S[4] = t0 + t1; };
        { t0 = S[3] + Sigma1(S[0]) + Ch(S[0], S[1], S[2]) + K[i+4] + W[i+4]; t1 = Sigma0(S[4]) + Maj(S[4], S[5], S[6]); S[7] += t0; S[3] = t0 + t1; };
        { t0 = S[2] + Sigma1(S[7]) + Ch(S[7], S[0], S[1]) + K[i+5] + W[i+5]; t1 = Sigma0(S[3]) + Maj(S[3], S[4], S[5]); S[6] += t0; S[2] = t0 + t1; };
        { t0 = S[1] + Sigma1(S[6]) + Ch(S[6], S[7], S[0]) + K[i+6] + W[i+6]; t1 = Sigma0(S[2]) + Maj(S[2], S[3], S[4]); S[5] += t0; S[1] = t0 + t1; };
        { t0 = S[0] + Sigma1(S[5]) + Ch(S[5], S[6], S[7]) + K[i+7] + W[i+7]; t1 = Sigma0(S[1]) + Maj(S[1], S[2], S[3]); S[4] += t0; S[0] = t0 + t1; };
    }
    for(int i = 0; i < 8; i++)
        md->state[i] = md->state[i] + S[i];
}
```
* This case is taken from 557.xz_r in SPEC CPU 2017. GCC vectorizes the loop body, while LLVM does not.
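* A smaller reproducer may help when comparing the two compilers. The sketch below isolates only the message-schedule loop from `sha_compress` above. That loop's shortest loop-carried dependence has distance 2 (W[i] reads W[i - 2]), so two 64-bit lanes per iteration are legal in principle. Whether this is the loop GCC actually vectorizes is an assumption, not something the report states. The names `expand_schedule` and `rotr64` are hypothetical, and the `Gamma0`/`Gamma1` bodies are assumed to follow the standard SHA-512 small sigma functions (FIPS 180-4), since the report does not show their definitions.

```c
typedef unsigned long long u64;

/* Assumed helper (not in the report): 64-bit rotate right, valid for 0 < n < 64. */
static u64 rotr64(u64 x, unsigned n) { return (x >> n) | (x << (64 - n)); }

/* Assumed definitions: SHA-512 small sigma functions from FIPS 180-4. */
static u64 Gamma0(u64 x) { return rotr64(x, 1) ^ rotr64(x, 8) ^ (x >> 7); }
static u64 Gamma1(u64 x) { return rotr64(x, 19) ^ rotr64(x, 61) ^ (x >> 6); }

/* Hypothetical isolated kernel: expects W[0..15] filled, computes W[16..79]. */
void expand_schedule(u64 W[80])
{
    for (int i = 16; i < 80; i++)
        W[i] = Gamma1(W[i - 2]) + W[i - 7] + Gamma0(W[i - 15]) + W[i - 16];
}
```

* Compiling this kernel on its own with each compiler's vectorization remarks enabled should show whether the difference comes from this loop or from the unrolled round loop.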
</pre>