<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/91780">91780</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [SLPVectorizer] Missed optimization: missed vectorization with non-clobbering store at the end
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            llvm:SLPVectorizer,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          XChy
      </td>
    </tr>
</table>

<pre>
    Alive2 proof: verifying the vectorized code with Alive2 is too slow, so I use GCC as the oracle in the links below.
Godbolt link: https://godbolt.org/z/xe6Tzq8Ge

### Motivating example 

For the reduced testcase below:
```c
#include <stdint.h>

void e1000x_core_prepare_arr(uint16_t *arr, uint16_t value)
{
    uint16_t checksum = 0;
    int i;

    arr[11] = value;  // vectorized if we remove this line.

    for (i = 0; i < 0x3f; i++) {
        checksum += arr[i];
    }

    arr[0x3f] = checksum;   // vectorized if we replace it with "return checksum".
}
```

We get a long `add i16 checksum, arr[i]` chain (https://godbolt.org/z/xe6Tzq8Ge), and SLPVectorizer fails to vectorize it.
After some trials, I found that if we remove either store outside the loop, vectorization happens, but codegen is still poor compared with GCC's: https://godbolt.org/z/rEfbzhW77
This looks like a phase-ordering problem, since the store outside the loop may be merged into the stores produced after unrolling. But even for the case with only the pure loop, the number of vectorized instructions looks bad: https://godbolt.org/z/oznjWWcKr
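For concreteness, here is the variant hinted at by the second comment in the testcase: the final store is replaced with `return checksum`, which lets the reduction loop vectorize. This is a sketch for illustration; the function name `e1000x_core_prepare_arr_ret` is mine, not from the original source.

```c
#include <stdint.h>

/* Variant of the reduced testcase above with the trailing
 * non-clobbering store arr[0x3f] = checksum replaced by a return.
 * In this form the reduction loop does get vectorized. */
uint16_t e1000x_core_prepare_arr_ret(uint16_t *arr, uint16_t value)
{
    uint16_t checksum = 0;
    int i;

    arr[11] = value;  /* the earlier store can stay */

    for (i = 0; i < 0x3f; i++) {
        checksum += arr[i];
    }

    return checksum;  /* no store after the loop -> vectorizes */
}
```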

### Real-world motivation

This snippet of IR is derived from [qemu/hw/net/e1000x_common.c@e1000x_core_prepare_eeprom](https://github.com/qemu/qemu/blob/dafec285bdbfe415ac6823abdc510e0b92c3f094/hw/net/e1000x_common.c#L195) (after O3 pipeline).

**Let me know if you can confirm that it's an optimization opportunity, thanks.**

cc @alexey-bataev 
</pre>