<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/59013>59013</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [LoopVectorizePass] too big delta in remainder loop
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            vectorization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          vzakhari
      </td>
    </tr>
</table>

<pre>
    This case is extracted from `Polyhedron/gas_dyn2_11` benchmarks.  Flang has recently started propagating fastmath flags into LLVM IR and that caused miscompare under `-Ofast`.  The only relevant fastmath flag is `reassoc`.

The problem is related to initialization of `RBOUND` array in `KEEL` subroutine.  It seems that the expectation of the benchmark is that the array is assigned with ascending real number with some constant stride.  The initialization loop looks like this:
```
      RBOUND(0) = -DXX
      DO N = 1, NP1
          RBOUND(N) = RBOUND(N-1) + DXX
      END DO
```

When `reassoc` is present vectorization kicks in and it introduces significant computation error for the remainder loop, so that the ascending order property does not stand and this later causes negative numbers in `DX` array:
```
          DO N = 1, NP1
              DX(N) = RBOUND(N) - RBOUND(N-1)
```

After this the benchmark goes off the rails.

C reproducer (exact numbers are taken from the benchmark just for reference):
```
#include <stdio.h>
#include <stdlib.h>

__attribute__((noinline)) void test(float *rbound, float dxx, int n) {
  rbound[0] = -dxx;
  for (int i = 1; i < n; ++i)
    rbound[i] = rbound[i - 1] + dxx;
}

#define N (750001)
int main() {
  float *rbound = malloc(N * sizeof(float));
  float dxx = 1.0477466E-06F;
  float res = 0.0F;
  test(rbound, dxx, N);
  for (int i = N - 128; i < N; ++i) {
    if (rbound[i] < rbound[i - 1]) {
      fprintf(stderr, "ERROR:\n");
      fprintf(stderr, "%d: %e\n", i - 2, rbound[i - 2]);
      fprintf(stderr, "%d: %e\n", i - 1, rbound[i - 1]);
      fprintf(stderr, "%d: %e\n", i, rbound[i]);
    }
  }
  int i = 749985;
  fprintf(stderr, "%d: %e\n", i - 2, rbound[i - 2]);
  fprintf(stderr, "%d: %e\n", i - 1, rbound[i - 1]);
  fprintf(stderr, "%d: %e\n", i, rbound[i]);
  free(rbound);
  return 0;
}
```
```
$ clang -O3 -march=native -Xclang -mreassociate repro.c
$ ./a.out
ERROR:
749983: 7.864350e-01
749984: 7.864361e-01
749985: 7.857932e-01
749983: 7.864350e-01
749984: 7.864361e-01
749985: 7.857932e-01
```

We can see that element `749985` breaks the ascending order.  The problem is that the continuation value (`7.857932e-01`) used for the first iteration of the remainder loop is computed as multiplication of `1.0477466E-06F` by `749984`.

I realize that `reassoc` practically does not give any guarantees to the user and allows any reassociation, but can we make it better?  Otherwise, the only option for such code is to disable `reassoc` (which will disable vectorization completely) or disable vectorization.

Vectorized IR is in the attached [repro.orig.ll.gz](https://github.com/llvm/llvm-project/files/10017089/repro.orig.ll.gz).  In the vector loop body the last stored vector value is carried over to the next iteration to be used for computing values of the next elements, which guarantees the ascending order of values in the array.  Can we use the same approach for the remainder loop, i.e. take the last stored element value as the initial value for the remainder loop?  There is a sample rewrite in the attached 
[repro.modified.ll.gz](https://github.com/llvm/llvm-project/files/10017122/repro.modified.ll.gz):
```
$ clang -O3 -march=native -Xclang -mreassociate repro.modified.ll
$ ./a.out
749983: 7.864350e-01
749984: 7.864361e-01
749985: 7.864372e-01
```

This modification provides much more expected result for the remainder iterations for the cost of extra shuffles and register pressure.  Maybe there is a better way to implement the same.
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJy1WNty4jgQ_Rp4UUH5grk88DAJSdXUziRTqb3kLSXbMmgiLEqSSZiv39OyMTZhsrdsioDRpdV9-qh1RKrzw_LXjbQs41YwfIpXZ3jmRM4Ko7dsMA2-aXXYiNzochDdrrl9yg9l9BSG6GKpKLPNlptnO2bsVvFyzTbcMiMyUTp1YNZxQ7Z2Ru_4mjuJAQW3bsvdhhWKry2TpdPsy5ffv7LPD4yXOXMb7uBOZTFvK22mtztuBKvKXBjyZ3RPFvCAJX_dCKZLLGSEEnteur51CggDjeDW6oymDILVIPhUv9NkOJYqsaWBMMHJV7gjS-kkV_IHPNYl0wVZebi6_-1uRVFzY_gBg6j1l5ubL9Rmq9ToCgEKuPXZMSvE1taxOKwjXncic605amqxo8XbgY1ty-CyXJfw50UiGm6BaE7wIRjFymqbAg3fZfVWsEyXwBrxW2dkLhpozuJQWu_o7dkyJZ8F1pN2EDdoIIjm5b8y_9fEHM2DQbRgg3jFRqvHx-6I1T278x3hILpmd9_CbmfPxN3RxKllFPq26IqdWb25W8HyRcfq9z82ouynljDbGWFBPLYH1tocw36W2TPxzLNLOmKc0XmVCcsIYlnIjJAjolVNioQx2rAC_5QTI7ZcevoRghSn1Z2MtanRhsYQ14VxB5ZrrFBqygmtXHMbXhLNTM1w9AvaFnvRpNQ2tFo9tkR7P0N_Kwd-0OPlHFDL6Dwn7yD_qSDvfSB9Fq8pXF3U5DZcKtvbbdeAcVfjjm0czcUrykwbNe1wx5-RVF92-pa_V9b5ZBhRCINmQS7-BJZBFMsyU1UuEOi1dbnU480gvvlJr5Jpt9u_Pz1xh22UVk48PcFVvEotS4XNTQsDsL2WSKZAGYrmhdJgwiD6ZFKNIkUpqJvy11f6Arqx0uM-uzomphmaXAWDZFXvKxodtwMoWtimubJJbXzlH69hDI_YNHjJNleU4taqPFo9tSDHoW_FZusuNZituqEDn1wUCJQYFc1nSRAEJ0KQO7QVPCa9gM5A8ItvuVLYmuAUdWCv_RC6OAJWA9mN-IhZHe44mMxmk-n0ZhRMb98Mwz73w4Jx0O1sMnJKRJOCu7OlzsG9I3QQUwvxXR_ibqSMyYKdFmnBvn4L9puZWHpnsCyhAO6hypB3gyi6eXi4fyBGJ9cAN-q7-868QZTkmIbHRLRzQTk44B96LkW1S__ZcvjGcvgRlvtWL1hsqdp7PGVxNlks5kk3z_8LaP8PYB8LVmGE6GyEbpcRrjIlCy5UgLNCel5XJyzzCm90H7MRCnOGurkq69Nr9Nj0bZsTWeKQqyv-ODsZGENC8jF0Ut3U0t5_8_mLKdrZeD6dxEkgRkHY6Zuc-qbhWV_S9CWzRRyd9X2wzcuCBBqMlyT7amUAPbolKYJhjS2SywDn2V5SDY1g68jRVl9A2kFWVrUw2XNVCSo_ZLfr2dQrNC-aj7KlkAbnpsRx3dOdfTVDS9XKBzMh3beVcnKnIIk60vesGlMkhzayybmu_uw1Kqp9HUNfpe3odgHrSnUE0pooxMsDW1fcQIsJtDvtvUVExksnOkxerB91opikS8k1w0nt0X8ROHQgbCHyUuEQ-CC-ZewedsyLtIKGuuONQe98gASWrbINQMj9BQjr5tJypOHMc4D-spEY-SKVasf0pSYhqYQT6kDZgOmLw3po_d70AH5cgKQXgJ4gzvEM9y6GDV7vI4xaj5Uar3_4_T7fOLfz-j26pZsZ7gJVOoYH-KLU_vgxwtTvWANfC6mExWeIM30WzHEA376xHC3o_lK7UDtd0yTFTdE3KlywoGi1gWvNgJqU_hJpjES73pNErBNY4kbZISFaU3HiaU092gbeiD2S1M9q9pClvNXQd_lxQXpjcmPmCCIJaAR0XZMDy_pmy3Fl4juEDojfkflyjJsUydI3kR-3dx06r91prltN40_MEiOx1Y0HjJMrYAwGvRiA9Db5NVOOFNjqHNcVkX8cDUI6UG4vWn9PY__bs6CzxE9PhQ-p2eib_VXN9j981B415Q4u7nF9pjIIYmyR6ubmjlRAc6I2XkhrS27bdmYaVAEb_a8pzG6qogDmvo4ZsZbW-XuisLYydFf_yg-p59iRFXX1Yi_84H-OIIp4uh3JOx6KZTidzpNkMY0Xw3wZ54t4wYdOOiWWoMsXUK2tLN-QBRKpTmPzyTXLhXKcqNan5rAyavmP2SQRhKdTsgjCeLhZZjzOQx6m6WKWTtJZmsRhVsx4WCSTcCEmk6HiqVCWvISa6RVGUjfJaiiXURBFYRhOA5w74WJciEXM02kxnyc848VkMAnIczUmX1C91kOz9G6l1dqiUwFhe-psfkrxwJB9XrmNNsv9D_684UYOfQhL7_-fLpG-RA">