<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/54427>54427</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Loop Vectorize regression in clang-14 (llvmorg-14.0.0-rc4) on Arm
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jzern
      </td>
    </tr>
</table>

<pre>
I initially found that the bug was fixed (or hidden) in llvmorg-15-init-651-gb4c6d1bb3791 [1]. I then bisected the regression to a revert of a revert [2], and verified both the failure with the original commit [3] and the expected behavior with the revert [4].

In discussion with the author of [1] (@david-arm, david.green@ if that's incorrect), he mentioned that https://reviews.llvm.org/D115953 may have independently affected the issue. With [1] reverted and llvmorg-15-init-3042-gb3e8ace19830 [5] present, the issue does not reproduce, but with [1] reverted on top of llvmorg-15-init-3042-gb3e8ace19830^ (the parent of [5]) the failure still occurs.

Unfortunately I don't have a minimized repro, but I observed the behavior in C++ code in libgav1. In the Zone2 predictor [6] the width loop appears to be unrolled and left_base_y is initialized improperly. Another engineer who looked at the assembly more closely described it this way:
> It appears that the left_base_y (based on left_y) is calculated incorrectly. The compiler is unrolling the "width loop", processing two dsts per iteration. The compiler creates two left_y variables (we'll name left_y0 and left_y1). During the first "width loop" iteration (assuming y = 0), left_y0 should be -ystep ((0 << 6) - ystep) and left_y1 should be ((0 << 6) - (ystep * 2)), but the compiler sets left_y0 to -ystep and left_y1 to 0.
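
The expected values in that description can be written down as a small sketch. This is not the libgav1 source: ExpectedLeftY and its lane parameter are hypothetical names introduced here, and the formula is only inferred from the quoted description (lane 0 corresponds to left_y0, lane 1 to left_y1). The ystep value of 64 is arbitrary.

```cpp
// Hypothetical helper, not part of libgav1: the left_y value that the quoted
// description expects for unrolled lane `lane` (0-based) of the width loop
// in row y.
constexpr int ExpectedLeftY(int y, int lane, int ystep) {
  return (y << 6) - (lane + 1) * ystep;
}

// First width-loop iteration at y = 0, with an arbitrary ystep of 64.
static_assert(ExpectedLeftY(0, 0, 64) == -64, "left_y0 should be -ystep");
static_assert(ExpectedLeftY(0, 1, 64) == -128,
              "left_y1 should be -(ystep * 2)");
```

Per the description, the miscompiled code produces the lane 0 value correctly but yields 0 for lane 1 instead of -(ystep * 2).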

The following worked for me on Debian testing:

```
$ git clone https://chromium.googlesource.com/codecs/libgav1/
$ git -C libgav1 clone --depth=1 \
  https://github.com/abseil/abseil-cpp.git third_party/abseil-cpp
$ git -C libgav1 clone --depth=1 \
  https://github.com/google/googletest.git third_party/googletest
$ LD=aarch64-linux-gnu-g++ \
  cmake -G Ninja -S libgav1 -B libgav1-build \
    -DCMAKE_TOOLCHAIN_FILE=libgav1/cmake/toolchains/aarch64-linux-gnu.cmake \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_CXX_FLAGS='-target aarch64-pc-linux-gnu' \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_C_FLAGS='-target aarch64-pc-linux-gnu' \
    -DLIBGAV1_MAX_BITDEPTH=12
$ ninja -C libgav1-build intrapred_directional_test
$ qemu-aarch64 -L /usr/aarch64-linux-gnu \
  libgav1-build/intrapred_directional_test
```

With this you should see failures in the 12-bit tests (the 10-bit tests pass only because the C code is replaced with NEON by default).

```
[  FAILED  ] C/DirectionalIntraPredTest12bpp.FixedInput/0, where GetParam() = kTransformSize4x4
[  FAILED  ] C/DirectionalIntraPredTest12bpp.FixedInput/1, where GetParam() = kTransformSize4x8
...
```

[1] llvmorg-15-init-651-gb4c6d1bb3791 [LoopVectorizer] Don't perform interleaving of predicated scalar loops
[2] llvmorg-14-init-16029-g42b34facfdfe Recommit "[LV] Inline CreateSplatIV call for scalar VFs."
[3] llvmorg-14-init-15776-g7ce48be0fd83 [LV] Inline CreateSplatIV call for scalar VFs (NFC).
[4] llvmorg-14-init-15898-g073c27b5e585 Revert "[LV] Inline CreateSplatIV call for scalar VFs (NFC)."
[5] llvmorg-15-init-3042-gb3e8ace19830 Recommit "[VPlan] Introduce recipe to build scalar steps."
[6] https://chromium.googlesource.com/codecs/libgav1/+/ae48a01600fe986882bd90a410fde39915172f52/src/dsp/intrapred_directional.cc#98

</pre>