<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/112060>112060</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            LoopVersioningLICM missing vectorization when loop bound is invariant under no-alias assumption
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          huihzhang
      </td>
    </tr>
</table>

<pre>
I noticed the following patterns in a previous study of a customer codebase:
t1.cpp
```
void foo(int *A, int *B, int *LoopBound) {
  for (int k = 0; k < *LoopBound; k++)
    A[k] += B[k];
}
```
t2.cpp
```
class Base {
protected:
  int m_totalPhase;
};
class Derived : public Base {
public:
  void foo(int *A, int *B);
};

void Derived::foo(int *A, int *B) {
  for (int k = 0; k < m_totalPhase; k++)
    A[k] += B[k];
}
```

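In both cases, the loop bound is loaded through a pointer that may alias the memory written in the loop (for t2.cpp, through the implicit `this` pointer). The load therefore cannot be hoisted out of the loop, the trip count is unknown on loop entry, and the loop is not vectorized: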
```
clang++ --target=aarch64 -mcpu=cortex-a57 -c -O3 t[1|2].cpp -Rpass-missed=loop-vectorize

t1.cpp:2:3: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
    2 |   for (int k = 0; k < *LoopBound; k++)
```
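
To see why the bound cannot simply be loaded once, compare t1.cpp's loop with a naive hoisted version. `foo_hoisted` is a hypothetical name for that transformation, introduced only for illustration; it is equivalent to `foo` only when the stores to `A[k]` can never modify `*LoopBound`.

```cpp
// t1.cpp's loop: the bound is re-read on every iteration.
void foo(int *A, int *B, int *LoopBound) {
  for (int k = 0; k < *LoopBound; k++)
    A[k] += B[k];
}

// Hypothetical naive transform: load the bound once before the loop.
// Only correct if no store to A[k] can change *LoopBound.
void foo_hoisted(int *A, int *B, int *LoopBound) {
  const int n = *LoopBound;
  for (int k = 0; k < n; k++)
    A[k] += B[k];
}
```

For example, with `int a[6] = {0, 0, 3, 0, 0, 0}`, `B` all ones and `LoopBound == &a[2]`, `foo` runs four iterations (the store to `a[2]` raises the bound to 4), while `foo_hoisted` stops after three and leaves `a[3]` at 0.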

Polly generates a runtime-checked version of the loop, which has an effect similar to loop versioning: the load of the loop bound is hoisted out of the loop, so the loop is vectorized.
```
clang++ --target=aarch64 -mcpu=cortex-a57 -c -O3 t[1|2].cpp -Rpass=loop-vectorize -mllvm -polly -mllvm -polly-invariant-load-hoisting -mllvm -polly-process-unprofitable

t1.cpp:2:3: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
    2 |   for (int k = 0; k < *LoopBound; k++)
t2.cpp:12:3: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
   12 |   for (int k = 0; k < m_totalPhase; k++)
```
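
A hand-written approximation of that effect for t1.cpp: load the bound once, check at run time whether it overlaps the part of `A` the loop writes, and run a loop with an invariant trip count only when it does not. The name `foo_versioned` and the exact form of the check are assumptions for illustration; this is not the code that Polly or LoopVersioningLICM actually emits. It also assumes a flat address space where a pointer fits in `unsigned long long`.

```cpp
// Hypothetical hand-written picture of a runtime-versioned t1.cpp loop.
// Not compiler output. Assumes pointers fit in unsigned long long.
void foo_versioned(int *A, int *B, int *LoopBound) {
  const int n = *LoopBound;  // load the bound once, before the loop
  const unsigned long long count = n > 0 ? n : 0;

  // Byte range of A written if the bound stays n: [a_begin, a_end).
  const unsigned long long a_begin = (unsigned long long)A;
  const unsigned long long a_end = a_begin + count * sizeof(int);
  // Byte range occupied by the bound itself: [lb, lb + sizeof(int)).
  const unsigned long long lb = (unsigned long long)LoopBound;

  if (lb + sizeof(int) <= a_begin || lb >= a_end) {
    // No overlap: stores to A[k] cannot change *LoopBound, so n is
    // loop-invariant and the trip count is known on entry.
    for (int k = 0; k < n; k++)
      A[k] += B[k];
  } else {
    // Possible overlap: run the original loop, which re-reads the bound
    // on every iteration.
    for (int k = 0; k < *LoopBound; k++)
      A[k] += B[k];
  }
}
```

Only the bound needs checking here, because `B` is never written. Possible overlap between `A` and `B` can still be handled by the loop vectorizer's own runtime checks inside the fast path.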

Both cases are a good fit for LoopVersioningLICM: in the versioned loop, which runs under a no-alias assumption, the load of the loop bound can be hoisted.
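
For t2.cpp the versioning check is on the address of the member being read, since `m_totalPhase` is accessed as `this->m_totalPhase`. The sketch below is a hypothetical source-level picture of a versioned `Derived::foo`, not compiler output; the constructor is an addition made only so the sketch can be exercised, and pointers are again assumed to fit in `unsigned long long`.

```cpp
class Base {
protected:
  int m_totalPhase;
};

class Derived : public Base {
public:
  // Hypothetical constructor, added only so the sketch can be exercised.
  explicit Derived(int phases) { m_totalPhase = phases; }
  void foo(int *A, int *B);
};

// Hypothetical versioned Derived::foo (not compiler output). The runtime
// check compares the address of m_totalPhase with the range of A written.
void Derived::foo(int *A, int *B) {
  const int n = m_totalPhase;  // hoisted load of this->m_totalPhase
  const unsigned long long count = n > 0 ? n : 0;
  const unsigned long long a_begin = (unsigned long long)A;
  const unsigned long long a_end = a_begin + count * sizeof(int);
  const unsigned long long bound = (unsigned long long)&m_totalPhase;

  if (bound + sizeof(int) <= a_begin || bound >= a_end) {
    for (int k = 0; k < n; k++)  // versioned loop: invariant trip count
      A[k] += B[k];
  } else {
    for (int k = 0; k < m_totalPhase; k++)  // original loop
      A[k] += B[k];
  }
}
```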

I wonder whether there is enough interest in the community in supporting vectorization for such loops. I am currently working on other projects, so I might not have the bandwidth to pursue upstream fixes.

</pre>