[PATCH] D149281: Not disable loop unroll for vectorized loops on AMDGPU target

Mon May 8 18:14:18 PDT 2023

alex-t added a comment.

In D149281#4322079 <https://reviews.llvm.org/D149281#4322079>, @nikic wrote:

> The question is why the vectorizer failed to unroll the loop in your workload.

In fact, it did not fail. "Unrolling via interleaving" was deliberately disabled for the AMDGPU target because it led to an uncontrolled increase in register pressure (RP).
The corresponding change was made in https://reviews.llvm.org/D122850.

For a CPU, the interleave count is chosen by subtracting the number of loop invariants from the number of available registers and dividing the result by the register pressure for the given register class. This gives an estimate of the number of computation flows that may run simultaneously.
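
As a rough illustration of the arithmetic described above (a sketch only, not the actual LoopVectorize code): the function name, parameter names, the power-of-two rounding and the max_interleave cap are all assumptions made for this example.

```python
def estimate_interleave_count(num_registers, loop_invariants,
                              max_local_users, max_interleave=8):
    """Hypothetical sketch of the CPU-style interleave estimate.

    num_registers   -- registers available in the register class
    loop_invariants -- registers occupied by loop-invariant values
    max_local_users -- register pressure of one copy of the loop body
    max_interleave  -- assumed target cap (expected to be a power of two)
    """
    if max_local_users <= 0:
        raise ValueError("register pressure must be positive")

    # Registers left once the loop invariants have been accounted for.
    available = max(num_registers - loop_invariants, 0)

    # How many independent copies of the body fit in what is left.
    count = max(available // max_local_users, 1)

    # Assumption: round down to a power of two, then apply the cap.
    count = 1 << (count.bit_length() - 1)
    return min(count, max_interleave)
```

For example, with 16 registers, 2 loop invariants and a per-copy pressure of 3, the sketch yields 14 // 3 = 4 interleaved copies. The estimate assumes a fixed register file per thread of execution, which is the premise that does not carry over to a SIMT target.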

For a GPU, which is natively a SIMT machine, this high-level estimation simply does not make sense.
LoopUnroll, by contrast, is controllable and lets us make a reasonable trade-off between the unroll size and the RP.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149281/new/

https://reviews.llvm.org/D149281