[PATCH] D149281: Not disable loop unroll for vectorized loops on AMDGPU target

Tue May 9 04:43:08 PDT 2023

fhahn added a comment.

In D149281#4328386 <https://reviews.llvm.org/D149281#4328386>, @alex-t wrote:

> In D149281#4322079 <https://reviews.llvm.org/D149281#4322079>, @nikic wrote:
>
>> The question is why the vectorizer failed to unroll the loop in your workload.
>
> In fact, it did not fail. "Unrolling via interleaving" was deliberately disabled for the AMDGPU target because it led to uncontrolled increases in register pressure (RP).
> The corresponding change was made in https://reviews.llvm.org/D122850.
>
> For CPUs, you choose the interleave count by subtracting the number of loop invariants from the number of available registers and dividing the result by the RP for the given register class. This lets us estimate how many computation flows can run simultaneously.
>
> For a GPU, which is natively a SIMT machine, this high-level estimate simply does not make sense.
> LoopUnroll is controllable and lets us make a reasonable trade-off between unroll size and RP.

Thanks for sharing. It still sounds to me like the underlying issue is the interleaving cost modeling in LV for those cases, and fixing that should be the long-term solution. Allowing unrolling again seems more like a short-term workaround than a proper fix. That is probably fine, but we should aim to improve the cost modeling, which would need someone familiar with AMDGPU to drive.
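For readers unfamiliar with the CPU-side heuristic alex-t describes, here is a minimal sketch of the arithmetic: for each register class, subtract the loop-invariant registers from the available registers and divide by the register pressure, then take the most restrictive class. This is not LLVM's actual LoopVectorize code. The helper names, the rounding down to a power of two, and the `max_ic` cap are illustrative assumptions of this sketch.

```python
def floor_pow2(n: int) -> int:
    """Largest power of two <= n, or 0 if n < 1 (hypothetical helper)."""
    return 0 if n < 1 else 1 << (n.bit_length() - 1)


def interleave_count(classes, max_ic: int = 8) -> int:
    """Estimate an interleave count from per-register-class data.

    `classes` is a list of (num_regs, loop_invariant_regs, max_local_users)
    tuples, one per register class. All names here are hypothetical.
    """
    ic = max_ic  # assumed upper bound on the interleave count
    for num_regs, invariants, max_users in classes:
        if max_users == 0:
            continue  # class unused inside the loop: imposes no limit
        # Registers left after invariants, divided by the pressure of one copy.
        per_class = max(num_regs - invariants, 0) // max_users
        ic = min(ic, floor_pow2(per_class))
    return max(ic, 1)  # never go below a single (non-interleaved) copy


# 16 registers, 2 invariants, 4 values live per iteration: 14 // 4 = 3 -> 2.
print(interleave_count([(16, 2, 4)]))
```

On a SIMT target the same division says little about how many "flows" actually run concurrently, which is the mismatch alex-t points out.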


https://reviews.llvm.org/D149281