[PATCH] D149281: Not disable loop unroll for vectorized loops on AMDGPU target

Alexander via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 17 09:30:58 PDT 2023

alex-t added a comment.

In D149281#4329306 <https://reviews.llvm.org/D149281#4329306>, @fhahn wrote:


> Thanks for sharing. It still sounds to me that the underlying issue is proper interleaving cost-modeling in LV for those cases and that should be the long-term fix. Allowing unrolling again seems mostly like a short-term workaround, rather than a proper fix. Which is probably fine, but we should aim to improve the cost-modeling. This would need someone familiar with AMDGPU to drive.

Frankly speaking, the whole idea of loop vectorization for a GPU seems like nonsense to me, although I am not an expert in loop optimizations.
A GPU has no wide vector registers that could be used to process several scalar values in one HW cycle and thereby unroll the loop by the vectorization factor. Instead, each thread in a wavefront operates on its own value in a separate 32-bit lane when the value is divergent, and all threads operate on the same shared scalar value when it is uniform.
If the input program is completely uniform (no dependence on the thread ID), we cannot get any more benefit than from the usual unrolling performed by the loop unroll pass.
So, IMO, the LV is just a complicated and error-prone way to do loop unrolling.
Once again, I may be missing some subtle points, as I do not have much experience with the LV.
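To make the argument concrete, here is a minimal C++ sketch of the claim that, on a target with no wide vector registers, a loop vectorized with VF = 4 does the same scalar work as the same loop unrolled by 4. It only models the *shape* of the code, not what LV actually emits for AMDGPU, and it does not model any packed or wide operations a particular subtarget may offer. `Vec4`, `load4`, `store4`, `madd4`, the `axpy_*` functions and `all_variants_agree` are hypothetical names used for illustration, not LLVM APIs; the "no wide registers" premise is taken from the comment above and is encoded as per-lane loops.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4-wide vector type standing in for LLVM's <4 x i32>.
// Assumption: the target has no wide vector registers, so every operation on
// it is legalized into four independent 32-bit scalar operations, which the
// per-lane loops below model.
using Vec4 = std::array<int, 4>;

Vec4 load4(const int* p) {
  assert(p != nullptr);
  return {p[0], p[1], p[2], p[3]};
}

void store4(int* p, const Vec4& v) {
  assert(p != nullptr);
  for (std::size_t l = 0; l < 4; ++l) p[l] = v[l];  // four scalar stores
}

// One "vector" multiply-add becomes four scalar multiply-adds.
Vec4 madd4(int a, const Vec4& x, const Vec4& y) {
  Vec4 r{};
  for (std::size_t l = 0; l < 4; ++l) r[l] = a * x[l] + y[l];
  return r;
}

// Original scalar loop: y[i] = a * x[i] + y[i].
void axpy_scalar(int a, const int* x, int* y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Shape of a VF = 4 vectorized loop followed by a scalar remainder loop.
void axpy_vf4(int a, const int* x, int* y, std::size_t n) {
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) store4(y + i, madd4(a, load4(x + i), load4(y + i)));
  for (; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Shape of a plain unroll by 4 followed by a remainder loop.
void axpy_unroll4(int a, const int* x, int* y, std::size_t n) {
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    y[i] = a * x[i] + y[i];
    y[i + 1] = a * x[i + 1] + y[i + 1];
    y[i + 2] = a * x[i + 2] + y[i + 2];
    y[i + 3] = a * x[i + 3] + y[i + 3];
  }
  for (; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Runs all three variants on identical inputs and checks the results match.
bool all_variants_agree(std::size_t n) {
  std::vector<int> x(n), y0(n);
  for (std::size_t i = 0; i < n; ++i) {
    x[i] = static_cast<int>(i);
    y0[i] = static_cast<int>(2 * i + 1);
  }
  std::vector<int> y1 = y0, y2 = y0;
  axpy_scalar(3, x.data(), y0.data(), n);
  axpy_vf4(3, x.data(), y1.data(), n);
  axpy_unroll4(3, x.data(), y2.data(), n);
  return y0 == y1 && y0 == y2;
}
```

Once the per-lane loops in `madd4`, `load4` and `store4` are expanded, the body of `axpy_vf4` performs the same twelve scalar loads, multiply-adds and stores per iteration as `axpy_unroll4`. Under the stated assumption, the vectorizer adds no work that the unroller could not produce on its own; the only difference is the extra machinery needed to get there.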


