[PATCH] D149281: Not disable loop unroll for vectorized loops on AMDGPU target

Mon May 22 11:07:44 PDT 2023

rampitec added a comment.

In D149281#4350286 <https://reviews.llvm.org/D149281#4350286>, @alex-t wrote:

> Frankly speaking, the whole idea of loop vectorization for a GPU seems like nonsense to me, although I am not an expert in loop optimizations.
> A GPU has no wide vector registers that could be used to process several scalar values in one HW cycle and, in this way, unroll the loop by the vector factor. Instead, for divergent values each thread in a wavefront operates on its own separate value in a 32-bit wide lane, and for uniform values all threads operate on the same shared scalar value.
> If we have a completely uniform input program (no dependence on thread ID), we could not get any more benefit than from the usual unrolling performed by the loop unroll pass.
> So, IMO, the LV (loop vectorizer) is just a complicated and error-prone way to do loop unrolling.
> Once again, I may not understand some subtle matters, as I do not have much experience with the LV.
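A hypothetical C++ sketch of the quoted argument (not LLVM code; the function names are invented for illustration). It assumes a uniform loop on a target with no SIMD registers: a sum "vectorized" with VF=4 then lowers to four scalar accumulators, which amounts to unrolling by 4 plus a horizontal reduction and a scalar epilogue.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain scalar loop: one element per iteration.
std::int64_t sumScalar(const std::vector<std::int32_t>& a) {
  std::int64_t s = 0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i];
  return s;
}

// What a VF=4 "vector" loop amounts to without SIMD registers:
// four independent scalar accumulators, i.e. an unroll by 4.
std::int64_t sumVF4AsUnroll(const std::vector<std::int32_t>& a) {
  std::int64_t acc[4] = {0, 0, 0, 0};
  std::size_t i = 0;
  for (; i + 4 <= a.size(); i += 4) {
    acc[0] += a[i];
    acc[1] += a[i + 1];
    acc[2] += a[i + 2];
    acc[3] += a[i + 3];
  }
  std::int64_t s = acc[0] + acc[1] + acc[2] + acc[3];  // horizontal reduction
  for (; i < a.size(); ++i) s += a[i];                 // scalar epilogue
  return s;
}
```

The sketch uses integers so both versions agree exactly; a floating-point sum would reassociate the additions.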

Right, we do not have vector operations in the same sense as a SIMD machine, nor do we have vector registers in the same sense. Well, almost. We have vector (meaning wide, multicomponent) loads and stores, and we have SOME v2 16-bit packed and v2 32-bit packed operations on some subtargets. However, this is not the full set of packed ALU operations needed for vectorization and may not warrant a full loop vectorizer (the SLP vectorizer is still in use).

In any case, there are no SIMD-style vector registers to control the interleave. Even though we use register tuples for wide operations, these are not separate registers like, for example, the XMM registers on x86, and they cut into the same register budget. So increasing the 'number of vector registers' does not mean much for our target, and it also inevitably leads to performance regressions.
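A hypothetical C++ model of the register-budget point (not LLVM code; the struct, functions and budget value are assumptions for illustration). It assumes 32-bit elements, each occupying one 32-bit register of a tuple drawn from the same per-lane pool as scalar values, so the register cost scales with VF times the interleave count.

```cpp
#include <cstdint>

// Hypothetical loop description: there is no separate bank of wide vector
// registers, so every element of a "vector" value takes one 32-bit register.
struct LoopShape {
  std::uint32_t liveValues;  // values live across the loop body
  std::uint32_t vf;          // vectorization factor (elements per value)
  std::uint32_t interleave;  // interleave count chosen by the vectorizer
};

// 32-bit registers the loop body needs under this model.
std::uint32_t registersNeeded(const LoopShape& s) {
  return s.liveValues * s.vf * s.interleave;
}

// The budget is an assumed input, not a real subtarget limit.
bool fitsBudget(const LoopShape& s, std::uint32_t budget) {
  return registersNeeded(s) <= budget;
}
```

Under this model, raising VF or the interleave count multiplies demand on one shared pool; on AMDGPU, higher per-lane register usage generally also lowers occupancy.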

All in all, the loop vectorizer does not do much for AMDGPU and cannot be the pass driving unroll. It still does something, as said above, so whether to keep it on or off is a completely separate question, but it is certainly not an unroll driver here.

================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:576
     unsigned MaxIterationsCountToAnalyze;
+    /// Not disable runtime unroll for the loops which were vectorized.
+    bool unrollVectorizedLoop = false;
----------------
"Don't disable"...

================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:577
+    /// Not disable runtime unroll for the loops which were vectorized.
+    bool unrollVectorizedLoop = false;
   };
----------------
Capitalize 'unroll' (i.e. UnrollVectorizedLoop), following the LLVM naming convention for member variables.

================
Comment at: llvm/test/CodeGen/AMDGPU/vectorize-unroll-metadata.ll:15
+; CHECK: !2 = distinct !{!2, !3, !1}
+; CHECK: !3 = !{!"llvm.loop.unroll.runtime.disable"}
+
----------------
I'd expect the test to show no unroll disable metadata?
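A hypothetical C++ model of the behavior this comment expects (not LLVM's API; the struct and function names are invented, and the field uses the capitalized spelling requested above). It assumes the preference simply stops the vectorized loop from carrying the llvm.loop.unroll.runtime.disable entry, so the test's CHECK for that string would go away.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the new unrolling preference.
struct UnrollPrefsSketch {
  bool UnrollVectorizedLoop = false;
};

// Hypothetical: loop-metadata strings attached to a vectorized loop.
// By default the loop is marked so runtime unrolling is disabled;
// a target that opts in gets no such marker.
std::vector<std::string> vectorizedLoopMetadata(const UnrollPrefsSketch& p) {
  std::vector<std::string> md{"llvm.loop.isvectorized"};
  if (!p.UnrollVectorizedLoop)
    md.push_back("llvm.loop.unroll.runtime.disable");
  return md;
}

// True if the metadata list blocks runtime unrolling.
bool runtimeUnrollDisabled(const std::vector<std::string>& md) {
  return std::find(md.begin(), md.end(),
                   "llvm.loop.unroll.runtime.disable") != md.end();
}
```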


https://reviews.llvm.org/D149281