[Mlir-commits] [mlir] [mlir][affine][gpu] support unroll dynamic value and apply it to gpu.thread_id op (PR #128113)

Fri Feb 21 11:18:21 PST 2025

================
@@ -258,6 +259,89 @@ gpu.module @unroll_full {
   }
 }
 
+// UNROLL-FULL-LABEL: func @thread_partial_execution
+func.func @thread_partial_execution() {
+  %0 = arith.constant 0 :index
+  %1 = arith.constant 2 : index    
+  // UNROLL-FULL: %[[C0:.*]] = arith.constant 0 : index
+  gpu.launch blocks(%bx, %by, %bz) in (%sz_bx = %1, %sz_by = %1, %sz_bz = %1)
+             threads(%tx, %ty, %tz) in (%sz_tx = %1, %sz_ty = %1, %sz_tz = %1) {
+    affine.for %iv = %tx to 3 step 2 iter_args(%arg = %0) -> index {
+      %3 = arith.addi %arg, %0 : index
+      affine.yield %3 : index
+    }
+    // UNROLL-FULL: %{{.*}} = affine.for %{{.*}} = %{{.*}} to 3 step 2 iter_args(%[[ARG:.*]] = %[[C0]]) -> (index) {
+    // UNROLL-FULL:   %[[SUM:.*]] = arith.addi %[[ARG]], %[[C0]] : index
+    // UNROLL-FULL:   affine.yield %[[SUM]] : index
+    // UNROLL-FULL: }
+    gpu.terminator
+  }
+  return
+}
+
+// UNROLL-FULL-LABEL: func @invalid_loop
+func.func @invalid_loop() {
+  %0 = arith.constant 0 :index
+  %1 = arith.constant 2 : index
+  gpu.launch blocks(%bx, %by, %bz) in (%sz_bx = %1, %sz_by = %1, %sz_bz = %1)
+             threads(%tx, %ty, %tz) in (%sz_tx = %1, %sz_ty = %1, %sz_tz = %1) {
+    %threadid = gpu.thread_id x
+    affine.for %iv = %tx to 0 step 2 iter_args(%arg = %0) -> index {
----------------
linuxlonelyeagle wrote:

* The special case for unroll is that not all threads execute forOp.
```
  std::optional<uint64_t> tripCount = getConstantTripCount(forOp);
  std::optional<uint64_t> maxTripCount = getMaxConstantTripCount(forOp);
```
tripCount = 0
maxTripCount = 1
Keep the loop in this case.

* And the other is the case of dumping the IR in the loop out of the loop.
tripCount = 1
maxTripCount = 1

* core idea
 tripCount = (upper - `(blockSize - 1)`) div stride
 maxTripCount = `(uppper - 0)` div stride
 `(blockSize - 1 )` = maxThreadId = blockSIze - 1
 `0` = minThreadId = 0 
The above rules apply to all scoped Values.

https://github.com/llvm/llvm-project/pull/128113