[Mlir-commits] [mlir] [mlir][affine][gpu] Replace DivSIOp with CeilDivSIOp when lowering to GPU launch (PR #73328)

Hsiangkai Wang llvmlistbot at llvm.org
Fri Nov 24 05:51:45 PST 2023


https://github.com/Hsiangkai created https://github.com/llvm/llvm-project/pull/73328

When converting affine.for to a GPU launch operation, we have to calculate the block dimension and thread dimension for the launch operation.

The formula of the dimension size is

(upper_bound - lower_bound) / step_size

When the difference is not evenly divisible by step_size, DivSIOp rounds the quotient toward zero. However, the block dimension and thread dimension are right-open ranges, i.e., [0, block_dim) and [0, thread_dim), so truncating division produces too few blocks and threads. This patch replaces DivSIOp with CeilDivSIOp to compute the correct block and thread dimension values.

From 1d86d1b393784501c7c7d6aa9ed1b7502cd89265 Mon Sep 17 00:00:00 2001
From: Hsiangkai Wang <hsiangkai.wang at arm.com>
Date: Thu, 16 Nov 2023 16:36:37 +0000
Subject: [PATCH] [mlir][affine][gpu] Replace DivSIOp with CeilDivSIOp when
 lowering to GPU launch

When converting affine.for to a GPU launch operation, we have to
calculate the block dimension and thread dimension for the launch
operation.

The formula of the dimension size is

(upper_bound - lower_bound) / step_size

When the difference is not evenly divisible by step_size, DivSIOp rounds
the quotient toward zero. However, the block dimension and thread
dimension are right-open ranges, i.e., [0, block_dim) and
[0, thread_dim), so truncating division produces too few blocks and
threads. This patch replaces DivSIOp with CeilDivSIOp to compute the
correct block and thread dimension values.
---
 mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp        | 2 +-
 mlir/test/Conversion/SCFToGPU/step_positive.mlir | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp b/mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp
index 11b4cbb2506705b..a4862ebb24b178b 100644
--- a/mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp
+++ b/mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp
@@ -195,7 +195,7 @@ AffineLoopToGpuConverter::collectBounds(AffineForOp forOp, unsigned numLoops) {
                                                 upperBound, lowerBound);
     Value step = getOrCreateStep(currentLoop, builder);
     if (getConstantIntValue(step) != static_cast<int64_t>(1))
-      range = builder.create<arith::DivSIOp>(currentLoop.getLoc(), range, step);
+      range = builder.create<arith::CeilDivSIOp>(currentLoop.getLoc(), range, step);
     dims.push_back(range);
 
     lbs.push_back(lowerBound);
diff --git a/mlir/test/Conversion/SCFToGPU/step_positive.mlir b/mlir/test/Conversion/SCFToGPU/step_positive.mlir
index 97fd7d598621b39..84e8454e56171de 100644
--- a/mlir/test/Conversion/SCFToGPU/step_positive.mlir
+++ b/mlir/test/Conversion/SCFToGPU/step_positive.mlir
@@ -3,8 +3,8 @@
 // CHECK-LABEL: @step_var
 func.func @step_var(%A : memref<?x?xf32>, %B : memref<?x?xf32>) {
   // Check that we divide by step.
-  // CHECK:  %[[range_i:.*]] = arith.divsi {{.*}}, %{{.*}}
-  // CHECK:  %[[range_j:.*]] = arith.divsi {{.*}}, %{{.*}}
+  // CHECK:  %[[range_i:.*]] = arith.ceildivsi {{.*}}, %{{.*}}
+  // CHECK:  %[[range_j:.*]] = arith.ceildivsi {{.*}}, %{{.*}}
 
   // CHECK: gpu.launch
   // CHECK-SAME: blocks(%{{[^)]*}}, %{{[^)]*}}, %{{[^)]*}}) in (%{{[^)]*}} = %[[range_i]], %{{[^)]*}} = %{{[^)]*}}, %{{[^)]*}} = %{{[^)]*}})
