[Mlir-commits] [mlir] [mlir][acc] Introduce acc loop tiling pass (PR #171692)
Razvan Lupusoru
llvmlistbot at llvm.org
Wed Dec 10 12:30:42 PST 2025
https://github.com/razvanlupusoru created https://github.com/llvm/llvm-project/pull/171692
This pass implements the OpenACC loop tiling transformation for acc.loop operations that have the tile clause (OpenACC 3.4 spec, section 2.9.8).
The tile clause specifies that the iterations of the associated loops should be divided into tiles (rectangular blocks). The pass transforms a single or nested acc.loop with tile clauses into a structure of "tile loops" (iterating over tiles) containing "element loops" (iterating within tiles).
For example, tiling a 2-level nested loop with tile(T1, T2):
// Before tiling:
acc.loop tile(T1, T2) control(%i, %j) = ...
// After tiling:
acc.loop control(%i) step (s1*T1) { // tile loop 1
acc.loop control(%j) step (s2*T2) { // tile loop 2
acc.loop control(%ii) = (%i) to (min(ub1, %i+s1*T1)) {
acc.loop control(%jj) = (%j) to (min(ub2, %j+s2*T2)) {
// loop body using %ii, %jj
}
}
}
}
Key features:
- Handles constant tile sizes and wildcard tile sizes ('*') which use a configurable default tile size
- Properly handles collapsed loops with tile counts exceeding collapse count by uncollapsing loops before tiling
- Distributes gang/worker/vector attributes appropriately: gang -> tile loops, vector -> element loops
- Validates that tile size types are not wider than loop IV types
- Emits optimization remarks for tiling decisions
Three test files are added:
- acc-loop-tiling.mlir: Tests single and nested loop tiling with constant tile sizes, unknown tile sizes (*), and loops with collapse attributes
- acc-loop-tiling-invalid.mlir: Tests error diagnostic when tile size type is wider than the loop IV type
- acc-loop-tiling-remarks.mlir: Tests optimization remarks emitted for tiling decisions including default tile size selection
>From 44083d400755e9a803e71f3f89a937e4e605eee8 Mon Sep 17 00:00:00 2001
From: Vijay Kandiah <vkandiah at nvidia.com>
Date: Wed, 10 Dec 2025 12:27:28 -0800
Subject: [PATCH] [mlir][acc] Introduce acc loop tiling pass
This pass implements the OpenACC loop tiling transformation for
acc.loop operations that have the tile clause (OpenACC 3.4 spec,
section 2.9.8).
The tile clause specifies that the iterations of the associated
loops should be divided into tiles (rectangular blocks). The pass
transforms a single or nested acc.loop with tile clauses into a
structure of "tile loops" (iterating over tiles) containing
"element loops" (iterating within tiles).
For example, tiling a 2-level nested loop with tile(T1, T2):
// Before tiling:
acc.loop tile(T1, T2) control(%i, %j) = ...
// After tiling:
acc.loop control(%i) step (s1*T1) { // tile loop 1
acc.loop control(%j) step (s2*T2) { // tile loop 2
acc.loop control(%ii) = (%i) to (min(ub1, %i+s1*T1)) {
acc.loop control(%jj) = (%j) to (min(ub2, %j+s2*T2)) {
// loop body using %ii, %jj
}
}
}
}
Key features:
- Handles constant tile sizes and wildcard tile sizes ('*')
which use a configurable default tile size
- Properly handles collapsed loops with tile counts exceeding
collapse count by uncollapsing loops before tiling
- Distributes gang/worker/vector attributes appropriately:
gang -> tile loops, vector -> element loops
- Validates that tile size types are not wider than loop IV types
- Emits optimization remarks for tiling decisions
Three test files are added:
- acc-loop-tiling.mlir: Tests single and nested loop tiling with
constant tile sizes, unknown tile sizes (*), and loops with
collapse attributes
- acc-loop-tiling-invalid.mlir: Tests error diagnostic when tile
size type is wider than the loop IV type
- acc-loop-tiling-remarks.mlir: Tests optimization remarks emitted
for tiling decisions including default tile size selection
---
.../Dialect/OpenACC/Analysis/OpenACCSupport.h | 41 ++++
.../mlir/Dialect/OpenACC/OpenACCUtils.h | 13 ++
.../mlir/Dialect/OpenACC/Transforms/Passes.td | 42 ++++
.../OpenACC/Analysis/OpenACCSupport.cpp | 8 +
.../OpenACC/Transforms/ACCLoopTiling.cpp | 220 ++++++++++++++++++
.../Dialect/OpenACC/Transforms/CMakeLists.txt | 1 +
mlir/lib/Dialect/OpenACC/Utils/CMakeLists.txt | 1 +
.../Dialect/OpenACC/Utils/OpenACCUtils.cpp | 25 ++
.../OpenACC/Utils/OpenACCUtilsTiling.cpp | 30 +--
.../OpenACC/acc-loop-tiling-invalid.mlir | 15 ++
.../OpenACC/acc-loop-tiling-remarks.mlir | 75 ++++++
.../test/Dialect/OpenACC/acc-loop-tiling.mlir | 104 +++++++++
12 files changed, 560 insertions(+), 15 deletions(-)
create mode 100644 mlir/lib/Dialect/OpenACC/Transforms/ACCLoopTiling.cpp
create mode 100644 mlir/test/Dialect/OpenACC/acc-loop-tiling-invalid.mlir
create mode 100644 mlir/test/Dialect/OpenACC/acc-loop-tiling-remarks.mlir
create mode 100644 mlir/test/Dialect/OpenACC/acc-loop-tiling.mlir
diff --git a/mlir/include/mlir/Dialect/OpenACC/Analysis/OpenACCSupport.h b/mlir/include/mlir/Dialect/OpenACC/Analysis/OpenACCSupport.h
index 1274dc84303a5..984eaa8b8d78b 100644
--- a/mlir/include/mlir/Dialect/OpenACC/Analysis/OpenACCSupport.h
+++ b/mlir/include/mlir/Dialect/OpenACC/Analysis/OpenACCSupport.h
@@ -50,8 +50,10 @@
#ifndef MLIR_DIALECT_OPENACC_ANALYSIS_OPENACCSUPPORT_H
#define MLIR_DIALECT_OPENACC_ANALYSIS_OPENACCSUPPORT_H
+#include "mlir/IR/Remarks.h"
#include "mlir/IR/Value.h"
#include "mlir/Pass/AnalysisManager.h"
+#include "llvm/ADT/StringRef.h"
#include <memory>
#include <string>
@@ -62,6 +64,8 @@ namespace acc {
enum class RecipeKind : uint32_t;
bool isValidSymbolUse(Operation *user, SymbolRefAttr symbol,
Operation **definingOpPtr);
+remark::detail::InFlightRemark emitRemark(Operation *op, const Twine &message,
+ llvm::StringRef category);
namespace detail {
/// This class contains internal trait classes used by OpenACCSupport.
@@ -82,6 +86,14 @@ struct OpenACCSupportTraits {
// Used to report a case that is not supported by the implementation.
virtual InFlightDiagnostic emitNYI(Location loc, const Twine &message) = 0;
+ // Used to emit an OpenACC remark. The category is optional and is used to
+ // either capture the pass name or pipeline phase when the remark is
+ // emitted. When not provided, in the default implementation, the category
+ // is "openacc".
+ virtual remark::detail::InFlightRemark
+ emitRemark(Operation *op, const Twine &message,
+ llvm::StringRef category) = 0;
+
/// Check if a symbol use is valid for use in an OpenACC region.
virtual bool isValidSymbolUse(Operation *user, SymbolRefAttr symbol,
Operation **definingOpPtr) = 0;
@@ -101,6 +113,7 @@ struct OpenACCSupportTraits {
Operation **>;
template <typename ImplT, typename... Args>
+
using isValidValueUse_t =
decltype(std::declval<ImplT>().isValidValueUse(std::declval<Args>()...));
@@ -108,6 +121,14 @@ struct OpenACCSupportTraits {
using has_isValidValueUse =
llvm::is_detected<isValidValueUse_t, ImplT, Value, Region &>;
+ template <typename ImplT, typename... Args>
+ using emitRemark_t =
+ decltype(std::declval<ImplT>().emitRemark(std::declval<Args>()...));
+
+ template <typename ImplT>
+ using has_emitRemark = llvm::is_detected<emitRemark_t, ImplT, Operation *,
+ const Twine &, llvm::StringRef>;
+
/// This class wraps a concrete OpenACCSupport implementation and forwards
/// interface calls to it. This provides type erasure, allowing different
/// implementation types to be used interchangeably without inheritance.
@@ -131,6 +152,15 @@ struct OpenACCSupportTraits {
return impl.emitNYI(loc, message);
}
+ remark::detail::InFlightRemark emitRemark(Operation *op,
+ const Twine &message,
+ llvm::StringRef category) final {
+ if constexpr (has_emitRemark<ImplT>::value)
+ return impl.emitRemark(op, message, category);
+ else
+ return acc::emitRemark(op, message, category);
+ }
+
bool isValidSymbolUse(Operation *user, SymbolRefAttr symbol,
Operation **definingOpPtr) final {
if constexpr (has_isValidSymbolUse<ImplT>::value)
@@ -198,6 +228,17 @@ class OpenACCSupport {
/// unsupported case.
InFlightDiagnostic emitNYI(Location loc, const Twine &message);
+ /// Emit an OpenACC remark.
+ ///
+ /// \param op The operation to emit the remark for.
+ /// \param message The remark message.
+ /// \param category Optional category for the remark. Defaults to "openacc".
+ /// \return An in-flight remark object that can be used to append
+ /// additional information to the remark.
+ remark::detail::InFlightRemark
+ emitRemark(Operation *op, const Twine &message,
+ llvm::StringRef category = "openacc");
+
/// Check if a symbol use is valid for use in an OpenACC region.
///
/// \param user The operation using the symbol.
diff --git a/mlir/include/mlir/Dialect/OpenACC/OpenACCUtils.h b/mlir/include/mlir/Dialect/OpenACC/OpenACCUtils.h
index e9ce9b3a36aba..e3f4e6889ffe8 100644
--- a/mlir/include/mlir/Dialect/OpenACC/OpenACCUtils.h
+++ b/mlir/include/mlir/Dialect/OpenACC/OpenACCUtils.h
@@ -10,7 +10,9 @@
#define MLIR_DIALECT_OPENACC_OPENACCUTILS_H_
#include "mlir/Dialect/OpenACC/OpenACC.h"
+#include "mlir/IR/Remarks.h"
#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
namespace mlir {
class DominanceInfo;
@@ -81,6 +83,17 @@ getDominatingDataClauses(mlir::Operation *computeConstructOp,
mlir::DominanceInfo &domInfo,
mlir::PostDominanceInfo &postDomInfo);
+/// Emit an OpenACC remark for the given operation with the given message.
+///
+/// \param op The operation to emit the remark for.
+/// \param message The remark message.
+/// \param category Optional category for the remark. Defaults to "openacc".
+/// \return An in-flight remark object that can be used to append
+/// additional information to the remark.
+remark::detail::InFlightRemark emitRemark(mlir::Operation *op,
+ const llvm::Twine &message,
+ llvm::StringRef category = "openacc");
+
} // namespace acc
} // namespace mlir
diff --git a/mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.td b/mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.td
index b37cc282d4555..253311e12932d 100644
--- a/mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.td
@@ -152,4 +152,46 @@ def ACCLegalizeSerial : Pass<"acc-legalize-serial", "mlir::func::FuncOp"> {
"mlir::arith::ArithDialect"];
}
+
+def ACCLoopTiling : Pass<"acc-loop-tiling", "mlir::func::FuncOp"> {
+ let summary = "Tile OpenACC loops with tile clauses";
+ let description = [{
+ This pass implements loop tiling transformations for OpenACC loops that
+ have tile clauses. The pass transforms loops with `tile(size1, size2, ...)`
+ clauses into tiled loop nests.
+
+ For a 2-level nested loop with tile(T1, T2), the transformation produces:
+ - Outer tile loops that iterate over tiles
+ - Inner element loops that iterate within each tile
+
+ Example transformation:
+ ```
+ // Before:
+ #pragma acc loop tile(32, 32)
+ for (i = 0; i < N; i++)
+ for (j = 0; j < M; j++)
+ A[i][j] = ...
+
+ // After:
+ for (i = 0; i < N; i += 32) // tile loop 1
+ for (j = 0; j < M; j += 32) // tile loop 2
+ for (ii = i; ii < min(N, i+32); ii++) // element loop 1
+ for (jj = j; jj < min(M, j+32); jj++) // element loop 2
+ A[ii][jj] = ...
+ ```
+
+ The pass handles:
+ - Constant tile sizes
+ - Wildcard tile sizes ('*') which use a default tile size
+ - Collapsed loops with tile counts exceeding collapse count
+ - Proper handling of loop attributes (gang, worker, vector)
+ }];
+ let dependentDialects = ["mlir::acc::OpenACCDialect",
+ "mlir::arith::ArithDialect"];
+ let options = [
+ Option<"defaultTileSize", "default-tile-size", "int32_t", "32",
+ "Default tile size to use for wildcard ('*') tile sizes">
+ ];
+}
+
#endif // MLIR_DIALECT_OPENACC_TRANSFORMS_PASSES
diff --git a/mlir/lib/Dialect/OpenACC/Analysis/OpenACCSupport.cpp b/mlir/lib/Dialect/OpenACC/Analysis/OpenACCSupport.cpp
index f6bac17591ee1..c487c43e8369c 100644
--- a/mlir/lib/Dialect/OpenACC/Analysis/OpenACCSupport.cpp
+++ b/mlir/lib/Dialect/OpenACC/Analysis/OpenACCSupport.cpp
@@ -41,6 +41,14 @@ InFlightDiagnostic OpenACCSupport::emitNYI(Location loc, const Twine &message) {
return mlir::emitError(loc, "not yet implemented: " + message);
}
+remark::detail::InFlightRemark
+OpenACCSupport::emitRemark(Operation *op, const Twine &message,
+ llvm::StringRef category) {
+ if (impl)
+ return impl->emitRemark(op, message, category);
+ return acc::emitRemark(op, message, category);
+}
+
bool OpenACCSupport::isValidSymbolUse(Operation *user, SymbolRefAttr symbol,
Operation **definingOpPtr) {
if (impl)
diff --git a/mlir/lib/Dialect/OpenACC/Transforms/ACCLoopTiling.cpp b/mlir/lib/Dialect/OpenACC/Transforms/ACCLoopTiling.cpp
new file mode 100644
index 0000000000000..23bec6a295061
--- /dev/null
+++ b/mlir/lib/Dialect/OpenACC/Transforms/ACCLoopTiling.cpp
@@ -0,0 +1,220 @@
+//===- ACCLoopTiling.cpp - Tile ACC Loops ---------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass implements the OpenACC loop tiling transformation for acc.loop
+// operations that have the tile clause (OpenACC 3.4 spec, section 2.9.8).
+//
+// Overview:
+// ---------
+// The tile clause specifies that the iterations of the associated loops should
+// be divided into tiles (rectangular blocks). This pass transforms a single
+// or nested acc.loop with tile clauses into a structure of "tile loops"
+// (iterating over tiles) containing "element loops" (iterating within tiles).
+//
+// For example, tiling a 2-level nested loop with tile(T1, T2) produces:
+//
+// // Before tiling:
+// acc.loop tile(T1, T2) control(%i, %j) = (lb1, lb2) to (ub1, ub2) step (s1,
+// s2)
+//
+// // After tiling:
+// acc.loop control(%i) = (lb1) to (ub1) step (s1*T1) { // tile loop 1
+// acc.loop control(%j) = (lb2) to (ub2) step (s2*T2) { // tile loop 2
+// acc.loop control(%ii) = (%i) to (min(ub1, %i+s1*T1)) step (s1) { //
+// element 1
+// acc.loop control(%jj) = (%j) to (min(ub2, %j+s2*T2)) step (s2) { //
+// element 2
+// // loop body using %ii, %jj
+// }
+// }
+// }
+// }
+//
+// Gang/worker/vector attributes are distributed as follows:
+// - gang: applied to tile loops
+// - vector: applied to element loops
+// - worker: removed from inner loops
+//
+// Unknown Tile Sizes:
+// -------------------
+// The OpenACC tile(*) syntax indicates an implementation-defined tile size.
+// In the IR, this is represented as -1. The pass resolves these to the
+// default tile size (configurable via pass option).
+//
+// Requirements:
+// -------------
+// 1. The pass uses the OpenACCSupport analysis for remark and NYI (not yet
+// implemented) emission. Custom implementations can be registered via
+// setImplementation() to provide pipeline-specific handling.
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/Dialect/OpenACC/Analysis/OpenACCSupport.h"
+#include "mlir/Dialect/OpenACC/OpenACC.h"
+#include "mlir/Dialect/OpenACC/OpenACCUtilsTiling.h"
+#include "mlir/Dialect/OpenACC/Transforms/Passes.h"
+#include "mlir/IR/BuiltinAttributes.h"
+#include "mlir/IR/PatternMatch.h"
+#include "mlir/Support/LLVM.h"
+#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
+#include "llvm/Support/Debug.h"
+
+namespace mlir {
+namespace acc {
+#define GEN_PASS_DEF_ACCLOOPTILING
+#include "mlir/Dialect/OpenACC/Transforms/Passes.h.inc"
+} // namespace acc
+} // namespace mlir
+
+#define DEBUG_TYPE "acc-loop-tile"
+
+namespace {
+using namespace mlir;
+
+struct ACCLoopTilingImpl : public OpRewritePattern<acc::LoopOp> {
+ ACCLoopTilingImpl(MLIRContext *context, int32_t defaultTileSize,
+ acc::OpenACCSupport &accSupport)
+ : OpRewritePattern<acc::LoopOp>(context),
+ defaultTileSize(defaultTileSize), accSupport(accSupport) {}
+
+ // Check that tile size types are not narrower than IV types.
+ // We only check when both types are IntegerType. For IndexType, the width
+ // is target-dependent and the casting utility will handle it correctly.
+ LogicalResult checkTileSizeTypes(acc::LoopOp loop,
+ ArrayRef<Value> tileSizes) const {
+ auto ivTypes = loop.getBody().getArgumentTypes();
+ for (size_t i = 0; i < tileSizes.size() && i < ivTypes.size(); ++i) {
+ Type tileType = tileSizes[i].getType();
+ Type ivType = ivTypes[i];
+
+ // Skip unknown tile sizes (will be created with correct type)
+ auto constVal = getConstantIntValue(tileSizes[i]);
+ if (constVal && *constVal < 0)
+ continue;
+
+ // Only compare when both are integer types (not index)
+ auto tileIntType = dyn_cast<IntegerType>(tileType);
+ auto ivIntType = dyn_cast<IntegerType>(ivType);
+ if (tileIntType && ivIntType) {
+ if (tileIntType.getWidth() > ivIntType.getWidth()) {
+ accSupport.emitNYI(loop.getLoc(),
+ "tile size type (i" +
+ std::to_string(tileIntType.getWidth()) +
+ ") is wider than loop IV type (i" +
+ std::to_string(ivIntType.getWidth()) + ")");
+ return failure();
+ }
+ }
+ }
+ return success();
+ }
+
+ void emitTilingRemarks(acc::LoopOp loop, ArrayRef<Value> tileSizes) const {
+ // Emit remarks for loop tiling
+ size_t tileLevel = tileSizes.size();
+ std::string msg =
+ "Tiling " + std::to_string(tileLevel) + "-level loop nest with tile(";
+ for (size_t i = 0; i < tileSizes.size(); ++i) {
+ std::optional<int64_t> val = getConstantIntValue(tileSizes[i]);
+ if (*val == -1)
+ msg += "*";
+ else
+ msg += std::to_string(*val);
+ if (i < tileSizes.size() - 1)
+ msg += ",";
+ }
+ msg += ")";
+ accSupport.emitRemark(loop, llvm::Twine(msg), DEBUG_TYPE);
+
+ // Emit remarks for unknown tile sizes that will be resolved to default
+ // TODO: Need to base the default tile size on some heuristics.
+ for (Value tileSize : tileSizes) {
+ std::optional<int64_t> val = getConstantIntValue(tileSize);
+ if (val && *val < 0) {
+ std::string unknownMsg = "Picking default tile size " +
+ std::to_string(defaultTileSize) +
+ " for unknown tile size '*'";
+ accSupport.emitRemark(loop, llvm::Twine(unknownMsg), DEBUG_TYPE);
+ }
+ }
+ }
+
+ LogicalResult matchAndRewrite(acc::LoopOp origLoop,
+ PatternRewriter &rewriter) const override {
+
+ if (origLoop.getTileValues().empty())
+ return success();
+
+ SmallVector<Value> tileSizes(origLoop.getTileValues().begin(),
+ origLoop.getTileValues().end());
+ unsigned tileCount = tileSizes.size();
+ unsigned collapseCount = origLoop.getCollapseValue().value_or(1);
+
+ // Sanity check tile size types
+ if (failed(checkTileSizeTypes(origLoop, tileSizes)))
+ return failure();
+
+ // Emit remarks for loop tiling. This is emitted before the original loop
+ // is modified. However, it assumes that tiling will not fail.
+ emitTilingRemarks(origLoop, tileSizes);
+
+ LLVM_DEBUG(llvm::dbgs() << "\nBefore tiling:\n" << *origLoop << "\n");
+
+ // Clear tile operands from origLoop
+ rewriter.startOpModification(origLoop);
+ origLoop.getTileOperandsMutable().clear();
+ origLoop.removeTileOperandsSegmentsAttr();
+ origLoop.removeTileOperandsDeviceTypeAttr();
+ rewriter.finalizeOpModification(origLoop);
+
+ SmallVector<acc::LoopOp> loopsToTile;
+ if (collapseCount < tileCount) {
+ // Uncollapse tile loops before tiling if necessary
+ loopsToTile =
+ acc::uncollapseLoops(origLoop, tileCount, collapseCount, rewriter);
+ rewriter.replaceOp(origLoop, loopsToTile[0]);
+ LLVM_DEBUG(llvm::dbgs() << "\nAfter uncollapsing:\n"
+ << *loopsToTile[0] << "\n");
+ } else {
+ loopsToTile.push_back(origLoop);
+ }
+
+ // loopsToTile is a vector of perfectly nested loops. The outermost loop
+ // may have multiple IVs but inner loops can only have one IV.
+ // The utility handles unknown tile sizes (*) by using `defaultTileSize`.
+ acc::tileACCLoops(loopsToTile, tileSizes, defaultTileSize, rewriter);
+
+ LLVM_DEBUG(llvm::dbgs() << "\nAfter tiling:\n " << *loopsToTile[0] << "\n");
+ return success();
+ }
+
+private:
+ int32_t defaultTileSize;
+ acc::OpenACCSupport &accSupport;
+};
+
+class ACCLoopTiling : public acc::impl::ACCLoopTilingBase<ACCLoopTiling> {
+public:
+ using ACCLoopTilingBase<ACCLoopTiling>::ACCLoopTilingBase;
+
+ void runOnOperation() override {
+ func::FuncOp funcOp = getOperation();
+ MLIRContext *context = funcOp.getContext();
+ acc::OpenACCSupport &accSupport = getAnalysis<acc::OpenACCSupport>();
+
+ RewritePatternSet patterns(context);
+ patterns.insert<ACCLoopTilingImpl>(context, defaultTileSize, accSupport);
+ GreedyRewriteConfig grc;
+ grc.setUseTopDownTraversal(true);
+ grc.setMaxIterations(1);
+ (void)applyPatternsGreedily(funcOp, std::move(patterns), grc);
+ }
+};
+
+} // namespace
diff --git a/mlir/lib/Dialect/OpenACC/Transforms/CMakeLists.txt b/mlir/lib/Dialect/OpenACC/Transforms/CMakeLists.txt
index 10a1796972044..8d657852345ec 100644
--- a/mlir/lib/Dialect/OpenACC/Transforms/CMakeLists.txt
+++ b/mlir/lib/Dialect/OpenACC/Transforms/CMakeLists.txt
@@ -1,5 +1,6 @@
add_mlir_dialect_library(MLIROpenACCTransforms
ACCImplicitData.cpp
+ ACCLoopTiling.cpp
ACCImplicitDeclare.cpp
ACCImplicitRoutine.cpp
ACCLegalizeSerial.cpp
diff --git a/mlir/lib/Dialect/OpenACC/Utils/CMakeLists.txt b/mlir/lib/Dialect/OpenACC/Utils/CMakeLists.txt
index c3de4f7e3e282..c7c322be70d09 100644
--- a/mlir/lib/Dialect/OpenACC/Utils/CMakeLists.txt
+++ b/mlir/lib/Dialect/OpenACC/Utils/CMakeLists.txt
@@ -16,6 +16,7 @@ add_mlir_dialect_library(MLIROpenACCUtils
LINK_LIBS PUBLIC
MLIRArithDialect
+ MLIRArithUtils
MLIROpenACCDialect
MLIRIR
MLIRSupport
diff --git a/mlir/lib/Dialect/OpenACC/Utils/OpenACCUtils.cpp b/mlir/lib/Dialect/OpenACC/Utils/OpenACCUtils.cpp
index 7f27b4495045f..bd64d758045a4 100644
--- a/mlir/lib/Dialect/OpenACC/Utils/OpenACCUtils.cpp
+++ b/mlir/lib/Dialect/OpenACC/Utils/OpenACCUtils.cpp
@@ -266,3 +266,28 @@ mlir::acc::getDominatingDataClauses(mlir::Operation *computeConstructOp,
return dominatingDataClauses.takeVector();
}
+
+mlir::remark::detail::InFlightRemark
+mlir::acc::emitRemark(mlir::Operation *op, const llvm::Twine &message,
+ llvm::StringRef category) {
+ using namespace mlir::remark;
+ mlir::Location loc = op->getLoc();
+ auto *engine = loc->getContext()->getRemarkEngine();
+ if (!engine)
+ return remark::detail::InFlightRemark{};
+
+ llvm::StringRef funcName;
+ if (auto func = dyn_cast<mlir::FunctionOpInterface>(op))
+ funcName = func.getName();
+ else if (auto funcOp = op->getParentOfType<mlir::FunctionOpInterface>())
+ funcName = funcOp.getName();
+
+ auto opts = RemarkOpts::name("openacc").category(category);
+ if (!funcName.empty())
+ opts = opts.function(funcName);
+
+ auto remark = engine->emitOptimizationRemark(loc, opts);
+ if (remark)
+ remark << message.str();
+ return remark;
+}
diff --git a/mlir/lib/Dialect/OpenACC/Utils/OpenACCUtilsTiling.cpp b/mlir/lib/Dialect/OpenACC/Utils/OpenACCUtilsTiling.cpp
index 0b344ba2f8316..c39bd06c81cbf 100644
--- a/mlir/lib/Dialect/OpenACC/Utils/OpenACCUtilsTiling.cpp
+++ b/mlir/lib/Dialect/OpenACC/Utils/OpenACCUtilsTiling.cpp
@@ -13,21 +13,27 @@
#include "mlir/Dialect/OpenACC/OpenACCUtilsTiling.h"
#include "mlir/Dialect/Arith/IR/Arith.h"
+#include "mlir/Dialect/Arith/Utils/Utils.h"
#include "mlir/Dialect/OpenACC/OpenACC.h"
#include "mlir/Dialect/Utils/StaticValueUtils.h"
#include "mlir/Transforms/RegionUtils.h"
// Resolve unknown tile sizes (represented as -1 for tile(*)) to the default.
-static mlir::Value resolveUnknownTileSize(mlir::Value tileSize,
+// Returns a value with the same type as targetType.
+static mlir::Value resolveAndCastTileSize(mlir::Value tileSize,
int32_t defaultTileSize,
+ mlir::Type targetType,
mlir::RewriterBase &rewriter,
mlir::Location loc) {
auto constVal = mlir::getConstantIntValue(tileSize);
- if (constVal && *constVal < 0)
+ if (constVal && *constVal < 0) {
+ // Create constant with the target type directly
return mlir::arith::ConstantOp::create(
- rewriter, loc, rewriter.getI32Type(),
- rewriter.getI32IntegerAttr(defaultTileSize));
- return tileSize;
+ rewriter, loc, targetType,
+ rewriter.getIntegerAttr(targetType, defaultTileSize));
+ }
+ return mlir::getValueOrCreateCastToIndexLike(rewriter, loc, targetType,
+ tileSize);
}
// Remove vector/worker attributes from loop
@@ -134,13 +140,6 @@ mlir::acc::tileACCLoops(llvm::SmallVector<mlir::acc::LoopOp> &tileLoops,
mlir::acc::LoopOp outerLoop = tileLoops[0];
const mlir::Location loc = outerLoop.getLoc();
- // Resolve unknown tile sizes (tile(*) represented as -1)
- llvm::SmallVector<mlir::Value> resolvedTileSizes;
- rewriter.setInsertionPoint(outerLoop);
- for (mlir::Value tileSize : tileSizes)
- resolvedTileSizes.push_back(
- resolveUnknownTileSize(tileSize, defaultTileSize, rewriter, loc));
-
mlir::acc::LoopOp innerLoop = tileLoops[tileLoops.size() - 1];
llvm::SmallVector<mlir::Value, 3> origIVs;
llvm::SmallVector<mlir::Value, 3> origSteps;
@@ -176,10 +175,11 @@ mlir::acc::tileACCLoops(llvm::SmallVector<mlir::acc::LoopOp> &tileLoops,
llvm::SmallVector<mlir::Value, 3> currentLoopSteps;
for (auto [j, step] : llvm::enumerate(tileLoop.getStep())) {
origSteps.push_back(step);
- if (i + j >= resolvedTileSizes.size()) {
+ if (i + j >= tileSizes.size()) {
currentLoopSteps.push_back(step);
} else {
- mlir::Value tileSize = resolvedTileSizes[i + j];
+ mlir::Value tileSize = resolveAndCastTileSize(
+ tileSizes[i + j], defaultTileSize, step.getType(), rewriter, loc);
auto newLoopStep =
mlir::arith::MulIOp::create(rewriter, loc, step, tileSize);
currentLoopSteps.push_back(newLoopStep);
@@ -214,7 +214,7 @@ mlir::acc::tileACCLoops(llvm::SmallVector<mlir::acc::LoopOp> &tileLoops,
// Create and insert nested elementLoopOps before terminator of outer loopOp
mlir::acc::LoopOp currentLoop = innerLoop;
- for (size_t i = 0; i < resolvedTileSizes.size(); i++) {
+ for (size_t i = 0; i < tileSizes.size(); i++) {
rewriter.setInsertionPoint(currentLoop.getBody().getTerminator());
mlir::DenseBoolArrayAttr inclusiveUBAttr = mlir::DenseBoolArrayAttr{};
if (inclusiveUBs[i])
diff --git a/mlir/test/Dialect/OpenACC/acc-loop-tiling-invalid.mlir b/mlir/test/Dialect/OpenACC/acc-loop-tiling-invalid.mlir
new file mode 100644
index 0000000000000..6ef1884345f82
--- /dev/null
+++ b/mlir/test/Dialect/OpenACC/acc-loop-tiling-invalid.mlir
@@ -0,0 +1,15 @@
+// RUN: mlir-opt %s -acc-loop-tiling -split-input-file -verify-diagnostics
+
+// Test that tile size type wider than IV type is rejected
+
+func.func @tile_wider_than_iv(%arg0: memref<100xf32>) {
+ %c0 = arith.constant 0 : i32
+ %c100 = arith.constant 100 : i32
+ %c1 = arith.constant 1 : i32
+ %c4_i64 = arith.constant 4 : i64 // i64 tile size with i32 IV
+ // expected-error @+1 {{not yet implemented: tile size type (i64) is wider than loop IV type (i32)}}
+ acc.loop tile({%c4_i64 : i64}) control(%i : i32) = (%c0 : i32) to (%c100 : i32) step (%c1 : i32) {
+ acc.yield
+ } attributes {independent = [#acc.device_type<none>]}
+ return
+}
diff --git a/mlir/test/Dialect/OpenACC/acc-loop-tiling-remarks.mlir b/mlir/test/Dialect/OpenACC/acc-loop-tiling-remarks.mlir
new file mode 100644
index 0000000000000..bff904744b315
--- /dev/null
+++ b/mlir/test/Dialect/OpenACC/acc-loop-tiling-remarks.mlir
@@ -0,0 +1,75 @@
+// RUN: mlir-opt %s -acc-loop-tiling --remarks-filter="(open)?acc.*" 2>&1 | FileCheck %s
+
+// Test that the pass emits remarks for loop tiling
+
+// CHECK: remark: [Passed] openacc | Category:acc-loop-tile | Function=single_loop_remark | Remark="Tiling 1-level loop nest with tile(4)"
+func.func @single_loop_remark(%arg0: memref<100xf32>) {
+ %c0 = arith.constant 0 : index
+ %c100 = arith.constant 100 : index
+ %c1 = arith.constant 1 : index
+ %c4 = arith.constant 4 : index
+ acc.loop tile({%c4 : index}) control(%i : index) = (%c0 : index) to (%c100 : index) step (%c1 : index) {
+ %val = arith.index_castui %i : index to i32
+ %fval = arith.sitofp %val : i32 to f32
+ memref.store %fval, %arg0[%i] : memref<100xf32>
+ acc.yield
+ } attributes {independent = [#acc.device_type<none>]}
+ return
+}
+
+// CHECK: remark: [Passed] openacc | Category:acc-loop-tile | Function=nested_loop_remark | Remark="Tiling 2-level loop nest with tile(8,16)"
+func.func @nested_loop_remark(%arg0: memref<100x50xf32>) {
+ %c0 = arith.constant 0 : index
+ %c100 = arith.constant 100 : index
+ %c50 = arith.constant 50 : index
+ %c1 = arith.constant 1 : index
+ %c8 = arith.constant 8 : index
+ %c16 = arith.constant 16 : index
+ acc.loop tile({%c8 : index, %c16 : index}) control(%i : index, %j : index) = (%c0, %c0 : index, index) to (%c100, %c50 : index, index) step (%c1, %c1 : index, index) {
+ %sum = arith.addi %i, %j : index
+ %val = arith.index_castui %sum : index to i32
+ %fval = arith.sitofp %val : i32 to f32
+ memref.store %fval, %arg0[%i, %j] : memref<100x50xf32>
+ acc.yield
+ } attributes {independent = [#acc.device_type<none>]}
+ return
+}
+
+// Test remark for unknown tile size (*) represented as -1
+// Should use default tile size
+
+// CHECK: remark: [Passed] openacc | Category:acc-loop-tile | Function=unknown_tile_remark | Remark="Tiling 1-level loop nest with tile(*)"
+// CHECK: remark: [Passed] openacc | Category:acc-loop-tile | Function=unknown_tile_remark | Remark="Picking default tile size {{[0-9]+}} for unknown tile size '*'"
+func.func @unknown_tile_remark(%arg0: memref<1000xf32>) {
+ %c0 = arith.constant 0 : index
+ %c1000 = arith.constant 1000 : index
+ %c1 = arith.constant 1 : index
+ %cm1 = arith.constant -1 : i32 // tile(*) represented as -1
+ acc.loop tile({%cm1 : i32}) control(%i : index) = (%c0 : index) to (%c1000 : index) step (%c1 : index) {
+ %val = arith.index_castui %i : index to i32
+ %fval = arith.sitofp %val : i32 to f32
+ memref.store %fval, %arg0[%i] : memref<1000xf32>
+ acc.yield
+ } attributes {independent = [#acc.device_type<none>]}
+ return
+}
+
+// Test remark for multiple unknown tile sizes
+
+// CHECK: remark: [Passed] openacc | Category:acc-loop-tile | Function=multiple_unknown_tiles_remark | Remark="Tiling 2-level loop nest with tile(*,*)"
+// CHECK: remark: [Passed] openacc | Category:acc-loop-tile | Function=multiple_unknown_tiles_remark | Remark="Picking default tile size {{[0-9]+}} for unknown tile size '*'"
+// CHECK: remark: [Passed] openacc | Category:acc-loop-tile | Function=multiple_unknown_tiles_remark | Remark="Picking default tile size {{[0-9]+}} for unknown tile size '*'"
+func.func @multiple_unknown_tiles_remark(%arg0: memref<100x100xf32>) {
+ %c0 = arith.constant 0 : index
+ %c100 = arith.constant 100 : index
+ %c1 = arith.constant 1 : index
+ %cm1 = arith.constant -1 : i32 // tile(*) represented as -1
+ acc.loop tile({%cm1 : i32, %cm1 : i32}) control(%i : index, %j : index) = (%c0, %c0 : index, index) to (%c100, %c100 : index, index) step (%c1, %c1 : index, index) {
+ %sum = arith.addi %i, %j : index
+ %val = arith.index_castui %sum : index to i32
+ %fval = arith.sitofp %val : i32 to f32
+ memref.store %fval, %arg0[%i, %j] : memref<100x100xf32>
+ acc.yield
+ } attributes {independent = [#acc.device_type<none>]}
+ return
+}
diff --git a/mlir/test/Dialect/OpenACC/acc-loop-tiling.mlir b/mlir/test/Dialect/OpenACC/acc-loop-tiling.mlir
new file mode 100644
index 0000000000000..f4a46186b118d
--- /dev/null
+++ b/mlir/test/Dialect/OpenACC/acc-loop-tiling.mlir
@@ -0,0 +1,104 @@
+// RUN: mlir-opt %s -acc-loop-tiling | FileCheck %s
+
+// Test single-level loop tiling with tile(2)
+// Original loop: for i = 0 to 10 step 1
+// After tiling: tile loop (step=2) containing element loop (step=1)
+
+// CHECK-LABEL: func.func @single_loop_tile
+// CHECK: %[[C0:.*]] = arith.constant 0 : index
+// CHECK: %[[C10:.*]] = arith.constant 10 : index
+// CHECK: %[[C1:.*]] = arith.constant 1 : index
+// CHECK: %[[C2:.*]] = arith.constant 2 : index
+// CHECK: acc.loop control(%[[IV:.*]] : index) = (%[[C0]] : index) to (%[[C10]] : index) step (%[[C2]] : index) {
+// CHECK: %[[NEW_UB:.*]] = arith.addi %[[IV]], %[[C2]] : index
+// CHECK: %[[MIN_UB:.*]] = arith.minsi %[[NEW_UB]], %[[C10]] : index
+// CHECK: acc.loop control(%[[INNER_IV:.*]] : index) = (%[[IV]] : index) to (%[[MIN_UB]] : index) step (%[[C1]] : index) {
+// CHECK: acc.yield
+// CHECK: } attributes {independent = [#acc.device_type<none>]}
+// CHECK: acc.yield
+// CHECK: } attributes {independent = [#acc.device_type<none>]}
+func.func @single_loop_tile(%arg0: memref<10xf32>) {
+ %c0 = arith.constant 0 : index
+ %c10 = arith.constant 10 : index
+ %c1 = arith.constant 1 : index
+ %c2 = arith.constant 2 : index
+ acc.loop tile({%c2 : index}) control(%i : index) = (%c0 : index) to (%c10 : index) step (%c1 : index) {
+ %val = arith.index_castui %i : index to i32
+ %fval = arith.sitofp %val : i32 to f32
+ memref.store %fval, %arg0[%i] : memref<10xf32>
+ acc.yield
+ } attributes {independent = [#acc.device_type<none>]}
+ return
+}
+
+// Test 2-level nested loop tiling with tile(4, 8)
+// Creates: tile_loop_1 -> tile_loop_2 -> element_loop_1 -> element_loop_2
+
+// CHECK-LABEL: func.func @nested_loop_tile
+// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[C100:.*]] = arith.constant 100 : index
+// CHECK-DAG: %[[C50:.*]] = arith.constant 50 : index
+// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
+// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
+// CHECK-DAG: %[[C8:.*]] = arith.constant 8 : index
+// Outer tile loop with gang
+// CHECK: acc.loop gang control(%[[I:.*]] : index) = (%[[C0]] : index) to (%[[C100]] : index) step (%[[C4]] : index) {
+// Inner tile loop
+// CHECK: acc.loop control(%[[J:.*]] : index) = (%[[C0]] : index) to (%[[C50]] : index) step (%[[C8]] : index) {
+// Outer element loop with vector
+// CHECK: acc.loop vector control({{.*}} : index) = (%[[I]] : index) to ({{.*}} : index) step (%[[C1]] : index) {
+// Inner element loop
+// CHECK: acc.loop control({{.*}} : index) = (%[[J]] : index) to ({{.*}} : index) step (%[[C1]] : index) {
+// CHECK: acc.yield
+// CHECK: }
+// CHECK: acc.yield
+// CHECK: }
+// CHECK: acc.yield
+// CHECK: }
+// CHECK: acc.yield
+// CHECK: }
+func.func @nested_loop_tile(%arg0: memref<100x50xf32>) {
+ %c0 = arith.constant 0 : index
+ %c100 = arith.constant 100 : index
+ %c50 = arith.constant 50 : index
+ %c1 = arith.constant 1 : index
+ %c4 = arith.constant 4 : index
+ %c8 = arith.constant 8 : index
+ acc.loop gang vector tile({%c4 : index, %c8 : index}) control(%i : index, %j : index) = (%c0, %c0 : index, index) to (%c100, %c50 : index, index) step (%c1, %c1 : index, index) {
+ %sum = arith.addi %i, %j : index
+ %val = arith.index_castui %sum : index to i32
+ %fval = arith.sitofp %val : i32 to f32
+ memref.store %fval, %arg0[%i, %j] : memref<100x50xf32>
+ acc.yield
+ } attributes {independent = [#acc.device_type<none>]}
+ return
+}
+
+// Test unknown tile size (*) represented as -1
+// Should use default tile size (32)
+
+// CHECK-LABEL: func.func @unknown_tile_size
+// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[C1000:.*]] = arith.constant 1000 : index
+// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
+// CHECK-DAG: %[[C32:.*]] = arith.constant 32 : index
+// Tile loop with default tile size
+// CHECK: acc.loop control(%[[IV:.*]] : index) = (%[[C0]] : index) to (%[[C1000]] : index) step (%[[C32]] : index) {
+// CHECK: acc.loop control({{.*}} : index) = (%[[IV]] : index) to ({{.*}} : index) step (%[[C1]] : index) {
+// CHECK: acc.yield
+// CHECK: }
+// CHECK: acc.yield
+// CHECK: }
+func.func @unknown_tile_size(%arg0: memref<1000xf32>) {
+ %c0 = arith.constant 0 : index
+ %c1000 = arith.constant 1000 : index
+ %c1 = arith.constant 1 : index
+ %cm1 = arith.constant -1 : i32 // tile(*) represented as -1
+ acc.loop tile({%cm1 : i32}) control(%i : index) = (%c0 : index) to (%c1000 : index) step (%c1 : index) {
+ %val = arith.index_castui %i : index to i32
+ %fval = arith.sitofp %val : i32 to f32
+ memref.store %fval, %arg0[%i] : memref<1000xf32>
+ acc.yield
+ } attributes {independent = [#acc.device_type<none>]}
+ return
+}
More information about the Mlir-commits
mailing list