[all-commits] [llvm/llvm-project] 43a95a: [MLIR] Introduce full/partial tile separation usin...
Uday Bondhugula via All-commits
all-commits at lists.llvm.org
Fri Mar 27 18:41:55 PDT 2020
Branch: refs/heads/master
Home: https://github.com/llvm/llvm-project
Commit: 43a95a543fbb1ed4b3903e88ce291444d4970f5a
https://github.com/llvm/llvm-project/commit/43a95a543fbb1ed4b3903e88ce291444d4970f5a
Author: Uday Bondhugula <uday at polymagelabs.com>
Date: 2020-03-28 (Sat, 28 Mar 2020)
Changed paths:
M mlir/include/mlir/Analysis/AffineStructures.h
M mlir/include/mlir/Dialect/Affine/IR/AffineOps.td
M mlir/include/mlir/Transforms/LoopUtils.h
M mlir/lib/Analysis/AffineStructures.cpp
M mlir/lib/Analysis/Utils.cpp
M mlir/lib/Dialect/Affine/Transforms/LoopTiling.cpp
M mlir/lib/Transforms/Utils/LoopUtils.cpp
M mlir/test/Dialect/Affine/loop-tiling.mlir
Log Message:
-----------
[MLIR] Introduce full/partial tile separation using if/else
This patch introduces a utility to separate full tiles from partial
tiles when tiling affine loop nests where trip counts are unknown or
where tile sizes don't divide trip counts. A conditional guard is
generated to separate out the full tile (with constant trip count loops)
into the then block of an 'affine.if' and the partial tile to the else
block. The separation allows the 'then' block (which has constant trip
count loops) to be optimized better subsequently, for example by
unroll-and-jam, register tiling, or vectorization without generating
cleanup code, or by offloading to accelerators. Among techniques from the
literature, the if/else based separation leads to the most compact
cleanup code for multi-dimensional cases (because a single version is
used to model all partial tiles).
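One way to see the compactness argument is to count how many distinct
"kinds" of tile a band can contain. The sketch below is only an
illustration and not code from this patch; the helper names tile_kinds
and if_else_versions are hypothetical, and loops are assumed to start at
0 with constant tile sizes. Each tile is labelled per dimension as full
or partial, giving up to 2^d signatures for a d-dimensional band, while
the if/else form only ever distinguishes "all dimensions full" from
"anything else".

```python
from itertools import product


def tile_kinds(extents, tile_sizes):
    """Collect the per-dimension full/partial signature of every tile.

    extents[k] is the trip count of loop k (loops are assumed to start at 0),
    tile_sizes[k] is the tile size along loop k. A signature entry is True
    when the tile is full along that dimension.
    """
    origins_per_dim = [range(0, n, t) for n, t in zip(extents, tile_sizes)]
    kinds = set()
    for origin in product(*origins_per_dim):
        kinds.add(
            tuple(o + t <= n for o, t, n in zip(origin, tile_sizes, extents))
        )
    return kinds


def if_else_versions(kinds):
    """Collapse signatures into the two cases an if/else guard separates:
    True for a tile that is full in every dimension, False otherwise."""
    return {all(kind) for kind in kinds}


if __name__ == "__main__":
    kinds = tile_kinds((70, 40, 50), (32, 32, 32))
    print(len(kinds), "per-dimension signatures")
    print(len(if_else_versions(kinds)), "code versions with if/else")
```

A scheme that emitted one loop nest per signature would grow with the
depth of the band, whereas the guard-based form stays at two versions:
one full tile and one partial tile with min upper bounds.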
INPUT
affine.for %i0 = 0 to %M {
  affine.for %i1 = 0 to %N {
    "foo"() : () -> ()
  }
}
OUTPUT AFTER TILING W/O SEPARATION
#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0)[s0] -> (d0 + 32, s0)>
affine.for %arg2 = 0 to %M step 32 {
  affine.for %arg3 = 0 to %N step 32 {
    affine.for %arg4 = #map0(%arg2) to min #map1(%arg2)[%M] {
      affine.for %arg5 = #map0(%arg3) to min #map1(%arg3)[%N] {
        "foo"() : () -> ()
      }
    }
  }
}
OUTPUT AFTER TILING WITH SEPARATION
#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (d0 + 32)>
#map2 = affine_map<(d0)[s0] -> (d0 + 32, s0)>
#set0 = affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)>
affine.for %arg2 = 0 to %M step 32 {
  affine.for %arg3 = 0 to %N step 32 {
    affine.if #set0(%arg2, %arg3)[%M, %N] {
      // Full tile.
      affine.for %arg4 = #map0(%arg2) to #map1(%arg2) {
        affine.for %arg5 = #map0(%arg3) to #map1(%arg3) {
          "foo"() : () -> ()
        }
      }
    } else {
      // Partial tile.
      affine.for %arg4 = #map0(%arg2) to min #map2(%arg2)[%M] {
        affine.for %arg5 = #map0(%arg3) to min #map2(%arg3)[%N] {
          "foo"() : () -> ()
        }
      }
    }
  }
}
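The separated nest can be checked against the original one with a small
simulation. The following Python sketch is an illustration only and is
not part of the patch; the function names original and
tiled_with_separation are hypothetical. It mirrors the structure of the
output above (tile size 32, guard taken from #set0) and records every
(i, j) iteration so the two nests can be compared.

```python
TILE = 32


def original(M, N):
    """Iterations of the untiled nest, in lexicographic order."""
    return [(i, j) for i in range(M) for j in range(N)]


def tiled_with_separation(M, N, tile=TILE):
    """Simulate the tiled nest with full/partial separation.

    Returns the visited iterations plus the number of tiles that took the
    'then' (full) and 'else' (partial) branches.
    """
    visited = []
    full_tiles = 0
    partial_tiles = 0
    for ti in range(0, M, tile):
        for tj in range(0, N, tile):
            # Guard from #set0: -d0 + s0 - 32 >= 0 and -d1 + s1 - 32 >= 0.
            if M - ti - tile >= 0 and N - tj - tile >= 0:
                # Full tile: both intra-tile loops have a constant trip count.
                full_tiles += 1
                for i in range(ti, ti + tile):
                    for j in range(tj, tj + tile):
                        visited.append((i, j))
            else:
                # Partial tile: upper bounds are clamped with min, as in #map2.
                partial_tiles += 1
                for i in range(ti, min(ti + tile, M)):
                    for j in range(tj, min(tj + tile, N)):
                        visited.append((i, j))
    return visited, full_tiles, partial_tiles


if __name__ == "__main__":
    visited, full, partial = tiled_with_separation(70, 40)
    # Tiling reorders iterations, so compare as sorted lists.
    assert sorted(visited) == original(70, 40)
    print("full tiles:", full, "partial tiles:", partial)
```

Only the 'then' branch has loop bounds that are independent of M and N,
which is what makes it a candidate for unrolling or vectorization
without extra cleanup code.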
The separation is tested via a command-line flag on the loop tiling pass.
The utility itself allows one to pass in any band of contiguously nested
loops, and can be used by other transforms/utilities. The current
implementation works for hyperrectangular loop nests.
Signed-off-by: Uday Bondhugula <uday at polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76700