[all-commits] [llvm/llvm-project] 43a95a: [MLIR] Introduce full/partial tile separation usin...

Uday Bondhugula via All-commits all-commits at lists.llvm.org
Fri Mar 27 18:41:55 PDT 2020


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 43a95a543fbb1ed4b3903e88ce291444d4970f5a
      https://github.com/llvm/llvm-project/commit/43a95a543fbb1ed4b3903e88ce291444d4970f5a
  Author: Uday Bondhugula <uday at polymagelabs.com>
  Date:   2020-03-28 (Sat, 28 Mar 2020)

  Changed paths:
    M mlir/include/mlir/Analysis/AffineStructures.h
    M mlir/include/mlir/Dialect/Affine/IR/AffineOps.td
    M mlir/include/mlir/Transforms/LoopUtils.h
    M mlir/lib/Analysis/AffineStructures.cpp
    M mlir/lib/Analysis/Utils.cpp
    M mlir/lib/Dialect/Affine/Transforms/LoopTiling.cpp
    M mlir/lib/Transforms/Utils/LoopUtils.cpp
    M mlir/test/Dialect/Affine/loop-tiling.mlir

  Log Message:
  -----------
  [MLIR] Introduce full/partial tile separation using if/else

This patch introduces a utility to separate full tiles from partial
tiles when tiling affine loop nests whose trip counts are unknown or
are not multiples of the tile sizes. A conditional guard is generated
that places the full tile (with constant trip count loops) in the then
block of an 'affine.if' and the partial tile in the else block. The
separation allows the 'then' block (which has constant trip count
loops) to be optimized better subsequently: e.g., for unroll-and-jam,
register tiling, and vectorization without generating cleanup code, or
for offloading to accelerators. Among techniques from the literature,
the if/else based separation leads to the most compact cleanup code for
multi-dimensional cases (because a single version is used to model all
partial tiles).

INPUT

  affine.for %i0 = 0 to %M {
    affine.for %i1 = 0 to %N {
      "foo"() : () -> ()
    }
  }

OUTPUT AFTER TILING W/O SEPARATION

  #map0 = affine_map<(d0) -> (d0)>
  #map1 = affine_map<(d0)[s0] -> (d0 + 32, s0)>

  affine.for %arg2 = 0 to %M step 32 {
    affine.for %arg3 = 0 to %N step 32 {
      affine.for %arg4 = #map0(%arg2) to min #map1(%arg2)[%M] {
        affine.for %arg5 = #map0(%arg3) to min #map1(%arg3)[%N] {
          "foo"() : () -> ()
        }
      }
    }
  }

OUTPUT AFTER TILING WITH SEPARATION

  #map0 = affine_map<(d0) -> (d0)>
  #map1 = affine_map<(d0) -> (d0 + 32)>
  #map2 = affine_map<(d0)[s0] -> (d0 + 32, s0)>

  #set0 = affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)>

  affine.for %arg2 = 0 to %M step 32 {
    affine.for %arg3 = 0 to %N step 32 {
      affine.if #set0(%arg2, %arg3)[%M, %N] {
        // Full tile.
        affine.for %arg4 = #map0(%arg2) to #map1(%arg2) {
          affine.for %arg5 = #map0(%arg3) to #map1(%arg3) {
            "foo"() : () -> ()
          }
        }
      } else {
        // Partial tile.
        affine.for %arg4 = #map0(%arg2) to min #map2(%arg2)[%M] {
          affine.for %arg5 = #map0(%arg3) to min #map2(%arg3)[%N] {
            "foo"() : () -> ()
          }
        }
      }
    }
  }
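
For intuition only, the two tiled forms above can be mirrored with plain
scalar loops. The C++ below is a minimal sketch and not code from this
patch: the names kTileSize, TileCounts, countUntiled, countTiledWithMin
and countTiledSeparated are hypothetical, and "foo" is stood in for by
an iteration counter. It assumes the same tile size of 32 and the same
guard as #set0, i.e. a tile is full only when it fits entirely within
both %M and %N.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical tile size, matching the 32 used in the example above.
constexpr int64_t kTileSize = 32;

// Hypothetical result type: total iterations plus how many tiles took
// the full ('then') path versus the partial ('else') path.
struct TileCounts {
  int64_t iterations;
  int64_t fullTiles;
  int64_t partialTiles;
};

// Reference: iteration count of the untiled M x N nest.
int64_t countUntiled(int64_t M, int64_t N) {
  int64_t count = 0;
  for (int64_t i = 0; i < M; ++i)
    for (int64_t j = 0; j < N; ++j)
      ++count;
  return count;
}

// Tiling without separation: every intra-tile loop has a min upper bound.
int64_t countTiledWithMin(int64_t M, int64_t N) {
  int64_t count = 0;
  for (int64_t ii = 0; ii < M; ii += kTileSize)
    for (int64_t jj = 0; jj < N; jj += kTileSize)
      for (int64_t i = ii; i < std::min(ii + kTileSize, M); ++i)
        for (int64_t j = jj; j < std::min(jj + kTileSize, N); ++j)
          ++count;
  return count;
}

// Tiling with separation: one guard per tile selects between a version
// with constant trip counts and a single min-bounded cleanup version.
TileCounts countTiledSeparated(int64_t M, int64_t N) {
  TileCounts c{0, 0, 0};
  for (int64_t ii = 0; ii < M; ii += kTileSize) {
    for (int64_t jj = 0; jj < N; jj += kTileSize) {
      // Mirrors #set0: -ii + M - 32 >= 0 and -jj + N - 32 >= 0.
      if (M - ii - kTileSize >= 0 && N - jj - kTileSize >= 0) {
        // Full tile: both loops run exactly kTileSize iterations.
        ++c.fullTiles;
        for (int64_t i = ii; i < ii + kTileSize; ++i)
          for (int64_t j = jj; j < jj + kTileSize; ++j)
            ++c.iterations;
      } else {
        // Partial tile: same min-bounded loops as the unseparated form.
        ++c.partialTiles;
        for (int64_t i = ii; i < std::min(ii + kTileSize, M); ++i)
          for (int64_t j = jj; j < std::min(jj + kTileSize, N); ++j)
            ++c.iterations;
      }
    }
  }
  return c;
}
```

In this sketch, a tile that is partial along only one dimension still
goes to the else path, which is why a single cleanup version covers all
partial tiles.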

The separation is tested via a command-line flag on the loop tiling
pass. The utility itself accepts any band of contiguously nested loops
and can be used by other transforms/utilities. The current
implementation works for hyperrectangular loop nests.

Signed-off-by: Uday Bondhugula <uday at polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76700



