[Mlir-commits] [mlir] fdc0d43 - Introduce alloca_scope op

Alex Zinenko llvmlistbot at llvm.org
Fri Jun 11 10:28:48 PDT 2021


Author: Denys Shabalin
Date: 2021-06-11T19:28:41+02:00
New Revision: fdc0d4360b4e072bd91cdf9133fdf570d8fb16a2

URL: https://github.com/llvm/llvm-project/commit/fdc0d4360b4e072bd91cdf9133fdf570d8fb16a2
DIFF: https://github.com/llvm/llvm-project/commit/fdc0d4360b4e072bd91cdf9133fdf570d8fb16a2.diff

LOG: Introduce alloca_scope op

## Introduction

This proposal describes a new op called `alloca_scope`, to be added to the `std`
dialect (and later moved to the `memref` dialect).

## Motivation

Alloca operations are easy to misuse, especially when one relies on them while
writing rewriting/conversion passes. Consider two independent dialects: one
defines an op that wants to allocate on the stack, and the other defines a
construct that corresponds to some form of looping:

```
dialect1.looping_op {
  %x = dialect2.stack_allocating_op
}
```

Since the dialects might not know about each other, they are going to define
their lowerings to std/scf/etc. independently:

```
scf.for … {
   %x_temp = std.alloca …
   … // do some domain-specific work using %x_temp buffer
   … // and store the result into %result
   %x = %result
}
```

Later on, the `scf.for` and `std.alloca` ops are going to be lowered to the LLVM
dialect using a combination of `llvm.alloca` and unstructured control flow.

At this point the use of `%x_temp` is either going to be optimized away by LLVM
(for example, using mem2reg) or, in the worst case, is going to perform an
independent stack allocation on each iteration of the loop. While the LLVM
optimizations are likely to succeed, they are not guaranteed to, which leaves
room for surprising and unexpected growth of stack usage.
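As a rough illustration (not the exact output of the conversion passes, and
with the upper bound `%ub` and block structure chosen purely for the sketch),
the fully lowered loop body may end up performing a fresh `llvm.alloca` on
every trip around the back-edge:

```
^loop_body(%i: i64):
  // Without a scope, a new stack allocation happens on every iteration
  // unless LLVM manages to optimize it away (e.g. via mem2reg).
  %c1 = llvm.mlir.constant(1 : i64) : i64
  %x_temp = llvm.alloca %c1 x f32 : (i64) -> !llvm.ptr<f32>
  // ... domain-specific work using %x_temp, storing into %result ...
  %next = llvm.add %i, %c1 : i64
  %cond = llvm.icmp "slt" %next, %ub : i64
  llvm.cond_br %cond, ^loop_body(%next : i64), ^exit
```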

## Proposal

We propose a new operation, called `alloca_scope`, that defines a finer-grained
allocation scope for alloca-allocated memory:

```
alloca_scope {
   %x_temp = alloca …
   ...
}
```

Here the lifetime of `%x_temp` is bound to the explicitly annotated region of
the `alloca_scope`. Moreover, one can also return values out of the
`alloca_scope` with an accompanying `alloca_scope.return` op (which behaves
similarly to `scf.yield`):

```
%result = alloca_scope {
   %x_temp = alloca …
   …
   alloca_scope.return %myvalue
}
```

Under the hood, `alloca_scope` is going to be lowered to a combination of
`llvm.intr.stacksave` and `llvm.intr.stackrestore` that are invoked
automatically as control flow enters and leaves the body of the `alloca_scope`.
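As a sketch of that lowering (block names and the allocated type are
illustrative, not the exact output of the conversion pattern), the structured
op turns into unstructured control flow of roughly this shape:

```
  // Remember the stack pointer on entry to the scope.
  %sp = llvm.intr.stacksave : !llvm.ptr<i8>
  llvm.br ^scope_body
^scope_body:
  %c1 = llvm.mlir.constant(1 : i64) : i64
  %x_temp = llvm.alloca %c1 x f32 : (i64) -> !llvm.ptr<f32>
  // ... use %x_temp ...
  // Release everything allocated since the matching stacksave.
  llvm.intr.stackrestore %sp : !llvm.ptr<i8>
  llvm.br ^after_scope
^after_scope:
  // Stack memory allocated inside the scope is no longer live here.
```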

The key value of the new op is to allow deterministic, guaranteed stack usage
through an explicit annotation in the code that is finer-grained than the
function-level scope provided by the `AutomaticAllocationScope` interface.
`alloca_scope` can be inserted at arbitrary locations and doesn't require
non-trivial transformations such as outlining.
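For instance, in the motivating loop above, the scope can simply be wrapped
around the allocation inside the loop body. A sketch, using the `memref.*`
spelling from this patch and illustrative loop bounds `%lb`/`%ub`/`%step` and
memref type:

```
scf.for %i = %lb to %ub step %step {
  memref.alloca_scope {
    // The buffer lives only for the duration of this iteration's scope.
    %x_temp = memref.alloca() : memref<4x3xf32>
    // ... domain-specific work using %x_temp ...
  }
  // Stack space used by %x_temp is reclaimed before the next iteration.
}
```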

## Which dialect

Before the `memref` dialect is split out, `alloca_scope` can temporarily reside
in the `std` dialect and later be moved to `memref` together with the rest of
the memory-related operations.

## Implementation

An implementation of the op is available [here](https://reviews.llvm.org/D97768).

Original commits:

* Add initial scaffolding for alloca_scope op
* Add alloca_scope.return op
* Add no region arguments and variadic results
* Add op descriptions
* Add failing test case
* Add another failing test
* Initial implementation of lowering for std.alloca_scope
* Fix backticks
* Fix getSuccessorRegions implementation

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D97768

Added: 
    mlir/test/Conversion/StandardToLLVM/convert-alloca-scope.mlir

Modified: 
    mlir/include/mlir/Dialect/MemRef/IR/MemRef.h
    mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
    mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
    mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
    mlir/test/Dialect/MemRef/ops.mlir

Removed: 
    


################################################################################
diff  --git a/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h b/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h
index 13d27466c2719..1f694fc4d22d8 100644
--- a/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h
+++ b/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h
@@ -12,6 +12,7 @@
 #include "mlir/IR/Dialect.h"
 #include "mlir/Interfaces/CallInterfaces.h"
 #include "mlir/Interfaces/CastInterfaces.h"
+#include "mlir/Interfaces/ControlFlowInterfaces.h"
 #include "mlir/Interfaces/CopyOpInterface.h"
 #include "mlir/Interfaces/SideEffectInterfaces.h"
 #include "mlir/Interfaces/ViewLikeInterface.h"

diff  --git a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
index 8a2d04fdb65a7..16d7ec059bc69 100644
--- a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
+++ b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
@@ -9,6 +9,7 @@
 #ifndef MEMREF_OPS
 #define MEMREF_OPS
 
+include "mlir/Interfaces/ControlFlowInterfaces.td"
 include "mlir/Dialect/MemRef/IR/MemRefBase.td"
 include "mlir/IR/OpBase.td"
 include "mlir/Interfaces/CastInterfaces.td"
@@ -198,6 +199,84 @@ def MemRef_AllocaOp : AllocLikeOp<"alloca", AutomaticAllocationScopeResource> {
   }];
 }
 
+//===----------------------------------------------------------------------===//
+// AllocaScopeOp
+//===----------------------------------------------------------------------===//
+
+def MemRef_AllocaScopeOp : MemRef_Op<"alloca_scope", 
+      [DeclareOpInterfaceMethods<RegionBranchOpInterface>,
+       SingleBlockImplicitTerminator<"AllocaScopeReturnOp">,
+       RecursiveSideEffects,
+       NoRegionArguments]> {
+  let summary = "explicitly delimited scope for stack allocation";
+  let description = [{
+    The `memref.alloca_scope` operation represents an explicitly-delimited
+    scope for the alloca allocations. Any `memref.alloca` operations that are
+    used within this scope are going to be cleaned up automatically once
+    the control-flow exits the nested region. For example:
+
+    ```mlir
+    memref.alloca_scope {
+      %myalloca = memref.alloca(): memref<4x3xf32>
+      ...
+    }
+    ```
+
+    Here, `%myalloca` memref is valid within the explicitly delimited scope
+    and is automatically deallocated at the end of the given region.
+
+    `memref.alloca_scope` may also return results that are defined in the nested
+    region. To return a value, one should use `memref.alloca_scope.return`
+    operation:
+
+    ```mlir
+    %result = memref.alloca_scope {
+      ...
+      memref.alloca_scope.return %value
+    }
+    ```
+
+    If `memref.alloca_scope` returns no value, the `memref.alloca_scope.return ` can
+    be left out, and will be inserted implicitly.
+  }];
+
+  let results = (outs Variadic<AnyType>:$results);
+  let regions = (region SizedRegion<1>:$bodyRegion);
+}
+
+//===----------------------------------------------------------------------===//
+// AllocaScopeReturnOp
+//===----------------------------------------------------------------------===//
+
+def MemRef_AllocaScopeReturnOp : MemRef_Op<"alloca_scope.return", 
+      [HasParent<"AllocaScopeOp">,
+       NoSideEffect,
+       ReturnLike,
+       Terminator]> {
+  let summary = "terminator for alloca_scope operation";
+  let description = [{
+    `memref.alloca_scope.return` operation returns zero or more SSA values 
+    from the region within `memref.alloca_scope`. If no values are returned,
+    the return operation may be omitted. Otherwise, it has to be present
+    to indicate which values are going to be returned. For example:
+
+    ```mlir
+    memref.alloca_scope.return %value
+    ```
+  }];
+
+  let arguments = (ins Variadic<AnyType>:$results);
+  let builders = [OpBuilder<(ins), [{ /*nothing to do */ }]>];
+
+  let assemblyFormat =
+      [{ attr-dict ($results^ `:` type($results))? }];
+
+  // No custom verification needed.
+  let verifier = ?;
+}
+
+
+
 //===----------------------------------------------------------------------===//
 // BufferCastOp
 //===----------------------------------------------------------------------===//

diff  --git a/mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp b/mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
index 3ee6b31d08f53..61074382470e2 100644
--- a/mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
+++ b/mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
@@ -2028,6 +2028,60 @@ struct AllocaOpLowering : public AllocLikeOpLLVMLowering {
   }
 };
 
+struct AllocaScopeOpLowering
+    : public ConvertOpToLLVMPattern<memref::AllocaScopeOp> {
+  using ConvertOpToLLVMPattern<memref::AllocaScopeOp>::ConvertOpToLLVMPattern;
+
+  LogicalResult
+  matchAndRewrite(memref::AllocaScopeOp allocaScopeOp, ArrayRef<Value> operands,
+                  ConversionPatternRewriter &rewriter) const override {
+    OpBuilder::InsertionGuard guard(rewriter);
+    Location loc = allocaScopeOp.getLoc();
+
+    // Split the current block before the AllocaScopeOp to create the inlining
+    // point.
+    auto *currentBlock = rewriter.getInsertionBlock();
+    auto *remainingOpsBlock =
+        rewriter.splitBlock(currentBlock, rewriter.getInsertionPoint());
+    Block *continueBlock;
+    if (allocaScopeOp.getNumResults() == 0) {
+      continueBlock = remainingOpsBlock;
+    } else {
+      continueBlock = rewriter.createBlock(remainingOpsBlock,
+                                           allocaScopeOp.getResultTypes());
+      rewriter.create<BranchOp>(loc, remainingOpsBlock);
+    }
+
+    // Inline body region.
+    Block *beforeBody = &allocaScopeOp.bodyRegion().front();
+    Block *afterBody = &allocaScopeOp.bodyRegion().back();
+    rewriter.inlineRegionBefore(allocaScopeOp.bodyRegion(), continueBlock);
+
+    // Save stack and then branch into the body of the region.
+    rewriter.setInsertionPointToEnd(currentBlock);
+    auto stackSaveOp =
+        rewriter.create<LLVM::StackSaveOp>(loc, getVoidPtrType());
+    rewriter.create<BranchOp>(loc, beforeBody);
+
+    // Replace the alloca_scope return with a branch that jumps out of the body.
+    // Stack restore before leaving the body region.
+    rewriter.setInsertionPointToEnd(afterBody);
+    auto returnOp =
+        cast<memref::AllocaScopeReturnOp>(afterBody->getTerminator());
+    auto branchOp = rewriter.replaceOpWithNewOp<BranchOp>(
+        returnOp, continueBlock, returnOp.results());
+
+    // Insert stack restore before jumping out the body of the region.
+    rewriter.setInsertionPoint(branchOp);
+    rewriter.create<LLVM::StackRestoreOp>(loc, stackSaveOp);
+
+    // Replace the op with values return from the body region.
+    rewriter.replaceOp(allocaScopeOp, continueBlock->getArguments());
+
+    return success();
+  }
+};
+
 /// Copies the shaped descriptor part to (if `toDynamic` is set) or from
 /// (otherwise) the dynamically allocated memory for any operands that were
 /// unranked descriptors originally.
@@ -3885,6 +3939,7 @@ void mlir::populateStdToLLVMNonMemoryConversionPatterns(
       AddFOpLowering,
       AddIOpLowering,
       AllocaOpLowering,
+      AllocaScopeOpLowering,
       AndOpLowering,
       AssertOpLowering,
       AtomicRMWOpLowering,

diff  --git a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
index f20234bd1d686..fe1a8e94b7c48 100644
--- a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+++ b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
@@ -229,6 +229,65 @@ void AllocaOp::getCanonicalizationPatterns(RewritePatternSet &results,
       context);
 }
 
+//===----------------------------------------------------------------------===//
+// AllocaScopeOp
+//===----------------------------------------------------------------------===//
+
+static void print(OpAsmPrinter &p, AllocaScopeOp &op) {
+  bool printBlockTerminators = false;
+
+  p << AllocaScopeOp::getOperationName() << " ";
+  if (!op.results().empty()) {
+    p << " -> (" << op.getResultTypes() << ")";
+    printBlockTerminators = true;
+  }
+  p.printRegion(op.bodyRegion(),
+                /*printEntryBlockArgs=*/false,
+                /*printBlockTerminators=*/printBlockTerminators);
+  p.printOptionalAttrDict(op->getAttrs());
+}
+
+static ParseResult parseAllocaScopeOp(OpAsmParser &parser,
+                                      OperationState &result) {
+  // Create a region for the body.
+  result.regions.reserve(1);
+  Region *bodyRegion = result.addRegion();
+
+  // Parse optional results type list.
+  if (parser.parseOptionalArrowTypeList(result.types))
+    return failure();
+
+  // Parse the body region.
+  if (parser.parseRegion(*bodyRegion, /*arguments=*/{}, /*argTypes=*/{}))
+    return failure();
+  AllocaScopeOp::ensureTerminator(*bodyRegion, parser.getBuilder(),
+                                  result.location);
+
+  // Parse the optional attribute list.
+  if (parser.parseOptionalAttrDict(result.attributes))
+    return failure();
+
+  return success();
+}
+
+static LogicalResult verify(AllocaScopeOp op) {
+  if (failed(RegionBranchOpInterface::verifyTypes(op)))
+    return failure();
+
+  return success();
+}
+
+void AllocaScopeOp::getSuccessorRegions(
+    Optional<unsigned> index, ArrayRef<Attribute> operands,
+    SmallVectorImpl<RegionSuccessor> &regions) {
+  if (index.hasValue()) {
+    regions.push_back(RegionSuccessor(getResults()));
+    return;
+  }
+
+  regions.push_back(RegionSuccessor(&bodyRegion()));
+}
+
 //===----------------------------------------------------------------------===//
 // AssumeAlignmentOp
 //===----------------------------------------------------------------------===//

diff  --git a/mlir/test/Conversion/StandardToLLVM/convert-alloca-scope.mlir b/mlir/test/Conversion/StandardToLLVM/convert-alloca-scope.mlir
new file mode 100644
index 0000000000000..4a16c6796f1e3
--- /dev/null
+++ b/mlir/test/Conversion/StandardToLLVM/convert-alloca-scope.mlir
@@ -0,0 +1,55 @@
+// RUN: mlir-opt -convert-std-to-llvm %s | FileCheck %s
+
+// CHECK-LABEL: llvm.func @empty
+func @empty() {
+  // CHECK: llvm.intr.stacksave 
+  // CHECK: llvm.br
+  memref.alloca_scope {
+    memref.alloca_scope.return
+  }
+  // CHECK: llvm.intr.stackrestore 
+  // CHECK: llvm.br
+  // CHECK: llvm.return
+  return
+}
+
+// CHECK-LABEL: llvm.func @returns_nothing
+func @returns_nothing(%b: f32) {
+  %a = constant 10.0 : f32
+  // CHECK: llvm.intr.stacksave 
+  memref.alloca_scope {
+    %c = std.addf %a, %b : f32
+    memref.alloca_scope.return
+  }
+  // CHECK: llvm.intr.stackrestore 
+  // CHECK: llvm.return
+  return
+}
+
+// CHECK-LABEL: llvm.func @returns_one_value
+func @returns_one_value(%b: f32) -> f32 {
+  %a = constant 10.0 : f32
+  // CHECK: llvm.intr.stacksave 
+  %result = memref.alloca_scope -> f32 {
+    %c = std.addf %a, %b : f32
+    memref.alloca_scope.return %c: f32
+  }
+  // CHECK: llvm.intr.stackrestore 
+  // CHECK: llvm.return
+  return %result : f32
+}
+
+// CHECK-LABEL: llvm.func @returns_multiple_values
+func @returns_multiple_values(%b: f32) -> f32 {
+  %a = constant 10.0 : f32
+  // CHECK: llvm.intr.stacksave 
+  %result1, %result2 = memref.alloca_scope -> (f32, f32) {
+    %c = std.addf %a, %b : f32
+    %d = std.subf %a, %b : f32
+    memref.alloca_scope.return %c, %d: f32, f32
+  }
+  // CHECK: llvm.intr.stackrestore 
+  // CHECK: llvm.return
+  %result = std.addf %result1, %result2 : f32
+  return %result : f32
+}

diff  --git a/mlir/test/Dialect/MemRef/ops.mlir b/mlir/test/Dialect/MemRef/ops.mlir
index 1b5728486a367..bbd7fb35a3866 100644
--- a/mlir/test/Dialect/MemRef/ops.mlir
+++ b/mlir/test/Dialect/MemRef/ops.mlir
@@ -76,3 +76,12 @@ func @memref_dealloc() {
   memref.dealloc %1 : memref<*xf32>
   return
 }
+
+
+// CHECK-LABEL: func @memref_alloca_scope
+func @memref_alloca_scope() {
+  memref.alloca_scope {
+    memref.alloca_scope.return
+  }
+  return
+}


        

