[all-commits] [llvm/llvm-project] a42a2c: Avoid buffer hoisting from parallel loops (#90735)
Rafael Ubal via All-commits
all-commits at lists.llvm.org
Fri May 3 23:35:59 PDT 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: a42a2ca19b2325fa6844d6b10e88eb53a3f2fde8
https://github.com/llvm/llvm-project/commit/a42a2ca19b2325fa6844d6b10e88eb53a3f2fde8
Author: Rafael Ubal <rubal at mathworks.com>
Date: 2024-05-04 (Sat, 04 May 2024)
Changed paths:
M mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
M mlir/include/mlir/Interfaces/LoopLikeInterface.h
M mlir/include/mlir/Interfaces/LoopLikeInterface.td
M mlir/lib/Dialect/Bufferization/Transforms/BufferOptimizations.cpp
M mlir/test/Dialect/Bufferization/Transforms/buffer-loop-hoisting.mlir
Log Message:
-----------
Avoid buffer hoisting from parallel loops (#90735)
This change fixes incorrect behavior in the `--buffer-loop-hoisting`
pass. The pass is in charge of hoisting buffer allocations (e.g.,
`memref.alloca`) out of loop regions (e.g., `scf.for`) when possible.
This works for loops with sequential execution semantics. However, in a
parallel loop, multiple threads may execute the body concurrently, each
using the buffer allocated there to store its own local data. Hoisting
such a buffer out of the loop causes all threads to wrongly share the
same memory region.
In the following example, dimension 1 of the input memref is reversed.
Dimension 0 is traversed with a parallel loop.
```
func.func @f(%input: memref<2x3xf32>) -> memref<2x3xf32> {
  %c0 = index.constant 0
  %c1 = index.constant 1
  %c2 = index.constant 2
  %c3 = index.constant 3
  %output = memref.alloc() : memref<2x3xf32>
  scf.parallel (%index) = (%c0) to (%c2) step (%c1) {

    // Create subviews for working input and output slices
    %input_slice = memref.subview %input[%index, 2][1, 3][1, -1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, -1], offset: ?>>
    %output_slice = memref.subview %output[%index, 0][1, 3][1, 1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>

    // Copy the input slice into this temporary buffer. This intermediate
    // copy is unnecessary, but is used for illustration purposes.
    %temp = memref.alloc() : memref<1x3xf32>
    memref.copy %input_slice, %temp : memref<1x3xf32, strided<[3, -1], offset: ?>> to memref<1x3xf32>

    // Copy temporary buffer into output slice
    memref.copy %temp, %output_slice : memref<1x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>
    scf.reduce
  }
  return %output : memref<2x3xf32>
}
```
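For illustration, the sketch below shows roughly what the function would
look like if `%temp` were hoisted out of the `scf.parallel` body. It is
hand-written from the example above, not actual pass output, and the
unused `%c3` is dropped for brevity. With a single `%temp` shared by both
iterations, one iteration may overwrite the buffer between the two copies
of the other, so a row of `%output` can end up holding the wrong data.

```
func.func @f(%input: memref<2x3xf32>) -> memref<2x3xf32> {
  %c0 = index.constant 0
  %c1 = index.constant 1
  %c2 = index.constant 2
  %output = memref.alloc() : memref<2x3xf32>

  // Hoisted allocation: one buffer, now shared by every iteration.
  %temp = memref.alloc() : memref<1x3xf32>

  scf.parallel (%index) = (%c0) to (%c2) step (%c1) {
    %input_slice = memref.subview %input[%index, 2][1, 3][1, -1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, -1], offset: ?>>
    %output_slice = memref.subview %output[%index, 0][1, 3][1, 1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>

    // Both iterations write the same %temp; nothing orders these writes
    // with respect to the read in the copy below.
    memref.copy %input_slice, %temp : memref<1x3xf32, strided<[3, -1], offset: ?>> to memref<1x3xf32>
    memref.copy %temp, %output_slice : memref<1x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>
    scf.reduce
  }
  return %output : memref<2x3xf32>
}
```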
The patch submitted here prevents `%temp = memref.alloc() :
memref<1x3xf32>` from being hoisted when the containing op is
`scf.parallel` or `scf.forall`. A new op trait called
`HasParallelRegion` is introduced and assigned to these two ops to
indicate that their regions have parallel execution semantics.
@joker-eph @ftynse @nicolasvasilache @sabauma