[flang-commits] [flang] 09361b1 - [flang][hlfir] Allow expanding realloc assignments with scalar RHS.
Slava Zakharin via flang-commits
flang-commits at lists.llvm.org
Mon Sep 4 14:55:18 PDT 2023
Author: Slava Zakharin
Date: 2023-09-04T14:55:09-07:00
New Revision: 09361b19745f7cce0fde47257addfbfb447c9007
URL: https://github.com/llvm/llvm-project/commit/09361b19745f7cce0fde47257addfbfb447c9007
DIFF: https://github.com/llvm/llvm-project/commit/09361b19745f7cce0fde47257addfbfb447c9007.diff
LOG: [flang][hlfir] Allow expanding realloc assignments with scalar RHS.
F18 10.2.1.3 p. 3 states:
If the variable is an unallocated allocatable array, expr shall have the same rank.
So if LHS is an array and RHS is a scalar, then LHS must be allocated and
the assignment is performed according to F18 10.2.1.3 p. 5:
If expr is a scalar and the variable is an array,
the expr is treated as if it were an array of the same shape as the
variable with every element of the array equal to the scalar value of expr.
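As a rough illustration of the rule quoted above, here is a C++ analogy of assigning a scalar to an already allocated array. It is only a sketch, not what flang generates: the helper name `broadcast_assign` is hypothetical, and `std::vector` merely stands in for an allocated ALLOCATABLE array. The point is that the destination keeps its shape and every element receives the scalar value, so no reallocation is involved.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of flang): models `array = scalar` for an
// array that is already allocated. The destination keeps its size, and
// every element is set to the scalar value.
template <typename T>
void broadcast_assign(std::vector<T> &lhs, const T &scalar) {
  for (std::size_t i = 0; i < lhs.size(); ++i)
    lhs[i] = scalar;
}
```

For a four-element vector of doubles, `broadcast_assign(v, 0.0)` leaves the size at four and sets every element to zero, mirroring `DUDZ=0D0` on an allocated array.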
This resolves a performance regression in CPU2006/437.leslie3d caused
by extra Assign runtime calls for ALLOCATABLE local arrays.
Note that the extra calls do not add overhead themselves.
The problem is that the descriptor of the ALLOCATABLE is passed
to the Assign runtime function, and this defeats the points-to
analysis.
Example:
```
ALLOCATABLE DUDX(:),DUDY(:),DUDZ(:)
...
ALLOCATE( QS(IMAX-1),FSK(IMAX-1,0:KMAX,ND),
> QDIFFZ(IMAX-1), RMU(IMAX-1), EKCOEF(IMAX-1),
> DUDX(IMAX-1),DUDY(IMAX-1),DUDZ(IMAX-1),
...
DUDZ=0D0
...
DO I = I1, I2
DUDZ(I) =
> DZI * ABD * ((U(I,J,KBD) - U(I,J,KCD)) +
> 8.0D0 * (U(I,J, KK) - U(I,J,KBD))) * R6I
```
When `DUDZ=0D0` is not lowered to an Assign call, the `base_addr` of
`DUDZ`'s descriptor is the result of `malloc`, and LLVM is able to figure out
that accesses through this `base_addr` cannot overlap with accesses to,
for example, the module (global) variable DZI. This enables CSE and LICM
for the loop, eventually resulting in clean vectorization.
When `DUDZ`'s descriptor "escapes" to the Assign runtime function,
there are no guarantees about where `base_addr` can point.
I do not think this can be resolved by using any existing LLVM function/argument
attributes. Maybe we will be able to communicate the no-aliasing information
to LLVM using the `Full Restrict Support` representation.
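The aliasing effect described above can be sketched in C++. This is an analogy under stated assumptions, not flang's generated code: `Descriptor`, `runtime_assign`, `fill_inline`, `fill_via_runtime` and the global `dzi` are all hypothetical names, and `runtime_assign` is defined locally only so the sketch links, whereas the real runtime routine is opaque to the optimizer.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a module (global) variable such as DZI.
double dzi = 2.0;

// Hypothetical stand-in for an array descriptor; only the base address
// and the extent are modeled.
struct Descriptor {
  double *base_addr;
  std::size_t extent;
};

// Stand-in for an opaque runtime routine. When its body is not visible,
// the optimizer must assume it may change base_addr to point anywhere,
// including at dzi.
void runtime_assign(Descriptor *desc, double value) {
  for (std::size_t i = 0; i < desc->extent; ++i)
    desc->base_addr[i] = value;
}

// Variant 1: the zero fill is an inline loop. The pointer comes straight
// from malloc and never escapes before the loops, so stores through it
// cannot modify dzi and the load of dzi is loop-invariant.
double *fill_inline(std::size_t n) {
  double *a = static_cast<double *>(std::malloc(n * sizeof(double)));
  if (!a)
    return nullptr;
  for (std::size_t i = 0; i < n; ++i)
    a[i] = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    a[i] += dzi * static_cast<double>(i);
  return a;
}

// Variant 2: the descriptor's address is passed to the runtime routine.
// After the call, base_addr has to be treated as an unknown pointer, so
// each store in the loop may alias dzi unless the callee is analyzable.
double *fill_via_runtime(std::size_t n) {
  Descriptor d{static_cast<double *>(std::malloc(n * sizeof(double))), n};
  if (!d.base_addr)
    return nullptr;
  runtime_assign(&d, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    d.base_addr[i] += dzi * static_cast<double>(i);
  return d.base_addr;
}
```

Both variants compute the same values; they differ only in what the optimizer can prove about `base_addr`, which is the difference between the expanded loop and the Assign call in this patch.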
For the purpose of enabling HLFIR by default, I am just aligning the IR
with what we have with FIR lowering.
Reviewed By: tblah
Differential Revision: https://reviews.llvm.org/D159391
Added:
Modified:
flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
flang/test/HLFIR/opt-scalar-assign.fir
Removed:
################################################################################
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
index 65c66eea2219162..437455b3defb1b9 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
@@ -459,9 +459,10 @@ class BroadcastAssignBufferization
mlir::LogicalResult BroadcastAssignBufferization::matchAndRewrite(
hlfir::AssignOp assign, mlir::PatternRewriter &rewriter) const {
- if (assign.isAllocatableAssignment())
- return rewriter.notifyMatchFailure(assign, "AssignOp may imply allocation");
-
+ // Since RHS is a scalar and LHS is an array, LHS must be allocated
+ // in a conforming Fortran program, and LHS cannot be reallocated
+ // as a result of the assignment. So we can ignore isAllocatableAssignment
+ // and do the transformation always.
mlir::Value rhs = assign.getRhs();
if (!fir::isa_trivial(rhs.getType()))
return rewriter.notifyMatchFailure(
diff --git a/flang/test/HLFIR/opt-scalar-assign.fir b/flang/test/HLFIR/opt-scalar-assign.fir
index 7c1a15ef88c401f..2b1631a8202abd0 100644
--- a/flang/test/HLFIR/opt-scalar-assign.fir
+++ b/flang/test/HLFIR/opt-scalar-assign.fir
@@ -86,9 +86,19 @@ func.func @_QPtest3(%arg0: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {fir
}
// CHECK-LABEL: func.func @_QPtest3(
// CHECK-SAME: %[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {fir.bindc_name = "x"}) {
-// CHECK: %[[VAL_1:.*]] = arith.constant 0 : i32
-// CHECK: %[[VAL_2:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest3Ex"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
-// CHECK: hlfir.assign %[[VAL_1]] to %[[VAL_2]]#0 realloc : i32, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+// CHECK: %[[VAL_1:.*]] = arith.constant 1 : index
+// CHECK: %[[VAL_2:.*]] = arith.constant 0 : index
+// CHECK: %[[VAL_3:.*]] = arith.constant 0 : i32
+// CHECK: %[[VAL_4:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest3Ex"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+// CHECK: %[[VAL_5:.*]] = fir.load %[[VAL_4]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+// CHECK: %[[VAL_6:.*]]:3 = fir.box_dims %[[VAL_5]], %[[VAL_2]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+// CHECK: fir.do_loop %[[VAL_7:.*]] = %[[VAL_1]] to %[[VAL_6]]#1 step %[[VAL_1]] unordered {
+// CHECK: %[[VAL_8:.*]]:3 = fir.box_dims %[[VAL_5]], %[[VAL_2]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+// CHECK: %[[VAL_9:.*]] = arith.subi %[[VAL_8]]#0, %[[VAL_1]] : index
+// CHECK: %[[VAL_10:.*]] = arith.addi %[[VAL_7]], %[[VAL_9]] : index
+// CHECK: %[[VAL_11:.*]] = hlfir.designate %[[VAL_5]] (%[[VAL_10]]) : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> !fir.ref<i32>
+// CHECK: hlfir.assign %[[VAL_3]] to %[[VAL_11]] : i32, !fir.ref<i32>
+// CHECK: }
// CHECK: return
// CHECK: }