[all-commits] [llvm/llvm-project] 09361b: [flang][hlfir] Allow expanding realloc assignments...

Slava Zakharin via All-commits all-commits at lists.llvm.org
Mon Sep 4 14:55:30 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 09361b19745f7cce0fde47257addfbfb447c9007
      https://github.com/llvm/llvm-project/commit/09361b19745f7cce0fde47257addfbfb447c9007
  Author: Slava Zakharin <szakharin at nvidia.com>
  Date:   2023-09-04 (Mon, 04 Sep 2023)

  Changed paths:
    M flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
    M flang/test/HLFIR/opt-scalar-assign.fir

  Log Message:
  -----------
  [flang][hlfir] Allow expanding realloc assignments with scalar RHS.

F18 10.2.1.3 p. 3 states:
If the variable is an unallocated allocatable array, expr shall have the same rank.

So if LHS is an array and RHS is a scalar, then LHS must be allocated and
the assignment is performed according to F18 10.2.1.3 p. 5:
If expr is a scalar and the variable is an array,
the expr is treated as if it were an array of the same shape as the
variable with every element of the array equal to the scalar value of expr.
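
As an illustration only, the element-wise expansion that p. 5 permits can be sketched in C++ as below. `AllocatableArray` and `assignScalar` are hypothetical names made up for this sketch; they are not flang's descriptor type or any runtime API, and the real transformation operates on HLFIR, not on C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an allocatable rank-1 array.
// This is NOT flang's descriptor layout; it only carries what the sketch needs.
struct AllocatableArray {
  double *base_addr;   // null while the array is unallocated
  std::size_t extent;  // number of elements
};

// Expanded form of `array = scalar`: every element receives the scalar value.
// Because the RHS is a scalar, the LHS must already be allocated (p. 3),
// so no reallocation logic is needed here.
inline void assignScalar(AllocatableArray &lhs, double scalar) {
  assert(lhs.base_addr != nullptr && "LHS must already be allocated");
  for (std::size_t i = 0; i < lhs.extent; ++i)
    lhs.base_addr[i] = scalar;
}
```

With a four-element buffer wrapped in an `AllocatableArray`, `assignScalar(a, 0.0)` plays the role of `DUDZ=0D0` in the example further down.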

This resolves a performance regression in CPU2006/437.leslie3d caused
by extra Assign runtime calls for ALLOCATABLE local arrays.
Note that the extra calls do not add overhead themselves.
The problem is that the descriptor of the ALLOCATABLE is passed
to the Assign runtime function, and this defeats the points-to
analysis.

Example:
```
      ALLOCATABLE DUDX(:),DUDY(:),DUDZ(:)
...
      ALLOCATE( QS(IMAX-1),FSK(IMAX-1,0:KMAX,ND),
     >      QDIFFZ(IMAX-1), RMU(IMAX-1), EKCOEF(IMAX-1),
     >      DUDX(IMAX-1),DUDY(IMAX-1),DUDZ(IMAX-1),
...
      DUDZ=0D0
...
               DO I = I1, I2
                  DUDZ(I) =
     >                  DZI * ABD * ((U(I,J,KBD) - U(I,J,KCD)) +
     >                       8.0D0 * (U(I,J, KK) - U(I,J,KBD))) * R6I
```

When we are not lowering `DUDZ=0D0` to an Assign call, the `base_addr` of
`DUDZ`'s descriptor is the result of `malloc`, and LLVM is able to figure out
that accesses through this `base_addr` cannot overlap with accesses to,
for example, the module (global) variable DZI. This enables CSE and LICM
for the loop, eventually resulting in clean vectorization.

When `DUDZ`'s descriptor "escapes" to the Assign runtime function,
there are no guarantees about where `base_addr` can point.
I do not think this can be resolved by using any existing LLVM function/argument
attributes. Maybe we will be able to communicate the no-aliasing information
to LLVM using the `Full Restrict Support` representation.
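
The two situations can be contrasted with a rough C++ analogy. Everything here is an assumption made for illustration: `Descriptor`, `runtimeAssign`, `computeInline` and `computeViaRuntime` are hypothetical names, and `runtimeAssign` is defined locally only so the sketch is self-contained. In the real case the callee lives in the runtime library and is opaque to the optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

double DZI = 2.0; // stand-in for the module (global) variable

// Hypothetical descriptor; not flang's actual layout.
struct Descriptor {
  double *base_addr;
  std::size_t extent;
};

// Stand-in for an opaque runtime call. If the optimizer could not see this
// body, it would have to assume the call may change `lhs->base_addr`.
void runtimeAssign(Descriptor *lhs, double scalar) {
  assert(lhs->base_addr != nullptr);
  for (std::size_t i = 0; i < lhs->extent; ++i)
    lhs->base_addr[i] = scalar;
}

// Expanded assignment: the malloc result never escapes before the loop, so
// stores through it cannot alias DZI and the DZI load may be hoisted.
double *computeInline(const double *u, std::size_t n) {
  double *dudz = static_cast<double *>(std::malloc(n * sizeof(double)));
  if (!dudz) return nullptr;
  for (std::size_t i = 0; i < n; ++i) dudz[i] = 0.0;
  for (std::size_t i = 0; i < n; ++i) dudz[i] = DZI * u[i];
  return dudz;
}

// Runtime-call form: the descriptor's address is passed out, so after the
// call `d.base_addr` has to be treated as possibly pointing anywhere.
double *computeViaRuntime(const double *u, std::size_t n) {
  Descriptor d{static_cast<double *>(std::malloc(n * sizeof(double))), n};
  if (!d.base_addr) return nullptr;
  runtimeAssign(&d, 0.0);
  for (std::size_t i = 0; i < n; ++i) d.base_addr[i] = DZI * u[i];
  return d.base_addr;
}
```

Both functions compute the same values; the difference is only in what an optimizer is allowed to assume about aliasing between the array stores and `DZI`.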

For the purpose of enabling HLFIR by default, I am just aligning the IR
with what we have with FIR lowering.

Reviewed By: tblah

Differential Revision: https://reviews.llvm.org/D159391



