[flang-commits] [flang] 0623ce1 - [flang][RFC] Adding higher level FIR ops to ease expression lowering

Thu Oct 13 05:31:06 PDT 2022

Author: Jean Perier
Date: 2022-10-13T14:25:51+02:00
New Revision: 0623ce152a02eed954a1de82a1430d6d00647c9f

URL: https://github.com/llvm/llvm-project/commit/0623ce152a02eed954a1de82a1430d6d00647c9f
DIFF: https://github.com/llvm/llvm-project/commit/0623ce152a02eed954a1de82a1430d6d00647c9f.diff

LOG: [flang][RFC] Adding higher level FIR ops to ease expression lowering

This document describes a new HLFIR dialect with a new value type
(hlfir.expr) and some new higher level FIR operations to make lowering
to FIR more straightforward and make pattern matching of high level
Fortran concepts (array and character assignments, character operations,
and transformational intrinsics) easier in FIR.

This should allow implementing the remaining gaps in Fortran 95 features
without increasing the lowering code complexity too much, and get a
clean start for the remaining F2003 and F2018 features.

Differential Revision: https://reviews.llvm.org/D134285

Added: 
    flang/docs/HighLevelFIR.md

Modified: 
    

Removed: 
    


################################################################################
diff  --git a/flang/docs/HighLevelFIR.md b/flang/docs/HighLevelFIR.md
new file mode 100644
index 0000000000000..7119ec046ad7e

--- /dev/null
+++ b/flang/docs/HighLevelFIR.md
@@ -0,0 +1,1410 @@
+The approach of FIR and lowering design so far was to start with the minimal set
+of IR operations that could allow implementing the core aspects of Fortran (like
+memory allocations, array addressing, runtime descriptors, and structured
+control flow operations). One notable aspect of the current FIR is that array
+and character operations are buffered (some storage is allocated for the result,
+and the storage is addressed to implement the operation).  While this proved
+functional so far, the code lowering expressions and assignments from the
+front-end representations (the evaluate::Expr and parser nodes) to FIR has
+significantly grown in complexity while it still lacks some F95 features around
+character array expressions or FORALL. This is mainly explained by the fact that
+the representation level gap is big, and a lot is happening in lowering.  It
+appears more and more that some intermediate steps would help to split concerns
+between translating the front-end representation to MLIR, implementing some
+Fortran concepts at a lower-level (like character or derived type assignments),
+and how bufferizations of character and array expressions should be done.
+
+This document proposes the addition of two concepts and a set of related
+operations in a new dialect HLFIR to allow a simpler lowering to a higher-level
+FIR representation that would later be lowered to the current FIR representation
+via MLIR translation passes.  As a result of these additions, it is likely that
+the fir.array_load/fir.array_merge_store and related array operations could be
+removed from FIR since array assignment analysis could directly happen on the
+higher-level FIR representation.
+
+
+The main principles of the new lowering design are:
+-   Make expression lowering context independent and rather naive
+-   Do not materialize temporaries while lowering to FIR
+-   Preserve Fortran semantics/information for high-level optimizations
+
+The core impact on lowering will be:
+-   Lowering expressions and assignments in the exact same way, regardless of
+    whether it is an array assignment context and/or an expression inside a
+    forall.
+-   Lowering transformational intrinsics in a verbatim way (no runtime calls and
+    memory aspects yet).
+-   Lowering character expressions in a verbatim way (no memcpy/runtime calls
+    and memory aspects yet).
+-   Argument association side effects will be delayed (copy-in/copy-out) to help
+    inlining/function specialization to get rid of them when they are not
+    relevant.
+
+
+## Variable and Expression value concepts in HLFIR
+
+## Strengthening the variable concept
+
+Fortran variables are currently represented in FIR as mlir::Value with reference
+or box type coming from special operations or block arguments. They are either
+the result of a fir.alloca, fir.allocmem, or fir.address_of operations with the
+mangled name of the variable as attribute, or they are function block arguments
+with the mangled name of the variable as attribute.
+
+Fortran variables are defined with a Fortran type (both dynamic and static) that
+may have type parameters, a rank and shape (including lower bounds), and some
+attributes (like TARGET, OPTIONAL, VOLATILE...). All this information is
+currently not represented in FIR. Instead, lowering keeps track of all this
+information in the fir::ExtendedValue lowering data structure and uses it when
+needed. If unused in lowering, some information about variables is lost (like
+non-constant array bound expressions). In the IR, only the static type, the
+compile time constant extents, and compile time character lengths can be
+retrieved from the mlir::Value of a variable in the general case (more can be
+retrieved if the variable is tracked via a fir.box, but not if it is a bare
+memory reference).
+
+This makes reasoning about Fortran variables in FIR harder, and in general
+forces lowering to apply all decisions related to the information that is lost
+in FIR. A more problematic point is that it does not allow generating debug
+information for the variables from FIR, since the bounds and type parameters
+information is not tightly linked to the base mlir::Value.
+
+The proposal is to add a fir.declare operation that would anchor the
+fir::ExtendedValue information in the IR regardless of the mlir::Value used for
+the variable (bare memory reference, or fir.box). This operation will have a
+"fir.def = uniq_mangled_variable_name" that will allow linking it to the Fortran
+source variable, and will take all the bounds and type parameters as operands.
+All the high-level operations referring to variables will have a "fir.ref =
+uniq_mangled_variable_name" that will allow retrieving back the related
+dominating fir.declare and all the variable information. In most of the cases,
+the fir.declare should simply be the defining operation of the operand mlir
+value.
+
+The fir.declare operation will allow:
+- Pushing higher-level Fortran concepts into FIR operations (like array
+  assignments or transformational intrinsics).
+- Generating debug information for the variables based on the fir.declare
+  operation.
+- Generic Fortran aliasing analysis (currently implemented only around array
+  assignments with the fir.array_load concept).
+
+The fir.declare op is the only operation described by this change that will be
+added to FIR and not HLFIR. The rational for this is that it is intended to
+survive until LLVM dialect codegeneration so that debug info generation can use
+them and alias information can take advantage of them even on FIR. 
+
+Note that Fortran variables are not necessarily named objects, they can also be
+the result of function references returning POINTERs. fir.declare will also
+accept such variables to be described in the IR (a unique name will be built
+from the caller scope name and the function name.). In general, fir.declare
+will allow to view every memory storage as a variable, and this will be used to
+describe and use compiler created array temporaries.
+
+## Adding an expression value concept in HLFIR
+
+Currently, Fortran expressions can be represented as SSA values for scalar
+logical, integer, real, and complex expressions. Scalar character or
+derived-type expressions and all array expressions are buffered in lowering:
+their results are directly given a memory storage in lowering and are
+manipulated as variables.
+
+While this keeps FIR simple, this makes the amount of IR generated for these
+expressions higher, and in general makes later optimization passes job harder
+since they present non-trivial patterns (with memory operations) and cannot be
+eliminated by naive dead code elimination when the result is unused. This also
+forces lowering to combine elemental array expressions into single loop nests to
+avoid bufferizing all array sub-expressions (which would yield terrible
+performance). These combinations, which are implemented using C++ lambdas in
+lowering makes lowering code harder to understand. It also makes the expression
+lowering code context dependent (especially designators lowering). The lowering
+code paths may be 
diff erent when lowering a syntactically similar expression in
+an elemental expression context, in a forall context, or in a normal context.
+
+Some of the combinations described in [Array Composition](ArrayComposition.md)
+are currently not implemented in lowering because they are less trivial
+optimizations, and do not really belong in lowering. However, deploying such
+combinations on the generated FIR with bufferizations requires the usage of
+non-trivial pattern matching and rewrites (recognizing temporary allocation,
+usage, and related runtime calls). Note that the goal of such combination is not
+only about inlining transformational runtime calls, it is mainly about never
+generating a temporary for an array expression sub-operand that is a
+transformational intrinsic call matching certain criteria. So the optimization
+pass will not only need to recognize the intrinsic call, it must understand the
+context it is being called in.
+
+The usage of memory manipulations also makes some of the alias analysis more
+complex, especially when dealing with foralls (the alias analysis cannot simply
+follow an operand tree, it must understand indirect dependencies from operations
+stored in memory).
+
+The proposal is to add a !hlfir.expr<T> SSA value type concept, and set of
+character operations (concatenation, TRIM, MAX, MIN, comparisons...), a set of
+array transformational operations (SUM, MATMUL, TRANSPOSE, ...), and a generic
+hlfir.elemental operation. The hlfir.expr<T> type is not intended to be used
+with scalar types that already have SSA value types (e.g., integer or real
+scalars).  Instead, these existing SSA types will implicitly be considered as
+being expressions when used in high-level FIR operations, which will simplify
+interfacing with other dialects that define operations with these types (e.g.,
+the arith dialect).
+
+These hlfir.expr values could then be placed in memory when needed (assigned to
+a variable, passed as a procedure argument, or an IO output item...) via
+hlfir.assign or hlfir.associate operations that will later be described.
+
+When no special optimization pass is run, a translation pass would lower the
+operations producing hlfir.expr to buffer allocations and memory operations just
+as in the currently generated FIR.
+
+However, these high-level operations should allow the writing of optimization
+passes combining chains of operations producing hlfir.expr into optimized forms
+via pattern matching on the operand tree.
+
+The hlfir.elemental operation will be discussed in more detail below. It allows
+simplifying lowering while keeping the ability to combine elemental
+sub-expressions into a single loop nest. It should also allow rewriting some of
+the transformational intrinsic operations to functions of the indices as
+described in [Array Composition](ArrayComposition.md).
+
+## Proposed design for HLFIR (High-Level Fortran IR)
+
+### HLFIR Operations and Types
+
+#### Introduce a hlfir.expr<T> type
+
+Motivation: avoid the need to materialize expressions in temporaries while
+lowering.
+
+Syntax: ``` !hlfir.expr<[extent x]* T [, class]> ```
+
+- `[extent x]*` represents the shape for arrays similarly to !fir.array<> type,
+  except that the shape cannot be assumed rank (!hlfir.expr<..xT> is invalid).
+  This restriction can be added because it is impossible to create an assumed
+  rank expression in Fortran that is not a variable.
+- `T` is the element type of the static type
+- `class` flag can be set to denote that this a polymorphic expression (that the
+  dynamic type should not be assumed to be the static type).
+
+
+examples: !hlfir.expr<fir.char<?>>, !hlfir.expr<10xi32>,
+!hlfir.expr<?x10x?xfir.complex<4>>
+
+T in scalar hlfir.expr<T> can be:
+-   A character type (fir.char<10, kind>, fir.char<?, kind>)
+-   A derived type: (fir.type<t{...}>)
+
+T in an array hlfir.expr< e1 x ex2 ..  : T> can be:
+-   A character or derived type
+-   A logical type (fir.logical<kind>)
+-   An integer type (i1, i32, ….)
+-   A floating point type (f32, f16…)
+-   A complex type (fir.complex<4> or mlir::complex<f32>...)
+
+Some expressions may be polymorphic (for instance, MERGE can be used on
+polymorphic entities). The hlfir.expr type has an optional "class" flag to
+denote this: hlfir.expr<T, class>.
+
+Note that the ALLOCATABLE, POINTER, TARGET, VOLATILE, ASYNCHRONOUS, OPTIONAL
+aspects do not apply to expressions, they apply to variables.
+
+It is possible to query the following about an expression:
+-   What is the extent : via hlfir.get_extent %expr, dim
+-   What are the length parameters: via hlfir.get_typeparam %expr [, param_name]
+-   What is the dynamic type: via hlfir.get_dynamic_type %expr
+
+It is possible to get the value of an array expression element:
+- %element = hlfir.apply %expr, %i, %j : (!hlfir.expr<T>, index index) ->
+  hlfir.expr<ScalarType> | AnyConstantSizeScalarType
+
+It is not directly possible to take an address for the expression, but an
+expression value can be associated to a new variable whose address can be used
+(required when passing the expression in a user call, or to concepts that are
+kept low level in FIR, like IO runtime calls).  The variable created may be a
+compiler created temporary, or may relate to a Fortran source variable if this
+mechanism is used to implement ASSOCIATE.
+
+-   %var = hlfir.associate %expr [attributes about the association]->
+    AnyMemoryOrBoxType
+-   hlfir.end_association %var
+
+The intention is that the hlfir.expr<T> is the result of an operation, and
+should most often not be a block argument. This is because the hlfir.expr is
+mostly intended to allow combining chains of operations into more optimal
+forms. But it is possible to represent any expression result via a Fortran
+runtime descriptor (fir.box<T>), implying that if a hlfir.expr<T> is passed as
+a block argument, the expression bufferization pass will evaluate the operation
+producing the expression in a temporary, and transform the block operand into a
+fir.box describing the temporary. Clean-up for the temporary will be inserted
+after the last use of the hlfir.expr. Note that, at least at first, lowering
+may help FIR to find the last use of a hlfir.expr by explicitly inserting a
+hlfir.finalize %expr operation that may turn into a no-op if the expression is
+not later materialized in memory.
+
+It is nonetheless not intended that such abstract types be used as block
+arguments to avoid introducing allocations and descriptor manipulations.
+
+#### fir.declare operation
+
+Motivation: represent variables, linking together a memory storage, shape,
+length parameters, attributes and the variable name.
+
+Syntax:
+```
+%var = fir.declare %base [shape %extent1, %extent2, ...] [lbs %lb1, %lb2, ...] [typeparams %l1, ...] {fir.def = mangled_variable_name, attributes} : [(....) ->] T
+```
+
+%var will have the same type as %base. When no debug info is generated, the
+operation can be replaced by %base when lowering to LLVM.
+
+- Extents should only be provided if %base is not a fir.box and the entity is an
+  array.
+- lower bounds should only be provided if the entity is an array and the lower
+  bounds are not default (all ones). It should also not be provided for POINTERs
+  and ALLOCATABLES since the lower bounds may change.
+- type parameters should be provided for entities with length parameters, unless
+  the entity is a CHARACTER where the length is constant in %base type.
+- The attributes will include the Fortran attributes: TARGET (fir.target),
+  POINTER (fir.ptr), ALLOCATABLE (fir.alloc), CONTIGUOUS (fir.contiguous),
+  OPTIONAL (fir.optional), VOLATILE (fir.volatile), ASYNCHRONOUS (fir.async).
+  They will also indicate when an entity is part of an equivalence by giving the
+  equivalence name (fir.equiv = mangled_equivalence_name).
+
+fir.declare will be used for all Fortran variables, except the ones created via
+the ASSOCIATE construct that will use hlfir.associate described below.
+
+fir.declare will also be used when creating compiler created temporaries, in
+which case the fir.tmp attribute will be given.
+
+Examples:
+
+| FORTRAN                                 | FIR                                                                                                     |
+| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| REAL :: X                                 | %mem = fir.alloca f32 <br> %x = fir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref<f32>                                                                                                     |
+| REAL, TARGET :: X(10)                     | %mem = fir.alloca f32 <br> %nval = fir.load %n <br> %x = fir.declare %mem {fir.def = "\_QPfooEx", fir.target} : fir.ref<fir.array<10xf32>>                                                  |
+| REAL :: X(N)                              | %mem = // … alloc or dummy argument <br> %nval = fir.load %n : i64 <br> %x = fir.declare %mem shape %nval {fir.def = "\_QPfooEx"} : (i64) -> fir.ref<fir.array<?xf32>>                      |
+| REAL :: X(0:)                             | %mem = // … dummy argument <br> %c0 = arith.constant 0 : index <br> %x = fir.declare %mem lbs %c0 {fir.def = "\_QPfooEx"} : (index) -> fir.box<fir.array<?xf32>>                            |
+| <br>REAL, POINTER :: X(:)                 | %mem = // … dummy argument, or local, or global <br> %x = fir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} :  fir.ref<fir.box<fir.ptr<fir.array<?xf32>>>>                                  |
+| REAL, ALLOCATABLE :: X(:)                 | %mem = // … dummy argument, or local, or global <br> %x = fir.declare %mem {fir.def = "\_QPfooEx", fir.alloc} :  fir.ref<fir.box<fir.heap<fir.array<?xf32>>>>                               |
+| CHARACTER(10) :: C                        | %mem = //  … dummy argument, or local, or global <br> %c = fir.declare %mem lbs %c0 {fir.def = "\_QPfooEc"} :  fir.ref<fir.char<10>>                                                        |
+| CHARACTER(\*) :: C                        | %unbox = fir.unbox %bochar (fir.boxchar<1>) -> (fir.ref<fir.char<?>>, index) <br> %c = fir.declare %unbox#0 typeparams %unbox#1 {fir.def = "\_QPfooEc"} : (index) ->  fir.ref<fir.char<?>>  |
+| CHARACTER(\*), OPTIONAL, ALLOCATABLE :: C | %mem = // … dummy argument <br> %c = fir.declare %mem {fir.def = "\_QPfooEc", fir.alloc, fir.optional, fir.assumed\_len\_alloc} :  fir.ref<fir.box<fir.heap<fir.char<?>>>>                  |
+| TYPE(T) :: X                              | %mem = //  … dummy argument, or local, or global <br> %x = fir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref<fir.type<t{...}>>                                                             |
+| TYPE(T(L)) :: X                           | %mem = //  … dummy argument, or local, or global <br> %lval = fir.load %l <br> %x = fir.declare %mem typeparams %lval {fir.def = "\_QPfooEx"} : fir.box<fir.type<t{...}>>                   |
+| CLASS(\*), POINTER :: X                   | %mem = //  … dummy argument, or local, or global <br> %x = fir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} : fir.class<fir.ptr<None>>                                                     |
+| REAL :: X(..)                             | %mem = //  … dummy argument <br> %x = fir.declare %mem {fir.def = "\_QPfooEx"} : fir.box<fir.array<..xf32>>                                                                                 |
+
+#### hlfir.associate operation
+
+Motivation: represent Fortran associations (both from variables and expressions)
+and allow keeping actual/dummy argument association information after inlining.
+
+Syntax:
+```
+%var = hlfir.associate %expr_or_var {fir.def = mangled_uniq_name, attributes} (AnyExprOrVarType) -> AnyVarType
+```
+
+hlfir.associate is used to represent the following associations:
+- Dummy/Actual association on the caller side (the callee side uses
+  fir.declare).
+- Host association in block constructs when VOLATILE/ASYNC attributes are added
+  locally
+- ASSOCIATE construct (both from variable and expressions).
+
+When the operand is a variable, hlfir.associate allows changing the attributes
+of the variable locally, and to encode certain side-effects (like
+copy-in/copy-out when going from a non-contiguous variable to a contiguous
+variable, with the help of the related hlfir.end_association operation).
+
+When the operand is an expression, hlfir.associate allows associating a storage
+location to an expression value.
+
+A hlfir.associate must be followed by a related hlfir.end_association that will
+allow inserting any necessary finalization or copy-out later.
+
+#### hlfir.end_association operation
+
+Motivation: mark the place where some association should end and some side
+effects might need to occur.
+
+The hlfir.end_associate is a placeholder to later insert
+deallocation/finalization if the variable was associated with an expression,
+and to insert copy-out/deallocation if the variable was associated with another
+variable with a copy-in.
+
+Syntax:
+```
+hlfir.end_association %var [%original_variable] {fir.ref = var_mangled_name, attributes}
+```
+
+
+The attributes can be:
+-   copy_out (copy out the associated variable back into the original variable
+    if a copy-in occurred)
+-   finalize_copy_in (deallocate the temporary storage for the associated
+    variable if a copy-in occurred but the associated variable was not modified
+    (e.g., it is intent(in))).
+-   finalize: indicate that a finalizer should be run on the entity associated
+    with the variable (There is currently no way to deduce this only from the
+    variable type in FIR). It will give the finalizer mangled name so that it
+    can be later called.
+
+If the copy_out or finalize_copy_in attribute is set, “original_variable” (the
+argument of the hlfir.associate that produced %var) must be provided. The
+rationale is that the original variable address is needed to verify if a
+temporary was created, and if needed, to copy the data back to it.
+
+#### hlfir.finalize
+
+Motivation: mark end of life of local variables
+
+Mark the place where a local variable will go out of scope. The main goal is to
+retain this information even after local variables are inlined.
+
+Syntax:
+```
+hlfir.finalize %var {fir.ref = var_mangled_name, attributes}
+```
+
+The attributes can be:
+-   finalize: indicate that a finalizer should be run on the entity associated
+    with the variable (There is currently no way to deduce this only from the
+    variable type in FIR).
+
+Note that finalization will not free the local variable storage if it was
+allocated on the heap. If lowering created the storage passed to fir.declare via
+a fir.allocmem, lowering should insert a fir.freemem after the hlfir.finalize.
+This could help making fir.allocmem to fir.alloca promotion simpler, and also
+because finalization may be run without the intent to deallocate the variable
+storage (like on INTENT(OUT) dummies).
+
+
+#### hlfir.designate
+
+Motivation: Represent designators at a high-level and allow representing some
+information about derived type components that would otherwise be lost, like
+component lower bounds.
+
+Represent Fortran designators in a verbatim way: both triplet, and component
+parts.
+
+Syntax:
+```
+%var = hlfir.designate %base [“component”,] [(%i, %k:l%:%m)] [substr ub, lb] [imag|real] [shape extent1, extent2, ....] [lbs lb1, lb2, .....] [typeparams %l1, ...] {fir.ref = base_mangled_name, fir.def = mangled_name, attributes}
+```
+
+hlfir.designate is intended to encode a single part-ref (as defined by the
+fortran standard). That means that a(:)%x(i, j, k) must be split into two
+hlfir.designate: one for a(:), and one for x(i, j, k).  If the base is ranked,
+and the component is an array, the subscripts are mandatory and must not
+contain triplets. This ensures that the result of a fir.designator cannot be a
+"super-array".
+
+The subscripts passed to hlfir.designate must be based on the base lower bounds
+(one by default).
+
+A substring is built by providing the lower and upper character indices after
+`substr`. Implicit substring bounds must be made explicit by lowering.  It is
+not possible to provide substr if a component is already provided. Instead the
+related Fortran designator must be split into two fir.designator. This is
+because the component character length will be needed to compute the right
+stride, and it might be lost if not placed on the first designator typeparams.
+
+Real and Imaginary complex parts are represented by an optional imag or real
+tag. It can be added even if there is already a component.
+
+The shape, lower bound, and type parameter operands represent the output entity
+properties. The point of having those made explicit is to allow early folding
+and hoisting of array section shape and length parameters (which especially in
+FORALL contexts, can simplify later assignment temporary insertion a lot). Also,
+if lower bounds of a derived type component array could not be added here, they
+would be lost since they are not represented by other means in FIR (the fir.type
+does not include this information).
+
+hlfir.designate is not intended to describe vector subscripted variables.
+Instead, lowering will have to introduce loops to do element by element
+addressing. See the Examples section. This helps keeping hlfir.designate simple,
+and since the contexts where a vector subscripted entity is considered to be a
+variable (in the sense that it can be modified) are very limited, it seems
+reasonable to have lowering deal with this aspect. For instance, a vector
+subscripted entity cannot be passed as a variable, it cannot be a pointer
+assignment target, and when it appears as an associated entity in an ASSOCIATE,
+the related variable cannot be modified.
+
+#### hlfir.assign
+
+Motivation: represent assignment at a high-level (mainly a change for array and
+character assignment) so that optimization pass can clearly reason about it
+(value propagation, inserting temporary for right-hand side evaluation only when
+needed), and that lowering does not have to implement it all.
+
+Syntax:
+```
+hlfir.assign %expr_or_var to %var [attributes]
+```
+
+The attributes can be:
+
+-   realloc: mark that assignment has F2003 semantics and that the left-hand
+    side may have to be deallocated/reallocated…
+-   use_assign=@function: mark a user defined assignment
+-   no_overlap: mark that an assignment does not need a temporary (added by an
+    analysis pass).
+-   unordered : mark that an assignment can happen in any element order (not
+    true if there is an impure elemental function being called).
+
+This will replace the current array_load/array_access/array_merge semantics.
+Instead, a more generic alias analysis will be performed on the LHS and RHS to
+detect aliasing, and a temporary inserted if needed. The alias analysis will
+look at all the memory references in the RHS operand tree and base overlap
+decisions on the related variable declaration operations. This same analysis
+should later allow moving/merging some expression evaluation between 
diff erent
+statements.
+
+Note about user defined assignments: semantics is resolving them and building
+the related subroutine call. So a fir.call could directly be made in lowering if
+the right hand side was always evaluated in a temporary. The motivation to use
+hlfir.assign is to help the temporary removal, and also to deal with two edge
+cases: user assignment in a FORALL (the forall pass will need to understand that
+this an assignment), and allocatable assignment mixed with user assignment
+(implementing this as a call in lowering would require lowering the whole
+reallocation logic in lowering already, duplicating the fact that hlfir.assign
+should deal with it).
+
+#### hlfir.ptr_assign
+
+Motivation: represent pointer assignment without lowering the exact pointer
+implementation (descriptor address, fir.ref<fir.box> or simple pointer scalar
+fir.llvm_ptr<fir.ptr>).
+
+Syntax:
+```
+hlfir.ptr_assign %var [[reshape %reshape] | [lbounds %lb1, …., %lbn]] to %ptr
+```
+
+It is important to keep pointer assignment at a high-level so that they can
+later correctly be processed in hlfir.forall.
+
+#### hlfir.allocate
+
+Motivation: keep POINTER and ALLOCATABLE allocation explicit in HLFIR, while
+allowing later lowering to either inlined fir.allocmem or Fortran runtime
+calls. Generating runtime calls allow the runtime to do Fortran specific
+bookkeeping or flagging and to provide better runtime error reports.
+
+The main 
diff erence with the ALLOCATE statement is that one distinct
+hlfir.allocate has to be created for each element of the allocation-list.
+Otherwise, it is a naive lowering of the ALLOCATE statement.
+
+Syntax:
+```
+%stat = hlfir.allocate %var [%shape] [%type_params] [[src=%source] | [mold=%mold]] [errmsg =%errmsg]
+```
+
+#### hlfir.deallocate
+
+Motivation: keep deallocation explicit in HLFIR, while allowing later lowering
+to Fortran runtime calls to allow the runtime to do Fortran specific
+bookkeeping or flagging of allocations.
+
+Similarly to hlfir.allocate, one operation must be created for each
+allocate-object-list object.
+
+Syntax:
+```
+%stat = hlfir.deallocate %var [errmsg=err].
+```
+
+####  hlfir.elemental
+
+Motivation: represent elemental operations without defining array level
+operations for each of them, and allow the representation of array expressions
+as function of the indices.
+
+The hlfir.elemental operation can be seen as a closure: it is defining a
+function of the indices that returns the value of the element of the
+represented array expression at the given indices. This an operation with an
+MLIR region. It allows detailing how an elemental expression is implemented at
+the element level, without yet requiring materializing the operands and result
+in memory.  The hlfir.expr<T> elements value can be obtained using hlfir.apply.
+
+The element result is built with a fir.result op, whose result type can be a
+scalar hlfir.expr<T> or any scalar constant size types (e.g. i32, or f32).
+
+Syntax:
+```
+%op = hlfir.elemental (%indices) %shape [%type_params] [%dynamic_type] {
+  ….
+  fir.result %result_element
+}
+```
+
+
+Note that %indices are not operands, they are the elemental region block
+arguments, representing the array iteration space in a one based fashion.
+The choice of using one based indicies is to match Fortran default for
+array variables, so that there is no need to generate bound adjustments
+when working with one based array variables in an expression.
+
+Illustration: “A + B” represented with a hlfir.elemental.
+
+```
+%add = hlfir.elemental (%i:index, %j:index) shape %shape (!fir.shape<2>) -> !hlfir.expr<?x?xf32> {
+  %belt = hlfir.designate %b, %i, %j {fir.ref = _QPfooEb, fir.def = _QPfooEb.des001}: (!fir.ref<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32>
+  %celt = hlfir.designate %c, %i, %j {fir.ref = _QPfooEa, fir.def = _QPfooEa.des002} : (!fir.ref<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32>
+  %bval = fir.load %belt : (!fir.ref<f32>) -> f32
+  %cval = fir.load %celt : (!fir.ref<f32>) -> f32
+  %add = arith.addf %bval, %cval : f32
+  fir.result %res : f32
+}
+```
+
+In contexts where it can be proved that the array operands were not modified
+between the hlfir.elemental and the hlfir.apply, the region of the
+hlfir.elemental can be inlined at the hlfir.apply. Otherwise, if there is no
+such guarantee, or if the hlfir.elemental is not “visible” (because its result
+is passed as a block argument), the hlfir.elemental will be lowered to an array
+temporary. This will be done as a HLFIR to HLFIR optimization pass. Note that
+MLIR inlining could be used if hlfir.elemental implemented the
+CallableInterface and hlfir.apply the CallInterface.  But MLIR generic inlining
+is probably too generic for this case: no recursion is possible here, the call
+graphs are trivial, and using MLIR inlining here could introduce later
+conflicts or make normal function inlining more complex because FIR inlining
+hooks would already be used.
+
+hlfir.elemental allows delaying elemental array expression buffering and
+combination. Its generic aspect has two advantages:
+- It avoids defining one operation per elemental operation or intrinsic,
+  instead, the related arith dialect operations can be used directly in the
+  elemental regions. This avoids growing HLFIR and having to maintain about a
+  hundred operations.
+- It allows representing transformational intrinsics as functions of the indices
+  while doing optimization as described in
+  [Array Composition](ArrayComposition.md). This because the indices can be
+  transformed inside the region before being applied to array variables
+  according to any kind of transformation (semi-affine or not).
+
+
+#### Introducing the hlfir.apply operation
+
+Motivation: provide a way to get the element of an array expression
+(hlfir.expr<?x…xT>)
+
+This is the addressing equivalent for expressions. A notable 
diff erence is that
+it can only take simple scalar indices (no triplets) because it is not clear
+why supporting triplets would be needed, and keeping the indexing simple makes
+inlining of hlfir.elemental much easier.
+
+If hlfir.elemental inlining is not performed, or if the hlfir.expr<T> array
+expression is produced by another operation (like fir.intrinsic) that is not
+rewritten, hlfir.apply will be lowered to an actual addressing operation that
+will address the temporary that was created for the hlfir.expr<T> value that
+was materialized in memory.
+
+hlfir.apply indices will be one based to make further lowering simpler.
+
+Syntax:
+```
+%element = hlfir.apply %array_expr %i, %j: (hlfir.expr<?x?xi32>) -> i32
+```
+
+
+#### Introducing operations for transformational intrinsic functions
+
+Motivation: Represent transformational intrinsics functions at a high-level so
+that they can be manipulated easily by the optimizer, and do not require
+materializing the result as a temporary in lowering.
+
+An operation will be added for each Fortran transformational functions (SUM,
+MATMUL, TRANSPOSE....). It translates the Fortran expression verbatim: it takes
+the same number of arguments as the Fortran intrinsics and returns a
+hlfir.expr<T>. The arguments may be hlfir.expr<T>, simple scalar types (e.g.,
+i32, f32), or variables.
+
+The exception being that the arguments that are statically absent would be
+passed to it (passing results of fir.absent operation), so that the arguments
+can be identified via their positions.
+
+This operation is meant for the transformational intrinsics, not the elemental
+intrinsics, that will be implemented using hlfir.elemental + mlir math dialect
+operations, nor the intrinsic subroutines (like random_seed or system_clock),
+that will be directly lowered in lowering.
+
+Syntax:
+```
+%res = hlfir."intrinsic_name" %expr_or_var, ...
+```
+
+These operations will all inherit a same operation base in tablegen to make
+their definition and identification easy.
+
+Without any optimization, codegen would then translate the operations to
+exactly the same FIR as currently generated by IntrinsicCall.cpp (runtime calls
+or inlined code with temporary allocation for array results). The fact that
+they are the verbatim Fortran translations should allow to move the lowering
+code to a translation pass without massive changes.
+
+An operation will at least be created for each of the following transformational
+intrinsics: all, any, count, cshift, dot_product, eoshift, findloc, iall, iany,
+iparity, matmul, maxloc, maxval, minloc, minval, norm2, pack, parity, product,
+reduce, repeat, reshape, spread, sum, transfer, transpose, trim, unpack.
+
+For the following transformational intrinsics, the current lowering to runtime
+call will probably be used since there is little point to keep them high level:
+- command_argument_count, get_team, null, num_images, team_number, this_image
+  that are more program related (and cannot appear for instance in constant
+  expressions)
+- selected_char_kind, selected_int_kind, selected_real_kind that returns scalar
+  integers
+
+#### Introducing operations for character operations and elemental intrinsic functions
+
+
+Motivation: represent character operations without requiring the operand and
+results to be materialized in memory.
+
+fir.char_op is intended to represent:
+-  Character concatenation (//)
+-  Character MIN/MAX
+-  Character MERGE
+-  “SET_LENGTH”
+-  Character conversions
+-  REPEAT
+-  INDEX
+-  CHAR
+-  Character comparisons
+-  LEN_TRIM
+
+The arguments must be scalars, the elemental aspect should be handled by a
+hlfir.elemental operation.
+
+Syntax:
+```
+%res = hlfir.“char_op” %expr_or_var
+```
+
+Just like for the transformational intrinsics, if no optimization occurs, these
+operations will be lowered to memory operations with temporary results (if the
+result is a character), using the same generation code as the one currently used
+in lowering.
+
+#### hlfir.array_ctor
+
+Motivation: represent array constructor without creating temporary
+
+Many array constructors have a limited number of elements (less than 10), the
+current lowering of array constructor is rather complex because it must deal
+with the generic cases.
+
+Having a representation to represent array constructor will allow an easier
+lowering of array constructor, and make array ctor a lot easier to manipulate.
+For instance, for small array constructors, loops could could be unrolled with
+the array ctor elements without ever creating a dynamically allocated array
+temporary and loop nest using it.
+
+Syntax:
+```
+%array_ctor = hlfir.array_ctor %expr1, %expr2 ….
+```
+
+Note that hlfir.elemental could be used to implement some ac-implied-do,
+although this is not yet clarified since ac-implied-do may contain more than
+one scalar element (they may contain a list of scalar and array values, which
+would render the representation in a hlfir.elemental tricky, but maybe not
+impossible using if/then/else and hlfir.elemental nests using the index value).
+One big issue though is that hlfir.elemental requires the result shape to be
+pre-computed (it is an operand), and with an ac-implied-do containing user
+transformational calls returning allocatable or pointer arrays, it is
+impossible to pre-evaluate the shape without evaluating all the function calls
+entirely (and therefore all the array constructor elements).
+
+#### hlfir.get_extent
+
+Motivation: inquire about the extent of a hlfir.expr, variable, or fir.shape
+
+Syntax:
+```
+%extent = hlfir.get_extent %shape_expr_or_var, dim
+```
+
+dim is a constant integer attribute.
+
+This allows inquiring about the extents of expressions whose shape may not be
+yet computable without generating detailed, low level operations (e.g, for some
+transformational intrinsics), or to avoid going into low level details for
+pointer and allocatable variables (where the descriptor needs to be read and
+loaded).
+
+#### hlfir.get_typeparam
+
+Motivation: inquire about the type parameters of a hlfir.expr, or variable.
+
+Syntax:
+```
+%param = hlfir.get_typeparam %expr_or_var [, param_name]
+```
+- param_name is an optional string attribute that must contain the length
+  parameter name if %expr_or_var is a derived type.
+
+####  hlfir.get_dynamic_type
+
+Motivation: inquire about the dynamic type of a polymorphic hlfir.expr or
+variable.
+
+Syntax:
+```
+%dynamic_type = hlfir.get_dynamic_type %expr_or_var
+```
+
+#### hlfir.get_lbound
+
+Motivation: inquire about the lower bounds of variables without digging into
+the implementation details of pointers and allocatables.
+
+Syntax:
+```
+%lb = hlfir.get_lbound %var, n
+```
+
+Note: n is an integer constant attribute for the (zero based) dimension.
+
+####  hlfir.shape_meet
+
+Motivation: represent conformity requirement/information between two array
+operands so that later optimization can choose the best shape information
+source, or insert conformity runtime checks.
+
+Syntax:
+```
+%shape = hlfir.shape_meet %shape1, %shape2
+```
+
+Suppose A(n), B(m) are two explicit shape arrays. Currently, when A+B is
+lowered, lowering chose which operand shape gives the result shape information,
+and it is later not retrievable that both n and m can be used. If lowering
+chose n, but m later gets folded thanks to inlining or constant propagation, the
+optimization passes have no way to use this constant information to optimize the
+result storage allocation or vectorization of A+B.  hlfir.shape_meet intends to
+delay this choice until constant propagation or inlining can provide better
+information about n and m.
+
+#### hlfir.forall
+
+Motivation: segregate the Forall lowering complexity in its own unit.
+
+Forall is tough to lower because:
+-   Lowering it in an optimal way requires analyzing several assignments/mask
+    expressions.
+-   The shape of the temporary needed to store intermediate evaluation values is
+    not a Fortran array in the general case, and cannot in the general case be
+    maximized/pre-computed without executing the forall to compute the bounds of
+    inner forall, and the shape of the assignment operands that may depend on
+    the bound values.
+-   Mask expressions evaluation should be affected by previous assignment
+    statements, but not by the following ones. Array temporaries may be
+    required for the masks to cover this.
+-   On top of the above points, Forall can contain user assignments, pointer
+    assignments, and assignment to whole allocatable.
+
+
+The hlfir.forall syntax would be exactly the one of a fir.do_loop. The
+
diff erence would be that hlfir.assign and hlfir.ptr_assign inside hlfir.forall
+have specific semantics (the same as in Fortran):
+-   Given one hlfir.assign, all the iteration values of the LHS/RHS must be
+    evaluated before the assignment of any value is done.
+-   Given two hlfir.assign, the first hlfir.assign must be fully performed
+    before any evaluation of the operands of the second assignment is done.
+-   Masks (fir.if arguments), if any, should be evaluated before any nested
+    assignments. Any assignments syntactically before the where mask occurrence
+    must be performed before the mask evaluation.
+
+Note that forall forbids impure function calls, hence, no calls should modify
+any other expression evaluation and can be removed if unused.
+
+The translation of hlfir.forall will happen by:
+-   1. Determining if the where masks value may be modified by any assignments
+    - Yes, pre-compute all masks in a pre-run of the forall loop, creating
+      a “forall temps” (we may need a FIR concept to help here).
+    - No, Do nothing (or indicate it is safe to evaluate masks while evaluating
+      the rest).
+-   2. Determining if a hlfir.assign operand expression depends on the
+       previous hlfir.assign left-hand side base value.
+    - Yes, split the hlfir.assign into their own nest of hlfir.forall loops.
+    - No, do nothing (or indicate it is safe to evaluate the assignment while
+      evaluating previous assignments)
+-   3. For each assignments, check if the RHS/LHS operands value may depend
+     on the LHS base:
+    - Yes, split the forall loops. Insert a “forall temps” before the loops for
+      the “smallest” part that may overlap (which may be the whole RHS, or some
+      RHS sub-part, or some LHS indices). In the first nest, evaluate this
+      overlapping part into the temp. In the next forall loop nest, modify the
+      assignment to use the temporary, and add the [no_overlap] flag to indicate
+      no further temporary is needed. Insert code to finalize the temp after its
+      usage.
+
+### Tagging variable uses in high-level operations (fir.ref attribute)
+
+All operations defined above that accept "variables" (i.e: memory addresses or
+box values that were produced by fir.declare, hlfir.associate, or
+hlfir.designate) must have a fir.ref = mangled_name_attribute that matches the
+fir.def on the operation that created them (it will be added automatically by
+the operation builder). That is to ensure optimization passes do not merge
+seemingly identical operations using variables with 
diff erent properties, and
+also to ensure that the matching defining operation can always be retrieved to
+get all the variable properties (shape, bounds, type parameters and
+attributes).
+
+Two other alternatives have been considered and rejected:
+- Using MLIR symbols. This has been rejected because MLIR symbols are mainly
+  intended to deal with globals and functions that may refer to each other
+  before being defined. Their processing is not as light as normal values, and
+  would require to turn every FIR operation with a region into an MLIR symbol
+  table. This would especially be annoying given fir.designator also produce
+  variables with their own properties, which would imply creating a lot of MLIR
+  symbols. All the operations that both accept variable and expression operands
+  would also either need to be more complex in order to both accept SSA values
+  or MLIR symbol operands (or some fir.as_expr %var operation should be added to
+  turn a variable into an expression). Given all variable definitions will
+  dominates their uses, it seems more adequate to use an SSA model with named
+  attributes. Using SSA values also makes the transition and mix with
+  lower-level FIR operations smoother: a variable SSA usage can simply be
+  replaced by lower-level FIR operations using the same SSA value.
+- Another alternative could be making all operations defining variables return
+  fir.box, and repeating the variable attributes (fir.target...) on all
+  operations using the variable. This would allow the link between the variable
+  definition and usage to become broken (variable could travel as block
+  arguments). But this would risk littering the codegen with fir.box
+  manipulations (creating, writing and reading to descriptors) that may lead to
+  poor performance. Maintaining all the attributes on the operations would also
+  be more cumbersome than only maintaining the variable name in the fir.ref
+  attribute.
+
+Lower-level operations (the current FIR operations), do not require this strong
+link between a memory address and the variable definition, and it will not be
+necessary to add fir.ref attributes to those. During alias analysis on FIR using
+lower-level operations (like loads and stores), any memory reference that cannot
+be resolved to a Fortran variable or some unrelated temporary allocation is
+considered as potentially overlapping.
+
+The variable definition will be guaranteed to have a unique name after lowering,
+and some care might have to be taken when later duplicating regions that define
+variables in a way that could lead a variable usage to have two dominating
+definitions with the same name (this could for instance happen after inlining
+two calls to the same procedure inside the same region). Inlining will need to
+take care of those conflicts. This could be done by randomizing the inlined
+variable name attributes (like by adding a counter index that is incremented
+after each call inlining).
+
+## New HLFIR Transformation Passes
+
+### Mandatory Passes (translation towards lower-level representation)
+
+Note that these passes could be implemented as a single MLIR pass, or successive
+passes.
+
+-   Forall rewrites (getting rid of hlfir.forall)
+-   Array assignment rewrites (getting rid of array hlfir.assign)
+-   Bufferization: expression temporary materialization (getting rid of
+    hlfir.expr, and all the operations that may produce it like transformational
+    intrinsics and hlfir.elemental, hlfir.apply).
+-   Call interface argument association lowering (getting rid of hlfir.associate
+    and hlfir.end_associate)
+-   Lowering high level operations using variables into FIR operations
+    operating on memory (translating hlfir.designate, scalar hlfir.assign,
+    hlfir.finalize into fir.array_coor, fir.do_loop, fir.store, fir.load.
+    fir.embox/fir.rebox operations).
+
+Note that these passes do not have to be the first one run after lowering. It is
+intended that CSE, DCE, algebraic simplification, inlining and some other new
+high-level optimization passes discused below be run before doing any of these
+translations.
+
+After that, the current FIR pipeline could be used to continue lowering towards
+LLVM.
+
+### Optimization Passes
+
+-   Elemental expression inlining (inlining of hlfir.elemental in hlfir.apply)
+-   User function Inlining
+-   Transformational intrinsic rewrites as hlfir.elemental expressions
+-   Assignments propagation
+-   Shape/Rank/dynamic type propagation
+
+These high level optimization passes can be run any number of times in any
+order.
+
+## Transition Plan
+
+The new higher-level steps proposed in this document will require significant
+refactoring of lowering. Codegen should not be impacted since the current FIR
+will remain untouched.
+
+A lot of the code in lowering generating Fortran features (like an intrinsic or
+how to do assignments) is based on the fir::ExtendedValue concept. This
+currently is a collection of mlir::Value that allows describing a Fortran object
+(either a variable or an evaluated expression result). The variable and
+expression concepts described above should allow to keep an interface very
+similar to the fir::ExtendedValue, but having the fir::ExtendedValue wrap a
+single value or mlir::Operation* from which all of the object entity
+information can be inferred.
+
+That way, all the helpers currently generating FIR from fir::ExtendedValue could
+be kept and used with the new variable and expression concepts with as little
+modification as possible.
+
+The proposed plan is to:
+- 1. Introduce the new HLFIR operations.
+- 2. Refactor fir::ExtendedValue so that it can work with the new variable and
+     expression concepts (requires part of 1.).
+- 3. Introduce the new translation passes, using the fir::ExtendedValue helpers
+     (requires 1.).
+- 3.b Introduce the new optimization passes (requires 1.).
+- 4. Introduce the fir.declare and hlfir.finalize usage in lowering (requires 1.
+     and 2. and part of 3.).
+
+The following steps might have to be done in parallel of the current lowering,
+to avoid disturbing the work on performance until the new lowering is complete
+and on par.
+
+- 5. Introduce hlfir.designate and hlfir.associate usage in lowering.
+- 6. Introduce lowering to hlfir.assign (with RHS that is not a hlfir.expr),
+     hlfir.ptr_assign.
+- 7. Introduce lowering to hlfir.expr and related operations.
+- 8. Introduce lowering to hlfir.forall.
+
+At that point, lowering using the high-level FIR should be in place, allowing
+extensive testing.
+- 9. Debugging correctness.
+- 10. Debugging execution performance.
+
+The plan is to do these steps incrementally upstream, but for lowering this will
+most likely be safer to do have the new expression lowering implemented in
+parallel upstream, and to add an option to use the new lowering rather than to
+directly modify the current expression lowering and have it step by step
+equivalent functionally and performance wise.
+
+## Examples
+
+### Example 1: simple array assignment
+
+```Fortran
+subroutine foo(a, b)
+  real :: a(:), b(:)
+  a = b
+end subroutine
+```
+
+Lowering output:
+
+```HLFIR
+func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>) {
+  %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>
+  %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>
+  hlfir.assign %b to %a {fir.ref = "_QPfooEb,_QPfooEa"}: !fir.box<!fir.array<?xf32>>
+  return
+}
+```
+
+HLFIR array assignment lowering pass:
+-   Query: can %b value depend on %a? No, they are two 
diff erent argument
+    associated variables that are neither target nor pointers.
+-   Lower to assignment to loop:
+
+```HFLIR
+func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>) {
+  %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>
+  %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>
+
+  %ashape = hlfir.shape_of %a {fir.ref = "_QPfooEa"}
+  %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}
+  %shape = hlfir.shape_meet %ashape, %bshape
+  %extent = hlfir.get_extent %shape, 0
+
+  %c1 = arith.constant 1 : index
+
+  fir.do_loop %i = %c1 to %extent step %c1 unordered {
+    %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", "fir.def=_QPfooEb.des001"}
+    %aelt = hlfir.designate %a, %i {fir.ref = "_QPfooEa", "fir.def=_QPfooEa.des002"}
+    hlfir.assign %belt to %aelt {fir.ref = "_QPfooEb.des001,_QPfooEa.des002"}: fir.ref<f32>, fir.ref<f32>
+  }
+  return
+}
+```
+
+HLFIR variable operations to memory translation pass:
+-   hlfir.designate is rewritten into fir.array_coor operation on the variable
+    associated memory buffer, and returns the element address
+-   For numerical scalar, hlfir.assign is rewritten to fir.store (and fir.load
+    of the operand if needed), for derived type and characters, memory copy
+    (and padding for characters) is done.
+-   hlfir.shape_of are lowered to fir.box_dims, here, no constant information
+    was obtained from any of the source shape, so hlfir.shape_meet is a no-op,
+    selecting the first shape (a conformity runtime check could be inserted
+    under debug options).
+-   fir.declare are kept (they are no-ops) so that it will be possible to
+    generate debug information for LLVM.
+
+This pass would wrap operations defining variables (fir.declare/hlfir.designate)
+as fir::ExtendedValue, and use all the current helpers operating on it
+(e.g.: fir::factory::genScalarAssignment).
+
+```
+func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1:
+  !fir.box<!fir.array<?xf32>>) {
+  %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>
+  %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>
+  %c1 = arith.constant 1 : index
+  %dims = fir.box_dims %a, 1
+  fir.do_loop %i = %c1 to %dims#1 step %c1 unordered {
+    %belt = fir.array_coor %b, %i : (!fir.box<!fir.array<?xf32>>, index) -> fir.ref<f32>
+    %aelt = fir.array_coor %a, %i : (!fir.box<!fir.array<?xf32>>, index) -> fir.ref<f32>
+    %bval = fir.load %belt : f32
+    fir.store %bval to %aelt : fir.ref<f32>
+  }
+  return
+}
+```
+
+This reaches the current FIR level (except fir.declare_op that can be kept until
+LLVM codegen and dropped on the floor if there is no debug information
+generated).
+
+### Example 2: array assignment with elemental expression
+
+```Fortran
+subroutine foo(a, b, p, c)
+  real, target :: a(:)
+  real :: b(:), c(100)
+  real, pointer :: p(:)
+  a = b*p + c
+end subroutine
+```
+
+Lowering output:
+
+```HLFIR
+func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
+  %a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>
+  %b =  fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>
+  %p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>
+  %c =  fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>
+  %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}
+  %pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"}
+  %shape1 = hlfir.shape_meet %bshape, %pshape
+  %mul = hlfir.elemental(%i:index) %shape1 {
+    %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"}
+    %p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"}
+    %i_zero = arith.subi %i, %c1
+    %i_p = arith.addi %i_zero,  %p_lb
+    %pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"}
+    %bval = fir.load %belt : f32
+    %pval = fir.load %pelt : f32
+    %mulres = arith.mulf %bval, %pval : f32
+     fir.result %mulres : f32
+  }
+  %cshape = hlfir.shape_of %c
+  %shape2 = hlfir.shape_meet %cshape, %shape1
+  %add =  hlfir.elemental(%i:index) %shape2 {
+    %mulval = hlfir.apply %mul, %i : f32
+    %celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"}
+    %cval = fir.load %celt
+    %add_res = arith.addf %mulval, %cval
+    fir.result %add_res
+  }
+  hlfir.assign %add to %a {fir.ref = "_QPfooEa"} : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>
+  return
+}
+```
+
+Step 1: hlfir.elemental inlining: inline the first hlfir.elemental into the
+second one at the hlfir.apply.
+
+
+```HLFIR
+func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
+  %a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>
+  %b =  fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>
+  %p = fir.declare %arg2 {fir.def = "_QPfooEa", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>
+  %c =  fir.declare %arg3 {fir.def = "_QPfooEp"} : !fir.ref<!fir.array<100xf32>>
+  %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}
+  %pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"}
+  %shape1 = hlfir.shape_meet %bshape, %pshape
+  %cshape = hlfir.shape_of %c
+  %shape2 = hlfir.shape_meet %cshape, %shape1
+  %add =  hlfir.elemental(%i:index) %shape2 {
+    %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"}
+    %p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"}
+    %i_zero = arith.subi %i, %c1
+    %i_p = arith.addi %i_zero,  %p_lb
+    %pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"}
+    %bval = fir.load %belt : f32
+    %pval = fir.load %pelt : f32
+    %mulval = arith.mulf %bval, %pval : f32
+    %celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"}
+    %cval = fir.load %celt
+    %add_res = arith.addf %mulval, %cval
+    fir.result %add_res
+  }
+  hlfir.assign %add to %a {fir.ref = "_QPfooEa"} : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>
+  return
+}
+```
+
+Step2: alias analysis around the array assignment:
+
+-   May %add value depend on %a variable?
+-   Gather variable and function calls in %add operand tree (visiting
+    hlfir.elemental regions)
+-   Gather references to %b, %p, and %c. %p is a pointer variable according to
+    its defining operations. It may alias with %a that is a target. -> answer
+    yes.
+-   Insert temporary, and duplicate array assignments, that can be lowered to
+    loops at that point
+
+Note that the alias analysis could have already occurred without inlining the
+%add hlfir.elemental.
+
+
+```HLFIR
+func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
+  %a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>
+  %b =  fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>
+  %p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>
+  %c =  fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>
+  %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}
+  %pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"}
+  %shape1 = hlfir.shape_meet %bshape, %pshape
+  %cshape = hlfir.shape_of %c
+  %shape2 = hlfir.shape_meet %cshape, %shape1
+  %add =  hlfir.elemental(%i:index) %shape2 {
+    %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"}
+    %p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"}
+    %i_zero = arith.subi %i, %c1
+    %i_p = arith.addi %i_zero,  %p_lb
+    %pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"}
+    %bval = fir.load %belt : f32
+    %pval = fir.load %pelt : f32
+    %mulval = arith.mulf %bval, %pval : f32
+    %celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"}
+    %cval = fir.load %celt
+    %add_res = arith.addf %mulval, %cval
+    fir.result %add_res
+  }
+  %extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index
+  %tempstorage = fir.allocmem %extent : fir.heap<fir.array<?xf32>>
+  %temp = fir.declare %tempstorage, shape %extent {fir.def = QPfoo.temp001} : (index) -> fir.heap<fir.array<?xf32>>
+  hlfir.assign %add to %temp : no_overlap {fir.ref = "QPfoo.temp001"} : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>
+  hlfir.assign %temp to %a : no_overlap {fir.ref = " QPfoo.temp001,_QPfooEa"} : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>
+  hlfir.finalize %temp {fir.ref = "QPfoo.temp001"}
+  fir.freemem %tempstorage
+  return
+}
+```
+
+Step 4: Lower assignments to regular loops since they have the no_overlap
+attribute, and inline the hlfir.elemental into the first loop nest.
+
+```HLFIR
+func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
+  %a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>
+  %b =  fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>
+  %p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>
+  %c =  fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>
+  %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}
+  %pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"}
+  %shape1 = hlfir.shape_meet %bshape, %pshape
+  %cshape = hlfir.shape_of %c
+  %shape2 = hlfir.shape_meet %cshape, %shape1
+  %extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index
+  %tempstorage = fir.allocmem %extent : fir.heap<fir.array<?xf32>>
+  %temp = fir.declare %tempstorage, shape %extent (index) fir.def = QPfoo.temp001} : fir.heap<fir.array<?xf32>>
+  fir.do_loop %i = %c1 to %shape2 step %c1 unordered {
+    %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"}
+    %p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"}
+    %i_zero = arith.subi %i, %c1
+    %i_p = arith.addi %i_zero,  %p_lb
+    %pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"}
+    %bval = fir.load %belt : f32
+    %pval = fir.load %pelt : f32
+    %mulval = arith.mulf %bval, %pval : f32
+    %celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"}
+    %cval = fir.load %celt
+    %add_res = arith.addf %mulval, %cval
+    %tempelt = hlfir.designate %temp, %i {fir.ref = "_QPfoo.temp001", fir.def="_QPfoo.temp001.des004"}
+    hlfir.assign %add_res to %tempelt {fir.ref = "_QPfoo.temp001.des004"}: f32, fir.ref<f32>
+  }
+  fir.do_loop %i = %c1 to %shape2 step %c1 unordered {
+    %aelt = hlfir.designate %a, %i {fir.ref = "_QPfooEa", fir.def= "_QPfooEa.des005"}
+    %tempelt = hlfir.designate %temp, %i {fir.ref = "_QPfoo.temp001", fir.def="_QPfoo.temp001.des006"}
+    hlfir.assign %add_res to %tempelt {fir.ref = "_QPfoo.temp001.des005,_QPfooEa.des005"}: f32, fir.ref<f32>
+  }
+  hlfir.finalize %temp {fir.ref = "QPfoo.temp001"}
+  fir.freemem %tempstorage
+  return
+}
+```
+
+Step 5 (may also occur earlier or several times): shape propagation.
+-   %shape2 can be inferred from %cshape that has constant shape: the
+    hlfir.shape_meet results can be replaced by it, and if the option is set,
+    conformance checks can be added for %a, %b and %p.
+-   %temp is small, and its fir.allocmem can be promoted to a stack allocation
+
+```HLFIR
+func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
+  // .....
+  %cshape = fir.shape %c100
+  %extent = %c100
+  // updated fir.alloca
+  %tempstorage = fir.alloca %extent : fir.ref<fir.array<100xf32>>
+  %temp = fir.declare %tempstorage {fir.def = "_QPfoo.temp001"} : fir.ref<fir.array<100xf32>>
+  fir.do_loop %i = %c1 to %c100 step %c1 unordered {
+    // ...
+  }
+  fir.do_loop %i = %c1 to %c100 step %c1 unordered {
+    // ...
+  }
+  hlfir.finalize %temp {fir.ref = "QPfoo.temp001"}
+  // deleted fir.freemem %tempstorage
+  return
+}
+```
+
+Step 6: lower hlfir.designate/hlfir.assign in a translation pass:
+
+At this point, the representation is similar to the current representation after
+the array value copy pass, and the existing FIR flow is used (lowering
+fir.do_loop to cfg and doing codegen to LLVM).
+
+### Example 3: assignments with vector subscript
+
+```Fortran
+subroutine foo(a, b, v)
+  real :: a(*), b(*)
+  integer :: v(:)
+  a(v) = b(v)
+end subroutine
+```
+
+Lowering of vector subscripted entities would happen as follow:
+- vector subscripted entities would be lowered as a hlfir.elemental implementing
+  the vector subscript addressing.
+- If the vector appears in a context where it can be modified (which can only
+  be an assignment LHS, or in input IO), lowering could transform the
+  hlfir.elemental into hlfir.forall (for assignments), or a fir.iter_while (for
+  input IO) by inlining the elemental body into the created loops, and
+  identifying the hlfir.designate producing the result.
+
+```HFLFIR
+func.func @_QPfoo(%arg0: !fir.ref<!fir.array<?xf32>>, %arg1: !fir.ref<!fir.array<?xf32>>, %arg2: !fir.box<<!fir.array<?xi32>>) {
+  %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.ref<!fir.array<?xf32>>
+  %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.ref<!fir.array<?xf32>>
+  %v = fir.declare %arg2 {fir.def = "_QPfooEv"} : !fir.box<!fir.array<?xi32>>
+  %vshape = hlfir.shape_of %v : fir.shape<1>
+  %bsection =  hlfir.elemental(%i:index) %vshape : (fir.shape<1>) -> hlfir.expr<?xf32> {
+    %v_elt = hlfir.designate %v, %i {fir.ref = "_QPfooEv", fir.def="_QPfooEv.des001"} : (!fir.box<!fir.array<?xi32>>, index) -> fir.ref<i32>
+    %v_val = fir.load %v_elt : fir.ref<i32>
+    %cast = fir.convert %v_val : (i32) -> index
+    %b_elt = hlfir.designate %b, %v_val {fir.ref = "_QPfooEb", fir.def="_QPfooEb.des002"} : (!fir.ref<!fir.array<?xf32>>, index) -> fir.ref<f32>
+    %b_val = fir.load %b_elt : fir.ref<f32>
+    fir.result %b_elt
+  }
+  %extent = hlfir.get_extent %vshape, 0 : (fir.shape<1>) -> index
+  %c1 = arith.constant 1 : index
+  hlfir.forall (%i from %c1 to %extent step %c1) {
+    %b_section_val = hlfir.apply %bsection, %i : (hlfir.expr<?xf32>, index) -> f32
+    %v_elt = hlfir.designate %v, %i {fir.ref = "_QPfooEv", fir.def="_QPfooEv.des003"} : (!fir.box<!fir.array<?xi32>>, index) -> fir.ref<i32>
+    %v_val = fir.load %v_elt : fir.ref<i32>
+    %cast = fir.convert %v_val : (i32) -> index
+    %a_elt = hlfir.designate %a, %v_val {fir.ref = "_QPfooEa", fir.def="_QPfooEa.des004"} : (!fir.ref<!fir.array<?xf32>>, index) -> fir.ref<f32>
+    hlfir.assign %b_section_val to %a_elt {fir.ref="_QPfooEa.des004"} : f32, fir.ref<f32>
+  }
+  return
+}
+```
+
+This would then be lowered as described in the examples above (hlfir.elemental
+will be inlined, hlfir.forall will be rewritten into normal loops taking into
+account the alias analysis, and hlfir.assign/hlfir.designate operations will be
+lowered to fir.array_coor and fir.store operations).
+
+# Alternatives that were not retained
+
+## Using a non-MLIR based mutable CFG representation
+
+An option would have been to extend the PFT to describe expressions in a way
+that can be annotated and modified with the ability to introduce temporaries.
+This has been rejected because this would imply a whole new set of
+infrastructure and data structures while FIR is already using MLIR
+infrastructure, so enriching FIR seems a smoother approach and will benefit from
+the MLIR infrastructure experience that was gained.
+
+## Using some existing MLIR dialects for the high-level Fortran.
+
+### Why not using Linalg dialect?
+
+The linalg dialects offers a powerful way to represent array operations: the
+linalg.generic operation takes a set of input and output arrays, a related set
+of affine maps to represent how these inputs/outputs are to be addressed, and a
+region detailing what operation should happen at each iteration point, given the
+input and output array elements. It seems mainly intended to optimize matmul,
+dot, and sum.
+
+Issues:
+
+-   The linalg dialect is tightly linked to the tensor/memref concepts that
+    cannot represent byte stride based discontinuity and would most likely
+    require FIR to use MLIR memref descriptor format to take advantage of it.
+-   It is not clear whether all Fortran array expression addressing can be
+    represented as semi affine maps. For instance, vector subscripted entities
+    can probably not, which may force creating temporaries for the related
+    designator expressions to fit in this framework. Fortran has a lot more
+    transformational intrinsics than matmul, dot, and sum that can and should
+    still be optimized.
+
+So while there may be benefits to use linalg at the optimization level (like
+rewriting fir.sum/fir.matmul to a linalg sum, with dialect types plumbing
+around the operand and results, to get tiling done by linalg), using it as a
+lowering target would not cover all Fortran needs (especially for the non
+semi-affine cases).
+So using linalg is for now left as an optimization pass opportunity in some
+cases that could be experimented.
+
+### Why not using Shape dialect?
+
+MLIR shape dialect gives a set of operations to manipulate shapes. The
+shape.meet operation is exactly similar with hlfir.shape_meet, except that it
+returns a tensor or a shape.shape.
+
+The main issue with using the shape dialect is that it is dependent on tensors.
+Bringing the tensor toolchain in flang for the sole purpose of manipulating
+shape is not seen as beneficial given that the only thing Fortran needs is
+shape.meet The shape dialect is a lot more complex because it is intended to
+deal with computations involving dynamically ranked entity, which is not the
+case in Fortran (assumed rank usage in Fortran is greatly limited).
+
+## Using embox/rebox and box as an alternative to fir.declare/hlfir.designate and hlfir.expr/ variable concept
+
+All Fortran entities (*) can be described at runtime by a fir.box, except for
+some attributes that are not part of the runtime descriptors (like TARGET,
+OPTIONAL or VOLATILE).  In that sense, it would be possible to have
+fir.declare, hlfir.designate, and hlfir.associate be replaced by embox/rebox,
+and also to have all operation creating hlfir.expr to create fir.box.
+
+This was rejected because this would lack clarity, and make embox/rebox
+semantics way too complex (their codegen is already non-trivial), and also
+because it would then not really be possible to know if a fir.box is an
+expression or a variable when it is an operand, which would make reasoning
+harder: this would already imply that expressions have been buffered, and it is
+not clear when looking at a fir.box if the value it describe may change or not,
+while a hlfir.expr value cannot change, which allows moving its usages more
+easily.
+
+This would also risk generating too many runtime descriptors read and writes
+that could make later optimizations harder.
+
+Hence, while this would be functionally possible, this makes the reasoning about
+the IR harder and would not benefit high-level optimizations.
+
+(*) This not true for vector subscripted variables, but the proposed plan will
+also not allow creating vector subscripted variables as the result of a
+hlfir.designate. Lowering will deal with the assignment and input IO special
+case using hlfir.elemental.