[flang-commits] [PATCH] D134285: [flang][RFC] Adding higher level FIR ops to ease expression lowering

Thu Sep 22 02:25:36 PDT 2022

jeanPerier marked an inline comment as done.
jeanPerier added inline comments.

================
Comment at: flang/docs/HighLevelFIR.md:21
+MLIR translation passes.  As a result of these additions, it is likely that the
+fir.array_load/fir.array_merge_store and related array operations could be
+removed from FIR since array assignment analysis could directly happen on the
----------------
kiranchandramohan wrote:
> And the `ArrayValueCopy` pass will not be needed?
Right, it will be refactored into the (array) fir.assign and fir.forall translation passes.

================
Comment at: flang/docs/HighLevelFIR.md:83
+
+The fir.declare operation will allow:
+- Pushing higher-level Fortran concepts into FIR operations (like array
----------------
kiranchandramohan wrote:
> From the OpenMP side, in the long-term we would like to move privatisation into the OpenMP dialect. So it will be very helpful if variable declarations are available in FIR. On the other hand, having more kinds of variables in Flang (FIR, HighFIR) would mean that we will have to get OpenMP Dialect to work with these different kinds of variables.
> 
> Can you consider adding the `finalizer` also to the declare operation? 
> On the other hand, having more kinds of variables in Flang (FIR, HighFIR) would mean that we will have to get OpenMP Dialect to work with these different kinds of variables.

With the proposed strategy, the variables in FIR and high level FIR would be the same from an SSA type point of view (memory references and boxes like currently). The difference would be that all the shape/type parameters information related to that address would be guaranteed to be retrievable in one case, and not the other.

Given you will probably need this information to privatize the variables, you could either decide to make the privatization use the high level concept, but you could also directly use a "lower" level operation where all the side information (bounds, type params...) required for privatization are operands of the privatization operation.

Basically, you can freely use a "variable" SSA value produced in a fir.declare in "lower" level operations that do not need to access the info in the fir.declare. It is perfectly fine to have the current FIR operations mixed with the new higher level operations.

> Can you consider adding the finalizer also to the declare operation?

Why not, you would need it to know how to finalize OpenMP private copies after privatizations ?

Note that the runtime knows how to retrieve finalizer information from the descriptors, so calling Finalize runtime routine on privatized variables should also be sufficient (in which case you only need to know if a types needs to be finalized, not exactly how).

================
Comment at: flang/docs/HighLevelFIR.md:169
+
+Motivation: avoid the need to materialize expressions in temporaries while
+lowering.
----------------
peixin wrote:
> kiranchandramohan wrote:
> > jeanPerier wrote:
> > > peixin wrote:
> > > > Currently, OpenMP reduction clause uses the fragile pattern match in lowering. When refactoring the lowering code, is it possible to make some expressions lowering shared with OpenMP using `fir.expr`? Specifically, the following operators, intrinsics, and user-defined operators will be redefined in OpenMP reduction clause:
> > > > ```
> > > > +, *, .and., .or., .eqv., .neqv., max, min, iand, ior, ieor
> > > > ```
> > > Interesting, what are you currently pattern matching ?
> > > I am not sure the current design would help you a lot in the sense that it intends to not avoid any specific elemental operations, but rather to use a generic fir.elemental concepts.
> > > 
> > > As much as possible, I would tend to extend the mlir arith dialect to support the scalar operations applicable on MLIR integer and floating point types and pattern match that. But then, for some elemental operators  and intrinsics for which easy pattern matching is desirable, and where adding an arith op is not the best way, it could makes sense to add custom ops. But this may be orthogonal to the proposed change and something that could be though/designed/done in parallel, operators by operators.
> > > 
> > > Regarding user defined operators, I would need to understand better what you mean by "redefined", semantics is resolving user defined operation to function references already, so we do not really see them in lowering I believe.
> > Thanks @peixin for raising this point.
> > 
> > Currently we do a pattern match fir loads -> (optional convert) -> reduction operation in FIR/arith -> (optional convert) -> store and replace it with an `omp.reduction ` operation.
> > 
> > So for a reduction happening on an addition (say `b = b + a`).
> > We pattern match on the following IR,
> > ```
> > %b = fir.load %bref
> > %a = fir.load %aref
> > %r = arith.add %a, %b
> > fir.store %r
> > ```
> > and get the following result IR.
> > ```
> > %a = fir.load %aref
> > omp.reduction %a, %bref
> > ```
> > 
> > An alternative approach that we are exploring is to see whether this can be done directly during lowering, i.e. to do custom lowering for the Assignment operation if it happens to perform a reduction.
> Thanks @kiranchandramohan for answer this question.
> 
> For "redefined", what I mean is that the addition statement `b = b + a` is not lowered as `%b_new_val = arith.add %b_val, %a_val` any more. Instead, it will be lowered as `omp.reduction %a_val, %b_red`.
> 
> The current expression lowering for this statement is encapsulated in ScalarExprLowering, and it is hard to be changed if with OpenMP. If with these high-level FIR Ops, and given the FIR lowering is going to be refactored, is it possible to expose these statements analysis and lowering  to be open to be changed. 
> 
> Beyond this, there are two scenarios having the similar problem:
> 
> 1. Complex multiplication and complex conjugate multiplication
> 
> The current lowering and codegen does not generate the "best" LLVM IR. See https://reviews.llvm.org/D134364#3807605 for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.
> 
> 2. target/chip dependent transformation
> 
> One example is as follows:
> subroutine sub(a, b, c, m, n, k)
>   real :: a(m, k), b(k, n), c(m, n)
>   !DIR ARM-SME (scalar matrix extension)
>   c = a * b
> end
> 
> With the compiler directive, the statement `c = a * b` can be lowered using some special transformation.
> 
> All in all, given the current high-level FIR Ops, and FIR lowering will be refactored, can we make the Fortran statement analysis and lowering more generic and extensible for possible optimization with OpenMP and compiler directives, or else. Some Fortran statement lowering may can be moved to codegen so that custom optimization can be performed.
> for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.

I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved. 

> For "redefined", what I mean is that the addition statement b = b + a is not lowered as %b_new_val = arith.add %b_val, %a_val any more. Instead, it will be lowered as omp.reduction %a_val, %b_red.

Are you doing this only when b is a scalar or also when b is an array ?

Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?

> With the compiler directive, the statement c = a * b can be lowered using some special transformation.
That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering.

I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134285/new/

https://reviews.llvm.org/D134285