[flang-commits] [PATCH] D134285: [flang][RFC] Adding higher level FIR ops to ease expression lowering

Thu Sep 22 04:26:41 PDT 2022

peixin added inline comments.

================
Comment at: flang/docs/HighLevelFIR.md:169
+
+Motivation: avoid the need to materialize expressions in temporaries while
+lowering.
----------------
peixin wrote:
> kiranchandramohan wrote:
> > jeanPerier wrote:
> > > peixin wrote:
> > > > kiranchandramohan wrote:
> > > > > jeanPerier wrote:
> > > > > > peixin wrote:
> > > > > > > Currently, OpenMP reduction clause uses the fragile pattern match in lowering. When refactoring the lowering code, is it possible to make some expressions lowering shared with OpenMP using `fir.expr`? Specifically, the following operators, intrinsics, and user-defined operators will be redefined in OpenMP reduction clause:
> > > > > > > ```
> > > > > > > +, *, .and., .or., .eqv., .neqv., max, min, iand, ior, ieor
> > > > > > > ```
> > > > > > Interesting, what are you currently pattern matching ?
> > > > > > I am not sure the current design would help you a lot in the sense that it intends to not avoid any specific elemental operations, but rather to use a generic fir.elemental concepts.
> > > > > > 
> > > > > > As much as possible, I would tend to extend the mlir arith dialect to support the scalar operations applicable on MLIR integer and floating point types and pattern match that. But then, for some elemental operators  and intrinsics for which easy pattern matching is desirable, and where adding an arith op is not the best way, it could makes sense to add custom ops. But this may be orthogonal to the proposed change and something that could be though/designed/done in parallel, operators by operators.
> > > > > > 
> > > > > > Regarding user defined operators, I would need to understand better what you mean by "redefined", semantics is resolving user defined operation to function references already, so we do not really see them in lowering I believe.
> > > > > Thanks @peixin for raising this point.
> > > > > 
> > > > > Currently we do a pattern match fir loads -> (optional convert) -> reduction operation in FIR/arith -> (optional convert) -> store and replace it with an `omp.reduction ` operation.
> > > > > 
> > > > > So for a reduction happening on an addition (say `b = b + a`).
> > > > > We pattern match on the following IR,
> > > > > ```
> > > > > %b = fir.load %bref
> > > > > %a = fir.load %aref
> > > > > %r = arith.add %a, %b
> > > > > fir.store %r
> > > > > ```
> > > > > and get the following result IR.
> > > > > ```
> > > > > %a = fir.load %aref
> > > > > omp.reduction %a, %bref
> > > > > ```
> > > > > 
> > > > > An alternative approach that we are exploring is to see whether this can be done directly during lowering, i.e. to do custom lowering for the Assignment operation if it happens to perform a reduction.
> > > > Thanks @kiranchandramohan for answer this question.
> > > > 
> > > > For "redefined", what I mean is that the addition statement `b = b + a` is not lowered as `%b_new_val = arith.add %b_val, %a_val` any more. Instead, it will be lowered as `omp.reduction %a_val, %b_red`.
> > > > 
> > > > The current expression lowering for this statement is encapsulated in ScalarExprLowering, and it is hard to be changed if with OpenMP. If with these high-level FIR Ops, and given the FIR lowering is going to be refactored, is it possible to expose these statements analysis and lowering  to be open to be changed. 
> > > > 
> > > > Beyond this, there are two scenarios having the similar problem:
> > > > 
> > > > 1. Complex multiplication and complex conjugate multiplication
> > > > 
> > > > The current lowering and codegen does not generate the "best" LLVM IR. See https://reviews.llvm.org/D134364#3807605 for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.
> > > > 
> > > > 2. target/chip dependent transformation
> > > > 
> > > > One example is as follows:
> > > > subroutine sub(a, b, c, m, n, k)
> > > >   real :: a(m, k), b(k, n), c(m, n)
> > > >   !DIR ARM-SME (scalar matrix extension)
> > > >   c = a * b
> > > > end
> > > > 
> > > > With the compiler directive, the statement `c = a * b` can be lowered using some special transformation.
> > > > 
> > > > All in all, given the current high-level FIR Ops, and FIR lowering will be refactored, can we make the Fortran statement analysis and lowering more generic and extensible for possible optimization with OpenMP and compiler directives, or else. Some Fortran statement lowering may can be moved to codegen so that custom optimization can be performed.
> > > > for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.
> > > 
> > > I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved. 
> > > 
> > > > For "redefined", what I mean is that the addition statement b = b + a is not lowered as %b_new_val = arith.add %b_val, %a_val any more. Instead, it will be lowered as omp.reduction %a_val, %b_red.
> > > 
> > > Are you doing this only when b is a scalar or also when b is an array ?
> > > 
> > > Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?
> > > 
> > > > With the compiler directive, the statement c = a * b can be lowered using some special transformation.
> > > That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering.
> > > 
> > > I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.
> > > 
> > > 
> > ```Are you doing this only when b is a scalar or also when b is an array ?
> > 
> > Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?```
> > 
> > Currently, we have implemented it only for scalars, but the pattern matching will become complicated for pointers, allocatables and arrays. Rather than pattern matching on some FIR, once we detect that the Assignment is actually a reduction we can try to write lowering code that generates an appropriate `omp.reduction` operation. Alternatively, if the pattern matching is easier at the new High Level FIR we can do it there.
> > I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved.
> 
> For the following case:
> ```
> subroutine sub
>   complex :: a, b, c
>   c = conj(a) * b
> end
> ```
> With these high level FIR Ops, I hope to generate the following FIR:
> ```
> %a = fir.declare ...
> %b = fir.declare ...
> %c = complex.conjmul %a, %b : fir.complex<4>
> ```
> Anyway, I will keep watching if we can do this when refactoring FIR lowering using these high level FIR ops in the future. We had one initial try in classic-flang. If possible, we hope to contribute the middle-end and back-end code together with F18 frontend support. 
> 
> > Are you doing this only when b is a scalar or also when b is an array ?
> 
> It can be either scalar, array, or array section. If it is array or array section, it will be treated as if a reduction clause would be applied to each separate element of the array section.
> 
> > Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?
> 
> It generates FIR, and match the pattern, then replace the generated FIR with `omp.reduction`. The current pattern patching is fine, at least no bug found. However, if we can lower it from the evaluate::Expr, it may be better.
> 
> Here is one example (from the OpenMP Spec attached examples):
> ```
> SUBROUTINE REDUCTION1(A, B, C, D, X, Y, N)
>  REAL :: X(*), A, D
>  INTEGER :: Y(*), N, B, C
>  INTEGER :: I
>  A = 0
>  B = 0
>  C = Y(1)
>  D = X(1)
>  !$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N) REDUCTION(+:A) &
>  !$OMP& REDUCTION(IEOR:B) REDUCTION(MIN:C) REDUCTION(MAX:D)
>  DO I=1,N
>  A = A + X(I)
>  B = IEOR(B, Y(I))
>  C = MIN(C, Y(I))
>  IF (D < X(I)) D = X(I)
>  END DO
> 
> END SUBROUTINE REDUCTION1
> ```
> I think that the pattern match is hard to capture the `max reduction of D`. From the `evaluate::Expr`, we only need to check if the `GetSymbol(lhs)` is the reduction list item. We can emit the assertion if the `rhs` is not reduction identifier such as `+, IEOR, MIN` here. For the `if` statement, it is hard to give an reasonable assertion.
> 
> > That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering. I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.
> 
> Thanks for the confirmation.
Agree.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134285/new/

https://reviews.llvm.org/D134285