[flang-commits] [PATCH] D134285: [flang][RFC] Adding higher level FIR ops to ease expression lowering

Fri Sep 23 02:09:57 PDT 2022

jeanPerier marked an inline comment as done.
jeanPerier added inline comments.

================
Comment at: flang/docs/HighLevelFIR.md:169
+
+Motivation: avoid the need to materialize expressions in temporaries while
+lowering.
----------------
peixin wrote:
> peixin wrote:
> > kiranchandramohan wrote:
> > > jeanPerier wrote:
> > > > peixin wrote:
> > > > > kiranchandramohan wrote:
> > > > > > jeanPerier wrote:
> > > > > > > peixin wrote:
> > > > > > > > Currently, OpenMP reduction clause uses the fragile pattern match in lowering. When refactoring the lowering code, is it possible to make some expressions lowering shared with OpenMP using `fir.expr`? Specifically, the following operators, intrinsics, and user-defined operators will be redefined in OpenMP reduction clause:
> > > > > > > > ```
> > > > > > > > +, *, .and., .or., .eqv., .neqv., max, min, iand, ior, ieor
> > > > > > > > ```
> > > > > > > Interesting, what are you currently pattern matching ?
> > > > > > > I am not sure the current design would help you a lot in the sense that it intends to not avoid any specific elemental operations, but rather to use a generic fir.elemental concepts.
> > > > > > > 
> > > > > > > As much as possible, I would tend to extend the mlir arith dialect to support the scalar operations applicable on MLIR integer and floating point types and pattern match that. But then, for some elemental operators  and intrinsics for which easy pattern matching is desirable, and where adding an arith op is not the best way, it could makes sense to add custom ops. But this may be orthogonal to the proposed change and something that could be though/designed/done in parallel, operators by operators.
> > > > > > > 
> > > > > > > Regarding user defined operators, I would need to understand better what you mean by "redefined", semantics is resolving user defined operation to function references already, so we do not really see them in lowering I believe.
> > > > > > Thanks @peixin for raising this point.
> > > > > > 
> > > > > > Currently we do a pattern match fir loads -> (optional convert) -> reduction operation in FIR/arith -> (optional convert) -> store and replace it with an `omp.reduction ` operation.
> > > > > > 
> > > > > > So for a reduction happening on an addition (say `b = b + a`).
> > > > > > We pattern match on the following IR,
> > > > > > ```
> > > > > > %b = fir.load %bref
> > > > > > %a = fir.load %aref
> > > > > > %r = arith.add %a, %b
> > > > > > fir.store %r
> > > > > > ```
> > > > > > and get the following result IR.
> > > > > > ```
> > > > > > %a = fir.load %aref
> > > > > > omp.reduction %a, %bref
> > > > > > ```
> > > > > > 
> > > > > > An alternative approach that we are exploring is to see whether this can be done directly during lowering, i.e. to do custom lowering for the Assignment operation if it happens to perform a reduction.
> > > > > Thanks @kiranchandramohan for answer this question.
> > > > > 
> > > > > For "redefined", what I mean is that the addition statement `b = b + a` is not lowered as `%b_new_val = arith.add %b_val, %a_val` any more. Instead, it will be lowered as `omp.reduction %a_val, %b_red`.
> > > > > 
> > > > > The current expression lowering for this statement is encapsulated in ScalarExprLowering, and it is hard to be changed if with OpenMP. If with these high-level FIR Ops, and given the FIR lowering is going to be refactored, is it possible to expose these statements analysis and lowering  to be open to be changed. 
> > > > > 
> > > > > Beyond this, there are two scenarios having the similar problem:
> > > > > 
> > > > > 1. Complex multiplication and complex conjugate multiplication
> > > > > 
> > > > > The current lowering and codegen does not generate the "best" LLVM IR. See https://reviews.llvm.org/D134364#3807605 for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.
> > > > > 
> > > > > 2. target/chip dependent transformation
> > > > > 
> > > > > One example is as follows:
> > > > > subroutine sub(a, b, c, m, n, k)
> > > > >   real :: a(m, k), b(k, n), c(m, n)
> > > > >   !DIR ARM-SME (scalar matrix extension)
> > > > >   c = a * b
> > > > > end
> > > > > 
> > > > > With the compiler directive, the statement `c = a * b` can be lowered using some special transformation.
> > > > > 
> > > > > All in all, given the current high-level FIR Ops, and FIR lowering will be refactored, can we make the Fortran statement analysis and lowering more generic and extensible for possible optimization with OpenMP and compiler directives, or else. Some Fortran statement lowering may can be moved to codegen so that custom optimization can be performed.
> > > > > for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.
> > > > 
> > > > I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved. 
> > > > 
> > > > > For "redefined", what I mean is that the addition statement b = b + a is not lowered as %b_new_val = arith.add %b_val, %a_val any more. Instead, it will be lowered as omp.reduction %a_val, %b_red.
> > > > 
> > > > Are you doing this only when b is a scalar or also when b is an array ?
> > > > 
> > > > Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?
> > > > 
> > > > > With the compiler directive, the statement c = a * b can be lowered using some special transformation.
> > > > That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering.
> > > > 
> > > > I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.
> > > > 
> > > > 
> > > ```Are you doing this only when b is a scalar or also when b is an array ?
> > > 
> > > Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?```
> > > 
> > > Currently, we have implemented it only for scalars, but the pattern matching will become complicated for pointers, allocatables and arrays. Rather than pattern matching on some FIR, once we detect that the Assignment is actually a reduction we can try to write lowering code that generates an appropriate `omp.reduction` operation. Alternatively, if the pattern matching is easier at the new High Level FIR we can do it there.
> > > I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved.
> > 
> > For the following case:
> > ```
> > subroutine sub
> >   complex :: a, b, c
> >   c = conj(a) * b
> > end
> > ```
> > With these high level FIR Ops, I hope to generate the following FIR:
> > ```
> > %a = fir.declare ...
> > %b = fir.declare ...
> > %c = complex.conjmul %a, %b : fir.complex<4>
> > ```
> > Anyway, I will keep watching if we can do this when refactoring FIR lowering using these high level FIR ops in the future. We had one initial try in classic-flang. If possible, we hope to contribute the middle-end and back-end code together with F18 frontend support. 
> > 
> > > Are you doing this only when b is a scalar or also when b is an array ?
> > 
> > It can be either scalar, array, or array section. If it is array or array section, it will be treated as if a reduction clause would be applied to each separate element of the array section.
> > 
> > > Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?
> > 
> > It generates FIR, and match the pattern, then replace the generated FIR with `omp.reduction`. The current pattern patching is fine, at least no bug found. However, if we can lower it from the evaluate::Expr, it may be better.
> > 
> > Here is one example (from the OpenMP Spec attached examples):
> > ```
> > SUBROUTINE REDUCTION1(A, B, C, D, X, Y, N)
> >  REAL :: X(*), A, D
> >  INTEGER :: Y(*), N, B, C
> >  INTEGER :: I
> >  A = 0
> >  B = 0
> >  C = Y(1)
> >  D = X(1)
> >  !$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N) REDUCTION(+:A) &
> >  !$OMP& REDUCTION(IEOR:B) REDUCTION(MIN:C) REDUCTION(MAX:D)
> >  DO I=1,N
> >  A = A + X(I)
> >  B = IEOR(B, Y(I))
> >  C = MIN(C, Y(I))
> >  IF (D < X(I)) D = X(I)
> >  END DO
> > 
> > END SUBROUTINE REDUCTION1
> > ```
> > I think that the pattern match is hard to capture the `max reduction of D`. From the `evaluate::Expr`, we only need to check if the `GetSymbol(lhs)` is the reduction list item. We can emit the assertion if the `rhs` is not reduction identifier such as `+, IEOR, MIN` here. For the `if` statement, it is hard to give an reasonable assertion.
> > 
> > > That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering. I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.
> > 
> > Thanks for the confirmation.
> Agree.
For scalar reduction, the FIR pattern will not change with the proposal, for arrays and array section, it will change (you would have a fir.elemental containing the scalar reduction operation plus a fir.assign a bit as in the second example in the example section below). With arrays, you would also need to be careful with the potential overlaps in the assignments.

I think we should aim at being able to detect reduction in FIR. OpenMP might not be the only case that would want to do special optimization with a reduction. Now, if replacing the generated FIR by a omp.reduction is required for a correctness point of view, that may be problematic, because I am not sure we can guarantee that MLIR pattern matching will work at a 100% if some in-between passes slightly modify the FIR in correct by unexpected ways.

================
Comment at: flang/docs/HighLevelFIR.md:169
+
+Motivation: avoid the need to materialize expressions in temporaries while
+lowering.
----------------
jeanPerier wrote:
> peixin wrote:
> > peixin wrote:
> > > kiranchandramohan wrote:
> > > > jeanPerier wrote:
> > > > > peixin wrote:
> > > > > > kiranchandramohan wrote:
> > > > > > > jeanPerier wrote:
> > > > > > > > peixin wrote:
> > > > > > > > > Currently, OpenMP reduction clause uses the fragile pattern match in lowering. When refactoring the lowering code, is it possible to make some expressions lowering shared with OpenMP using `fir.expr`? Specifically, the following operators, intrinsics, and user-defined operators will be redefined in OpenMP reduction clause:
> > > > > > > > > ```
> > > > > > > > > +, *, .and., .or., .eqv., .neqv., max, min, iand, ior, ieor
> > > > > > > > > ```
> > > > > > > > Interesting, what are you currently pattern matching ?
> > > > > > > > I am not sure the current design would help you a lot in the sense that it intends to not avoid any specific elemental operations, but rather to use a generic fir.elemental concepts.
> > > > > > > > 
> > > > > > > > As much as possible, I would tend to extend the mlir arith dialect to support the scalar operations applicable on MLIR integer and floating point types and pattern match that. But then, for some elemental operators  and intrinsics for which easy pattern matching is desirable, and where adding an arith op is not the best way, it could makes sense to add custom ops. But this may be orthogonal to the proposed change and something that could be though/designed/done in parallel, operators by operators.
> > > > > > > > 
> > > > > > > > Regarding user defined operators, I would need to understand better what you mean by "redefined", semantics is resolving user defined operation to function references already, so we do not really see them in lowering I believe.
> > > > > > > Thanks @peixin for raising this point.
> > > > > > > 
> > > > > > > Currently we do a pattern match fir loads -> (optional convert) -> reduction operation in FIR/arith -> (optional convert) -> store and replace it with an `omp.reduction ` operation.
> > > > > > > 
> > > > > > > So for a reduction happening on an addition (say `b = b + a`).
> > > > > > > We pattern match on the following IR,
> > > > > > > ```
> > > > > > > %b = fir.load %bref
> > > > > > > %a = fir.load %aref
> > > > > > > %r = arith.add %a, %b
> > > > > > > fir.store %r
> > > > > > > ```
> > > > > > > and get the following result IR.
> > > > > > > ```
> > > > > > > %a = fir.load %aref
> > > > > > > omp.reduction %a, %bref
> > > > > > > ```
> > > > > > > 
> > > > > > > An alternative approach that we are exploring is to see whether this can be done directly during lowering, i.e. to do custom lowering for the Assignment operation if it happens to perform a reduction.
> > > > > > Thanks @kiranchandramohan for answer this question.
> > > > > > 
> > > > > > For "redefined", what I mean is that the addition statement `b = b + a` is not lowered as `%b_new_val = arith.add %b_val, %a_val` any more. Instead, it will be lowered as `omp.reduction %a_val, %b_red`.
> > > > > > 
> > > > > > The current expression lowering for this statement is encapsulated in ScalarExprLowering, and it is hard to be changed if with OpenMP. If with these high-level FIR Ops, and given the FIR lowering is going to be refactored, is it possible to expose these statements analysis and lowering  to be open to be changed. 
> > > > > > 
> > > > > > Beyond this, there are two scenarios having the similar problem:
> > > > > > 
> > > > > > 1. Complex multiplication and complex conjugate multiplication
> > > > > > 
> > > > > > The current lowering and codegen does not generate the "best" LLVM IR. See https://reviews.llvm.org/D134364#3807605 for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.
> > > > > > 
> > > > > > 2. target/chip dependent transformation
> > > > > > 
> > > > > > One example is as follows:
> > > > > > subroutine sub(a, b, c, m, n, k)
> > > > > >   real :: a(m, k), b(k, n), c(m, n)
> > > > > >   !DIR ARM-SME (scalar matrix extension)
> > > > > >   c = a * b
> > > > > > end
> > > > > > 
> > > > > > With the compiler directive, the statement `c = a * b` can be lowered using some special transformation.
> > > > > > 
> > > > > > All in all, given the current high-level FIR Ops, and FIR lowering will be refactored, can we make the Fortran statement analysis and lowering more generic and extensible for possible optimization with OpenMP and compiler directives, or else. Some Fortran statement lowering may can be moved to codegen so that custom optimization can be performed.
> > > > > > for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.
> > > > > 
> > > > > I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved. 
> > > > > 
> > > > > > For "redefined", what I mean is that the addition statement b = b + a is not lowered as %b_new_val = arith.add %b_val, %a_val any more. Instead, it will be lowered as omp.reduction %a_val, %b_red.
> > > > > 
> > > > > Are you doing this only when b is a scalar or also when b is an array ?
> > > > > 
> > > > > Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?
> > > > > 
> > > > > > With the compiler directive, the statement c = a * b can be lowered using some special transformation.
> > > > > That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering.
> > > > > 
> > > > > I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.
> > > > > 
> > > > > 
> > > > ```Are you doing this only when b is a scalar or also when b is an array ?
> > > > 
> > > > Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?```
> > > > 
> > > > Currently, we have implemented it only for scalars, but the pattern matching will become complicated for pointers, allocatables and arrays. Rather than pattern matching on some FIR, once we detect that the Assignment is actually a reduction we can try to write lowering code that generates an appropriate `omp.reduction` operation. Alternatively, if the pattern matching is easier at the new High Level FIR we can do it there.
> > > > I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved.
> > > 
> > > For the following case:
> > > ```
> > > subroutine sub
> > >   complex :: a, b, c
> > >   c = conj(a) * b
> > > end
> > > ```
> > > With these high level FIR Ops, I hope to generate the following FIR:
> > > ```
> > > %a = fir.declare ...
> > > %b = fir.declare ...
> > > %c = complex.conjmul %a, %b : fir.complex<4>
> > > ```
> > > Anyway, I will keep watching if we can do this when refactoring FIR lowering using these high level FIR ops in the future. We had one initial try in classic-flang. If possible, we hope to contribute the middle-end and back-end code together with F18 frontend support. 
> > > 
> > > > Are you doing this only when b is a scalar or also when b is an array ?
> > > 
> > > It can be either scalar, array, or array section. If it is array or array section, it will be treated as if a reduction clause would be applied to each separate element of the array section.
> > > 
> > > > Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?
> > > 
> > > It generates FIR, and match the pattern, then replace the generated FIR with `omp.reduction`. The current pattern patching is fine, at least no bug found. However, if we can lower it from the evaluate::Expr, it may be better.
> > > 
> > > Here is one example (from the OpenMP Spec attached examples):
> > > ```
> > > SUBROUTINE REDUCTION1(A, B, C, D, X, Y, N)
> > >  REAL :: X(*), A, D
> > >  INTEGER :: Y(*), N, B, C
> > >  INTEGER :: I
> > >  A = 0
> > >  B = 0
> > >  C = Y(1)
> > >  D = X(1)
> > >  !$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N) REDUCTION(+:A) &
> > >  !$OMP& REDUCTION(IEOR:B) REDUCTION(MIN:C) REDUCTION(MAX:D)
> > >  DO I=1,N
> > >  A = A + X(I)
> > >  B = IEOR(B, Y(I))
> > >  C = MIN(C, Y(I))
> > >  IF (D < X(I)) D = X(I)
> > >  END DO
> > > 
> > > END SUBROUTINE REDUCTION1
> > > ```
> > > I think that the pattern match is hard to capture the `max reduction of D`. From the `evaluate::Expr`, we only need to check if the `GetSymbol(lhs)` is the reduction list item. We can emit the assertion if the `rhs` is not reduction identifier such as `+, IEOR, MIN` here. For the `if` statement, it is hard to give an reasonable assertion.
> > > 
> > > > That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering. I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.
> > > 
> > > Thanks for the confirmation.
> > Agree.
> For scalar reduction, the FIR pattern will not change with the proposal, for arrays and array section, it will change (you would have a fir.elemental containing the scalar reduction operation plus a fir.assign a bit as in the second example in the example section below). With arrays, you would also need to be careful with the potential overlaps in the assignments.
> 
> I think we should aim at being able to detect reduction in FIR. OpenMP might not be the only case that would want to do special optimization with a reduction. Now, if replacing the generated FIR by a omp.reduction is required for a correctness point of view, that may be problematic, because I am not sure we can guarantee that MLIR pattern matching will work at a 100% if some in-between passes slightly modify the FIR in correct by unexpected ways.
> With these high level FIR Ops, I hope to generate the following FIR:

I do not think the high level ops will allow to produce exactly what you wrote unless we were to add FIR specific complex operation working on "variables", and that was not my goal since MLIR complex dialect could already. However, I think that what it could be lowered to would still suits your optimization goal.

This could be  be lowered to:
```
%a = fir.declare ... : fir.ref<mlir::complex<f32>>
%b = fir.declare ... : fir.ref<mlir::complex<f32>>
%c = fir.declare ... : fir.ref<mlir::complex<f32>>
%aval = fir.load %a
%aconj = complex.conj %aval
%bval = fir.load %b
%res = complex.mul %aconj, %b 
fir.assign %res to %c
```

The MLIR complex dialect canonicalization/optimization passes would then have to fold the MUL with a conj operand into a single  conjmul op (does it exists yet in the MLIR complex dialect, I could not find it ?).

> Here is one example (from the OpenMP Spec attached examples):

Thanks for the example. One related question, is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignments with IEOR) ?

If so, that means that this kind of pattern matching is already available on the parse tree / evaluate expr.
I am not a fan of low level intrinsic operations (like +) being lowered differently based on the context, so I would avoid it if possible. But if this turns out to be really required, we could try to find clean ways to overrides certain expression expr node without redefining a completely different parse-tree/expression visitor.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134285/new/

https://reviews.llvm.org/D134285