[flang-dev] About OpenMP dialect in MLIR

Thu Feb 13 17:22:08 PST 2020

Hello Vinay,

Thanks for your mail about the OpenMP dialect in MLIR. Happy to know that you and several other groups are interested in the OpenMP dialect. At the outset, I must point out that the design is not set in stone and will change as we make progress. You are welcome to participate, to provide feedback and criticism to change the design, and to contribute to the implementation. I provide some clarifications and replies to your comments below. If it is OK, we can continue the discussion on Discourse, as River pointed out.
> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and clang frontends. Note that this proposal was before considering MLIR for FIR.
A correction here. The proposal for OpenMPIRBuilder was made when MLIR was being considered for FIR.
(i) Gary Klimowicz's minutes for the Flang call in April 2019 mention considering MLIR for FIR.
http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-April/000194.html
(ii) My reply to Johannes's proposal in May 2019 mentions MLIR for FIR.
http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000220.html

> b. Review of barrier construct is in progress: https://reviews.llvm.org/D72962

Minor correction here. The addition of the barrier construct was accepted and has landed (https://reviews.llvm.org/D72400). It is the review of the translation to LLVM IR that is in progress.

> It looks like the design has evolved over time and there is no one place which contains the latest design decisions that fits all the different pieces of the puzzle. I will try to deduce it from the above mentioned references. Please correct me if I am referring to anything which has changed.
Yes, the design has changed mildly over time to incorporate feedback, but the latest version is the one in the RFC on Discourse.

> For most OpenMP design discussions, FIR examples are used (as seen in (2) and (3)). The MLIR examples mentioned in the design only talk about the FIR dialect and the LLVM dialect.
Our initial concern was how all these pieces (FIR, LLVM dialect, OpenMPIRBuilder, LLVM IR) would fit together. Hence the prominence of FIR and the LLVM dialect, and the greater emphasis on lowering/translation than on transformations/optimisations.

> This completely ignores the likes of standard, affine (where most loop transformations are supposed to happen) and loop dialects.
Adding to the reply above: we would like to take advantage of these transformations wherever possible. FIR loops will be converted to the affine/loop dialects, so the loop inside an omp.do can be in these dialects, as clarified in the discussion on Discourse and also shown in slide 20 of the FOSDEM presentation (links to both below).
https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf

I must also point out that the question of where to do loop transformations is a topic we have not fully converged on. See the following thread for discussions.
http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html

> Is it the same omp.do operation which now contains the bounds and induction variables of the loop after the LLVM conversion?
The point here is that (i) we need to keep the loops separate so as to take advantage of the transformations that other dialects like affine/loop provide, and (ii) we will need the loop information while lowering the OpenMP do operation. For the implementation, if reusing the same operation (in different contexts) is difficult, we can add a new operation.
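To illustrate point (ii): when a worksharing loop is lowered, its lower bound, upper bound and step are needed to compute each thread's share of the iteration space. The following C sketch assumes a plain schedule(static) with no chunk size and a positive step. It splits the trip count into near-equal contiguous blocks. The function name static_chunk is hypothetical. This is not the OpenMP runtime interface, which does a similar computation at run time.

```c
#include <stdint.h>

/* Hypothetical helper: per-thread bounds for a schedule(static) loop
 * "for (i = lb; i < ub; i += step)" with step > 0 (assumption).
 * The thread should iterate i = *my_lb; i < *my_ub; i += step.
 * The range is empty when *my_lb == *my_ub. */
void static_chunk(int64_t lb, int64_t ub, int64_t step,
                  int64_t tid, int64_t nthreads,
                  int64_t *my_lb, int64_t *my_ub)
{
    /* Trip count of the original loop; zero if it never executes. */
    int64_t trip = (ub > lb) ? (ub - lb + step - 1) / step : 0;

    /* Near-equal split: the first (trip % nthreads) threads
     * receive one extra iteration. */
    int64_t base = trip / nthreads;
    int64_t extra = trip % nthreads;
    int64_t first = tid * base + (tid < extra ? tid : extra);
    int64_t count = base + (tid < extra ? 1 : 0);

    /* Map iteration indices back to induction-variable values. */
    *my_lb = lb + first * step;
    *my_ub = lb + (first + count) * step;
}
```

For example, a 10-iteration loop on 4 threads gives the ranges [0,3), [3,6), [6,8) and [8,10). The bounds and step are inputs to the computation. Any representation of omp.do has to keep them available until this point in the lowering.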
> It is also not mentioned how clauses like firstprivate, shared, private, reduce, map, etc are lowered to OpenMP dialect.
Yes, it is not mentioned. We did a study of a few constructs and clauses, which was shared in mails to flang-dev and in the RFC. As we make progress, and before implementation, we will share further details.

> it would be beneficial to have an omp.parallel_do operation which has semantics similar to other loop structures (may not be LoopLikeInterface) in MLIR.
I am not against adding parallel_do if it can help with transformations or reduce the complexity of lowering. Please share the details on Discourse, as a reply to the RFC or in a separate thread.
> it looks like having OpenMP operations based on standard MLIR types and operations (scalars and memrefs mainly) is the right way to go.
This will definitely be the first version that we implement. But I do not understand why we should restrict ourselves to only the standard types and operations. To ease lowering and translation, and to avoid adding OpenMP operations to other dialects, I believe the OpenMP dialect should also be able to coexist with other dialects like FIR and LLVM.
> E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct lowering to LLVM IR ignoring all the advantages that MLIR provides.
> Also, OpenMP codegen will automatically benefit from the GPU dialect based optimizations. For example, it would be way easier to hoist a memory reference out of GPU kernel in MLIR than in LLVM IR.
I might not have fully understood you here, but the dialect lives independently of the translation to LLVM IR. If there are optimisations (like the hoisting you mention here), I believe they can be performed as transformation passes on the dialect. That is not ruled out.

--Kiran
________________________________
From: flang-dev <flang-dev-bounces at lists.llvm.org> on behalf of Vinay Madhusudan via flang-dev <flang-dev at lists.llvm.org>
Sent: 13 February 2020 16:33
To: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; flang-dev at lists.llvm.org <flang-dev at lists.llvm.org>
Subject: [flang-dev] About OpenMP dialect in MLIR

Hi,

I have a few questions / concerns regarding the design of the OpenMP dialect in MLIR that is currently being implemented, mainly for the f18 compiler. Below, I summarize the current state of various efforts in clang / f18 / MLIR / LLVM regarding this. Feel free to add to the list in case I have missed something.

1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and clang frontends. Note that this proposal was before considering MLIR for FIR.

a. llvm-dev proposal : http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html

b. Patches in review: https://reviews.llvm.org/D70290. This also includes the clang codegen changes.

2.  [July - September 2019] OpenMP dialect for MLIR was discussed / proposed with respect to the f18 compilation stack (keeping FIR in mind).

a. flang-dev discussion link: https://lists.llvm.org/pipermail/flang-dev/2019-September/000020.html

b. Design decisions captured in PPT: https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view

c. MLIR google groups discussion: https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw

d. Target constructs  design: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html

e. SIMD constructs design: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html

3.  [Jan 2020] OpenMP dialect RFC in llvm discourse : https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397

4.  [Jan - Feb 2020] Implementation of OpenMP dialect in MLIR:

a. The first patch which introduces the OpenMP dialect was pushed.

b. Review of barrier construct is in progress: https://reviews.llvm.org/D72962 and https://reviews.llvm.org/D72400

I have tried to list below different topics of interest (to different people) around this work. Most of these are in the design phase (or very new) and multiple parties are interested with different sets of goals in mind.

I.  Flang frontend and its integration

II. Fortran representation in MLIR / FIR development

III. OpenMP development for flang, OpenMP builder in LLVM.

IV. Loop Transformations in MLIR / LLVM with respect to OpenMP.

It looks like the design has evolved over time and there is no one place which contains the latest design decisions that fits all the different pieces of the puzzle. I will try to deduce it from the above mentioned references. Please correct me if I am referring to anything which has changed.

A. For most OpenMP design discussions, FIR examples are used (as seen in (2) and (3)). The MLIR examples mentioned in the design only talk about the FIR dialect and the LLVM dialect.

This completely ignores the likes of standard, affine (where most loop transformations are supposed to happen) and loop dialects. I think it is critical to decouple the OpenMP dialect development in MLIR from the current flang / FIR effort. It would be useful if someone can mention these examples using existing dialects in MLIR and also how the different transformations / lowerings are planned.

B. In the latest RFC (3), it is mentioned that the initial OpenMP dialect version will be as follows:

  omp.parallel {
    omp.do {
      fir.do %i = 0 to %ub3 : !fir.integer {
        ...
      }
    }
  }

and then after the "LLVM conversion" it is converted as follows:

  omp.parallel {
    %ub3 =
    omp.do %i = 0 to %ub3 : !llvm.integer {
      ...
    }
  }

a. Is it the same omp.do operation which now contains the bounds and induction variables of the loop after the LLVM conversion? If so, will the same operation have two different semantics during a single compilation?

b. Will there be different lowerings for various loop operations from different dialects? loop.for and affine.for under omp operations would need different OpenMP / LLVM lowerings. Currently, both of them are lowered to the CFG based loops during the LLVM dialect conversion (which is much before the proposed OpenMP dialect lowering).

There would be no standard way to represent OpenMP operations (especially the ones which involve loops) in MLIR. This would drastically complicate lowering.
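The following C sketch is an analogy for what "lowered to the CFG based loops" means here. A structured loop (comparable to loop.for or affine.for) keeps its bounds, step and body together in one construct. After lowering, the same loop is a header block, a body block and a back edge. No single operation still holds the bounds and step, so a later OpenMP lowering cannot recover them from the structure. This is ordinary C using goto. It is not actual MLIR output, and the function names are hypothetical.

```c
/* Structured form: bounds, step and body belong to one construct,
 * as with loop.for / affine.for in MLIR. */
long sum_structured(const long *a, long lb, long ub, long step)
{
    long s = 0;
    for (long i = lb; i < ub; i += step)
        s += a[i];
    return s;
}

/* CFG form: the same loop after lowering to blocks and branches.
 * The bounds and step are now spread across separate blocks. */
long sum_cfg(const long *a, long lb, long ub, long step)
{
    long s = 0;
    long i = lb;        /* entry block: initialize induction variable */
header:                 /* header block: evaluate the exit condition */
    if (!(i < ub))
        goto done;
    s += a[i];          /* body block */
    i += step;          /* latch: increment, then take the back edge */
    goto header;
done:
    return s;
}
```

Both functions compute the same result. Only the first one still exposes the loop as a unit that an OpenMP lowering could take apart directly.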

C. It is also not mentioned how clauses like firstprivate, shared, private, reduce, map, etc. are lowered to the OpenMP dialect. The example in the RFC contains FIR and LLVM types and nothing about std dialect types. Consider the example below:

#pragma omp parallel for reduction(+:x)
for (int i = 0; i < N; ++i)
  x += a[i];

How would the above be represented in the OpenMP dialect, and what type would "x" be in MLIR? The design does not mention how the SSA values for the various OpenMP clauses are passed around in OpenMP operations.
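The question becomes more concrete if you look at what the clauses must mean once the region is outlined. This is similar in spirit to how clang outlines a parallel region into a function and passes the captured variables to it. The C sketch below makes several assumptions. The team is simulated by calling the outlined body once per thread in sequence, so no OpenMP runtime is involved. `a` is shared, and the loop bound is captured by value. `x` gets a private copy per thread, initialized to 0 (the identity of `+`), and each copy is combined into the original `x` at the end. The names region_ctx, outlined_body and parallel_for_reduce are hypothetical. None of this is a proposed OpenMP dialect design.

```c
/* Hypothetical capture context for the outlined region. */
struct region_ctx {
    const int *a;   /* shared: read by every thread */
    int n;          /* loop bound, captured by value */
    int *x;         /* original reduction variable */
};

/* Outlined body of "parallel for reduction(+:x)" for one thread. */
static void outlined_body(const struct region_ctx *ctx, int tid, int nthreads)
{
    int x_priv = 0;  /* private copy, initialized to the identity of + */

    /* Static schedule: this thread's contiguous block of iterations. */
    int base = ctx->n / nthreads, extra = ctx->n % nthreads;
    int first = tid * base + (tid < extra ? tid : extra);
    int last = first + base + (tid < extra ? 1 : 0);
    for (int i = first; i < last; ++i)
        x_priv += ctx->a[i];

    /* Combine into the original x. With real threads this update must
     * be atomic (or a tree reduction); here threads run one at a time. */
    *ctx->x += x_priv;
}

/* Simulates a team of nthreads threads executing the region in turn. */
void parallel_for_reduce(const int *a, int n, int *x, int nthreads)
{
    struct region_ctx ctx = { a, n, x };
    for (int tid = 0; tid < nthreads; ++tid)
        outlined_body(&ctx, tid, nthreads);
}
```

The sketch shows that each clause decides how a value crosses the region boundary: by pointer, by copy, or as a private copy plus a combine step. The OpenMP dialect operations would need some way to carry the corresponding SSA values and types.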

D. Because of (A), (B) and (C), it would be beneficial to have an omp.parallel_do operation which has semantics similar to other loop structures (may not be LoopLikeInterface) in MLIR. To me, it looks like having OpenMP operations based on standard MLIR types and operations (scalars and memrefs mainly) is the right way to go.

Why not have an omp.parallel_do operation with AffineMap-based bounds, so as to decouple it from Value/Type, similar to affine.for?

1. With the current design, the number of transformations / optimizations that one can write on OpenMP constructs would be limited, as there can be any custom loop structure with custom operations / types inside it.

2. It would also be easier to transform loop nests containing OpenMP constructs if the body of the OpenMP operations were well defined (i.e., did not accept arbitrary loop structures). Having nested, redundant "parallel", "target" and "do" regions seems unnecessary.

3. There would also be new sets of loop structures in new dialects when C/C++ is compiled to MLIR. This would increase the number of possible combinations inside an OpenMP region.
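Some background for the AffineMap-based bounds suggested above. In the affine dialect, a loop bound is an affine map applied to dimension and symbol operands. When the lower-bound map has several results the maximum is taken, and for the upper-bound map the minimum is taken. The C sketch below evaluates such bounds over plain integers, which is why they do not depend on any particular operand Value/Type. It only models a sum of scaled operands plus a constant. MLIR affine expressions can also use mod/floordiv/ceildiv by constants. The struct, MAX_OPERANDS and the function names are hypothetical and are not MLIR's AffineMap API.

```c
#include <limits.h>

#define MAX_OPERANDS 4  /* assumption: at most 4 dims + symbols */

/* Hypothetical affine expression: sum(coeff[k] * operand[k]) + constant,
 * where operands are the map's dims followed by its symbols. */
struct affine_expr {
    long coeff[MAX_OPERANDS];
    long constant;
};

static long eval_expr(const struct affine_expr *e, const long *ops, int nops)
{
    long v = e->constant;
    for (int k = 0; k < nops; ++k)
        v += e->coeff[k] * ops[k];
    return v;
}

/* Lower bound of a multi-result map: maximum of its results. */
long eval_lower_bound(const struct affine_expr *res, int nres,
                      const long *ops, int nops)
{
    long lb = LONG_MIN;
    for (int r = 0; r < nres; ++r) {
        long v = eval_expr(&res[r], ops, nops);
        if (v > lb)
            lb = v;
    }
    return lb;
}

/* Upper bound of a multi-result map: minimum of its results. */
long eval_upper_bound(const struct affine_expr *res, int nres,
                      const long *ops, int nops)
{
    long ub = LONG_MAX;
    for (int r = 0; r < nres; ++r) {
        long v = eval_expr(&res[r], ops, nops);
        if (v < ub)
            ub = v;
    }
    return ub;
}
```

For example, a tiled loop with upper bound min(d0 + 32, s0) evaluates to 32 at d0 = 0, s0 = 80 and to 80 at d0 = 64, s0 = 80.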

E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct lowering to LLVM IR ignoring all the advantages that MLIR provides. Being able to compile the code for heterogeneous hardware is one of the biggest advantages that MLIR brings to the table. That is being completely missed here. This also requires solving the problem of handling target information in MLIR. But that is a problem which needs to be solved anyway. Using GPU dialect also gives us an opportunity to represent offloading semantics in MLIR.

Given the ability to represent multiple ModuleOps and the existence of the GPU dialect, couldn't higher-level optimizations on offloaded code be done at the MLIR level? The proposed design would lead us to the same problems that we are currently facing in LLVM IR.

Also, OpenMP codegen will automatically benefit from the GPU dialect based optimizations. For example, it would be way easier to hoist a memory reference out of GPU kernel in MLIR than in LLVM IR.

Thanks,

Vinay
