[llvm-dev] [flang-dev] About OpenMP dialect in MLIR

Mon Feb 17 10:36:01 PST 2020

> The point here is that we do not want to use MLIR just as a pass-through
> layer because MLIR has a lot of strengths

> The point here is that if we lower to LLVM dialect, we will not be able
> to reuse OpenMP codegen & optimisation code from Clang/LLVM.

Just to be clear, are you suggesting that if OpenMPIRBuilder is used, MLIR
will have to be used as pass-through without the optimizations?

On Sun, Feb 16, 2020 at 6:49 AM Kiran Chandramohan <
Kiran.Chandramohan at arm.com> wrote:

> Thanks, Vinay for further details and discussion.
>
> > “If we decide that the OpenMP construct (for e.g. collapse) can be
> handled fully in MLIR and  that is the best place to do it (based on
> experiments) then we will not use the OpenMP IRBuilder for these constructs.”
> -- latest RFC in discourse
>
> If it is not finalized that the OpenMPIRBuilder will be used for all the
> constructs, wouldn’t it be better to delay the submission of “translation
> to LLVM IR” patch in MLIR? Lowering code will become inconsistent if the
> OpenMPIRBuilder is used only for a few constructs and not for others.
>
>
> I was hoping that we can identify a set of constructs/clauses which can be
> fully handled inside the MLIR layer itself. As an example, I provided the
> collapse clause. This will include constructs/clauses which do not generate
> runtime API calls, which pass metadata to LLVM to do some optimisation etc.
> Yes, the list is not finalized. The point here is that we do not want to
> use MLIR just as a pass-through layer because MLIR has a lot of strengths.
> Would it be OK to continue with only constructs that use OpenMPIRBuilder
> before the final list is made?
>
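To make the collapse example concrete, here is a minimal sketch of what handling the clause before any runtime call could amount to: the two-deep nest is rewritten into one flattened loop and the original indices are recovered arithmetically. The function names are hypothetical and this is not taken from the RFC or from any MLIR pass; it only models the index arithmetic, assuming a rectangular nest with zero-based bounds.

```python
def collapse2(n_outer, n_inner):
    """Yield (i, j) pairs of a 2-deep loop nest from one flattened loop."""
    for flat in range(n_outer * n_inner):   # single collapsed iteration space
        i, j = divmod(flat, n_inner)        # recover the original indices
        yield i, j


def nested(n_outer, n_inner):
    """Reference: the original, uncollapsed loop nest."""
    return [(i, j) for i in range(n_outer) for j in range(n_inner)]
```

The flattened loop visits the same (i, j) pairs in the same order as the nest, so a worksharing construct can then distribute the single loop without knowing about the original nesting.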
> Also, the patch does OpenMP dialect lowering *alongside* LLVM Dialect to
> LLVM IR. This is different from most dialects which get directly lowered to
> LLVM Dialect. I think lowering to LLVM Dialect would be a cleaner way if
> OpenMPIRBuilder is not being considered for all constructs.
> Mehdi also seems to have the same suggestion: “I agree that having
> dialect lowering would be cleaner” in https://reviews.llvm.org/D72962
>
>
> The point here is that if we lower to LLVM dialect, we will not be able to
> reuse OpenMP codegen & optimisation code from Clang/LLVM.
> It was pointed out to me early on by the MLIR developers that there are a
> few dialects (like NVVM) which are lowered along with LLVM dialect.
>
>
> You ask a lot of specific questions about which types, dialects, memory
> access operations will be supported and also the lowering for parallel do. Yes,
> the RFC does not provide all this information. This will become clear only
> as we make progress with the OpenMP dialect. I would like to provide
> answers to all your questions in the following weeks.
> Please allow some time.
>
>
Thank you all for your responses so far. Awaiting your further responses.

> --Kiran
>
>
> ------------------------------
> *From:* Vinay Madhusudan <vinay at compilertree.com>
> *Sent:* 15 February 2020 16:22
> *To:* Kiran Chandramohan <Kiran.Chandramohan at arm.com>
> *Cc:* Vinay Madhusudan via flang-dev <flang-dev at lists.llvm.org>;
> llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] [flang-dev] About OpenMP dialect in MLIR
>
>
> Reply to Kiran Chandramohan:
>
> > You are welcome to participate, provide feedback and criticism to change
> the design as well as to contribute to the implementation.
>
> Thank you Kiran.
>
> > But the latest is what is there in the RFC in discourse.
>
> I have used this as reference for the response.
>
> > We did a study of a few constructs and clauses which was shared as mails
> to flang-dev and the RFC. As we make progress and before implementation, we
> will share further details.
>
> > “ Yes, parallel and flush would be the next two constructs that we will
> do.” -- from a comment in latest RFC
>
> For the above mentioned reasons, I will try to restrict my reply to how
> the “parallel (do)” construct would be lowered.
>
> > If it is OK we can have further discussions in discourse as River points
> out.
>
> Given that the multiple components of the LLVM project, namely clang,
> flang, MLIR and LLVM are involved, llvm-dev is probably a better place,
> with a much wider audience, until it is clear how different components must
> interact.
>
> > It is the review for translation to LLVM IR that is in progress.
>
> > “If we decide that the OpenMP construct (for e.g. collapse) can be
> handled fully in MLIR and  that is the best place to do it (based on
> experiments) then we will not use the OpenMP IRBuilder for these constructs.”
> -- latest RFC in discourse
>
> If it is not finalized that the OpenMPIRBuilder will be used for all the
> constructs, wouldn’t it be better to delay the submission of “translation
> to LLVM IR” patch in MLIR? Lowering code will become inconsistent if the
> OpenMPIRBuilder is used only for a few constructs and not for others.
>
> Also, the patch does OpenMP dialect lowering *alongside* LLVM Dialect to
> LLVM IR. This is different from most dialects which get directly lowered to
> LLVM Dialect. I think lowering to LLVM Dialect would be a cleaner way if
> OpenMPIRBuilder is not being considered for all constructs.
>
> Mehdi also seems to have the same suggestion: “I agree that having
> dialect lowering would be cleaner” in https://reviews.llvm.org/D72962
>
> > Yes, the design has mildly changed over time to incorporate feedback.
> But the latest is what is there in the RFC in discourse.
>
> RFC fails to discuss the following (I have also mentioned some of them in
> my reply to Johannes):
>
> > The proposed plan involves a) lowering F18 AST with OpenMP directly to a
> mix of OpenMP and FIR dialects. b) converting this finally to a mix of
> OpenMP and LLVM dialects.
>
> It is unclear in the RFC what other dialects are considered as supported
> for OpenMP dialect  (std, affine, vector, loop, etc) and how it would be
> transformed, used and lowered from FIR to LLVM.
>
> It becomes important to list down the various dialects / operations /
> types supported for OpenMP (which is mainly defined for C, C++ and Fortran
> programs). MLIR has a much wider scope.
>
> It wouldn’t add much value for the proposed OpenMP dialect to be in the
> MLIR tree if it cannot support at least the relevant standard dialect types
> / operations.
>
> > We would like to take advantage of the transformations in cases that are
> possible. FIR loops will be converted to affine/loop dialect. So the loop
> inside an omp.do can be in these dialects as clarified in the discussion in
> discourse and also shown in slide 20 of the fosdem presentation (links to
> both below).
>
>
> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
>
>
> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf
>
> Although it is mentioned that the affine / loop.for is used, the following
> things are unclear:
>
> I am assuming that there will be lowering / conversion code in f18 repo
> dialect from fir.do to loop.for / affine.for. Is it the case? If so, I
> think it is worth mentioning it in the “sequential code flow
> representation” in the RFC.
>
> This raises the following questions.
>
>
>
>    1. Which types are supported? Standard dialect types and FIR types?
>
>
> For example, what types are used for Fortran arrays used inside OpenMP
> regions? Is it std.memref OR Fortran array representation in FIR dialect
> (fir.array?) OR both?  Note that Fortran has support for column major
> arrays. std.memref supports custom memory layouts. What custom layouts are
> supported?
>
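The layout question can be stated in terms of strides. Below is a small sketch, assuming a dense array and zero-based indices (Fortran arrays are one-based by default, so a real lowering would also need a base adjustment). The helper names are hypothetical; the only point is that a column-major array is an ordinary strided layout in which the first dimension varies fastest.

```python
def strides(shape, column_major):
    """Element strides for a dense array of the given shape."""
    dims = list(shape) if column_major else list(reversed(shape))
    out, step = [], 1
    for extent in dims:          # walk from the fastest-varying dimension
        out.append(step)
        step *= extent
    return out if column_major else list(reversed(out))


def offset(index, strides_, base=0):
    """Linear offset = base + sum(index[k] * stride[k])."""
    return base + sum(i * s for i, s in zip(index, strides_))
```

For a 3 x 4 array this gives strides [1, 3] in column-major order and [4, 1] in row-major order, so element (2, 1) sits at offset 5 in the first case and 9 in the second.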
>
> How would different non-scalar types in standard dialect  be lowered to
> LLVM IR and passed to OpenMP runtime calls? Can you please elaborate on
> this?
>
> The example provided in slide 20 of the fosdem presentation contains
>
> “loop.for %j = %lb2 to %ub2 : !integer {“
>
> But loop.for accepts “index” type. Not sure what type “!integer”
> represents here.
>
>
>    2. What are the different memory access operations which are supported
>    inside the OpenMP region and lowered to proper OpenMP runtime calls in LLVM
>    IR?
>
>
> The possibilities are:
>
>    1. affine.load / affine.store
>    2. std.load / std.store
>    3. FIR dialect memory access operations.
>
>
> > I must also point out that the question of where to do loop
> transformations is a topic we have not fully converged on. See the
> following thread for discussions.
> http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html
>
> Looks like placement (MLIR / LLVM) of various transformations related to
> OpenMP has not been finalized, from what I could infer from Johannes’s
> reply and the below text in the latest RFC in discourse:
>
> “So there exist some questions regarding where the optimisations should
> be carried out. We will decide on which framework to choose only after some
> experimentation.”
>
> > i) we need to keep the loops separately so as to take advantage of
> transformations that other dialects like affine/loop would provide.
>
> 1) Keeping the loops separate from the OpenMP operations will expose them
> to the “regular” transformations passes in MLIR inside the OpenMP region.
> Most of them are invalid or inefficient for OpenMP operations.
>
> Examples:
>
>    1. Constant propagation example mentioned by Johannes in this thread.
>    (omp task shared(x))
>    2. Loop (nest) transformations (permute / split / fuse / tile, etc) will
>    happen ignoring the surrounding OpenMP operations.
>    3. Hoisting and sinking of various memory/ SSA values inside the OpenMP
>    region. This goes against the likes of “map”, “firstprivate”, shared, etc
>    clauses and more.
>
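A hypothetical model of the first hazard, written as a sketch rather than as a reproduction of Johannes's example: a variable shared with a task is written by that task, so a pass that forwards the earlier constant to the later use produces a different result. The thread here only stands in for an OpenMP task and join() for a taskwait; all names are made up for illustration.

```python
import threading


def run_region(propagate_constant):
    """Model of: x = 0; task shared(x) { x = 1 }; taskwait; use(x)."""
    shared = {"x": 0}            # x is shared between the parent and the task

    def task():
        shared["x"] = 1          # the task writes the shared variable

    worker = threading.Thread(target=task)
    worker.start()
    worker.join()                # plays the role of taskwait

    if propagate_constant:
        return 0                 # value a pass would forward from `x = 0`
    return shared["x"]           # value the program must actually observe
```

Without knowledge of the sharing attribute, the store `x = 0` and the later use look like a straight-line def-use pair, which is exactly the situation in which a generic pass would forward the constant.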
>
> 2) Various loop operations (loop.for, affine.for, fir.do) have (or will
> have) different transformations/ optimization passes which are different
> from one another.
>
> Example:
>
>    1. AffineLoopInvariantCodeMotion.cpp is different from
>    LoopInvariantCodeMotion.cpp.
>    2. Other Loop transformation passes for affine.for
>
>
> These loops also use different Types and memory access operations in
> general for transformations. For example, most Affine dialect transformations
> (if not all) work on affine.load and affine.store operations.
>
> Supporting different loop operations means that there would be *OpenMP
> specific transformations* for each one of them and also requires a way to
> restrict each of them from existing transformations (when nested in OpenMP
> constructs).
>
> There would be different lowerings for different loop operations as well.
> For example, affine.for and loop.for would have to be lowered to omp.do in
> different ways.
>
> From slide 20 of fosdem presentation you mentioned, the LLVM + OpenMP
> dialect representation is as follows:
>
> ------------------------------
>
> Mlir.region(…) {
>    omp.parallel  {
>      %ub3 = …
>      omp.do %i = 0 to %ub3 : !integer  {
>      …
>      }
>   }
> }
>
> -------------------------------
>
> Currently, the LLVM Dialect doesn’t contain a high level loop operation.
> It is all based on CFG implementation.
>
> Will omp.do follow the same structure (SizedRegion<1>) as loop.for, or
> will there be a CFG-based loop operation for the LLVM Dialect?
>
> Can you please show what the OpenMP + LLVM dialect will look like for
> the below parallel do construct?
>
> integer :: i=1, k=10
> integer :: a(10), b(10), c(10)
> ...
>   !$omp parallel do
>   do i = 1, k
>     if (i .ne. 1) cycle
>     c(i) = a(i) + b(i)
>   end do
>   !$omp end parallel do
> print *,c
>
>
> Thanks,
>
> Vinay
>
> On Fri, Feb 14, 2020 at 6:52 AM Kiran Chandramohan via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hello Vinay,
>
> Thanks for your mail about the OpenMP dialect in MLIR. Happy to know that
> you and several other groups are interested in the OpenMP dialect. At the
> outset, I must point out that the design is not set in stone and will
> change as we make progress. You are welcome to participate, provide
> feedback and criticism to change the design as well as to contribute to the
> implementation. I provide some clarifications and replies to your comments
> below. If it is OK we can have further discussions in discourse as River
> points out.
>
> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and clang
> frontends. Note that this proposal was before considering MLIR for FIR.
>
> A correction here. The proposal for OpenMPIRBuilder was made when MLIR was
> being considered for FIR.
> (i) Gary Klimowicz's minutes for Flang call in April 2019 mentions
> considering MLIR for FIR.
>
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-April/000194.html
> (ii) My reply to Johannes's proposal in May 2019 mentions MLIR for FIR.
>
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000220.html
>
> b. Review of barrier construct is in progress:
> https://reviews.llvm.org/D72962
>
> Minor correction here. The addition of barrier construct was accepted and
> has landed (https://reviews.llvm.org/D72400). It is the review for
> translation to LLVM IR that is in progress.
>
> It looks like the design has evolved over time and there is no one place
> which contains the latest design decisions that fits all the different
> pieces of the puzzle. I will try to deduce it from the above mentioned
> references. Please correct me if I am referring to anything which has
> changed.
>
> Yes, the design has mildly changed over time to incorporate feedback. But
> the latest is what is there in the RFC in discourse.
>
> For most OpenMP design discussions, FIR examples are used (as seen in (2)
> and (3)). The MLIR examples mentioned in the design only talk about FIR
> dialect and LLVM dialect.
>
> Our initial concern was how will all these pieces (FIR, LLVM Dialect,
> OpenMPIRBuilder, LLVM IR) fit together. Hence you see the prominence of FIR
> and LLVM dialect and more information about lowering/translation than
> transformations/optimisations.
>
> This completely ignores the likes of standard, affine (where most loop
> transformations are supposed to happen) and loop dialects.
>
> Adding to the reply above. We would like to take advantage of the
> transformations in cases that are possible. FIR loops will be converted to
> affine/loop dialect. So the loop inside an omp.do can be in these dialects
> as clarified in the discussion in discourse and also shown in slide 20 of
> the fosdem presentation (links to both below).
>
> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
>
> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf
>
> I must also point out that the question of where to do loop
> transformations is a topic we have not fully converged on. See the
> following thread for discussions.
> http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html
>
> Is it the same omp.do operation which now contains the bounds and
> induction variables of the loop after the LLVM conversion?
>
> The point here is that i) we need to keep the loops separately so as to
> take advantage of transformations that other dialects like affine/loop
> would provide. ii) We will need the loop information while lowering the
> OpenMP do operation. For implementation, if reusing the same operation (in
> different contexts) is difficult then we can add a new operation.
>
> It is also not mentioned how clauses like firstprivate, shared, private,
> reduce, map, etc are lowered to OpenMP dialect.
>
> Yes, it is not mentioned. We did a study of a few constructs and clauses
> which was shared as mails to flang-dev and the RFC. As we make progress and
> before implementation, we will share further details.
>
> it would be beneficial to have an omp.parallel_do operation which has
> semantics similar to other loop structures (may not be LoopLikeInterface)
> in MLIR.
>
> I am not against adding parallel_do if it can help with transformations or
> reduce the complexity of lowering. Please share the details in discourse as
> a reply to the RFC or a separate thread.
>
> it looks like having OpenMP operations based on standard MLIR types and
> operations (scalars and memrefs mainly) is the right way to go.
>
> This will definitely be the first version that we implement. But I do not
> understand why we should restrict to only the standard types and
> operations. To ease lowering and translation and to avoid adding OpenMP
> operations to other dialects, I believe OpenMP dialect should also be able
> to exist with other dialects like FIR and LLVM.
>
> E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct
> lowering to LLVM IR ignoring all the advantages that MLIR provides.
>
> Also, OpenMP codegen will automatically benefit from the GPU dialect based
> optimizations. For example, it would be way easier to hoist a memory
> reference out of GPU kernel in MLIR than in LLVM IR.
>
> I might not have fully understood you here. But the dialect lives
> independently of the translation to LLVM IR. If there are optimisations
> (like hoisting that you mention here) I believe they can be performed as
> transformation passes on the dialect. It is not ruled out.
>
> --Kiran
> ------------------------------
> *From:* flang-dev <flang-dev-bounces at lists.llvm.org> on behalf of Vinay
> Madhusudan via flang-dev <flang-dev at lists.llvm.org>
> *Sent:* 13 February 2020 16:33
> *To:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>;
> flang-dev at lists.llvm.org <flang-dev at lists.llvm.org>
> *Subject:* [flang-dev] About OpenMP dialect in MLIR
>
>
> Hi,
>
> I have a few questions / concerns regarding the design of OpenMP dialect in
> MLIR that is currently being implemented, mainly for the f18 compiler.
> Below, I summarize the current state of various efforts in clang / f18 /
> MLIR / LLVM regarding this. Feel free to add to the list in case I have
> missed something.
>
> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and clang
> frontends. Note that this proposal was before considering MLIR for FIR.
>
> a. llvm-dev proposal :
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html
>
> b. Patches in review: https://reviews.llvm.org/D70290. This also includes
> the clang codegen changes.
>
> 2.  [July - September 2019] OpenMP dialect for MLIR was discussed /
> proposed with respect to the f18 compilation stack (keeping FIR in mind).
>
> a. flang-dev discussion link:
> https://lists.llvm.org/pipermail/flang-dev/2019-September/000020.html
>
> b. Design decisions captured in PPT:
> https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view
>
> c. MLIR google groups discussion:
> https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw
>
> d. Target constructs  design:
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html
>
> e. SIMD constructs design:
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html
>
> 3.  [Jan 2020] OpenMP dialect RFC in llvm discourse :
> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397
>
> 4.  [Jan- Feb 2020] Implementation of OpenMP dialect in MLIR:
>
> a. The first patch which introduces the OpenMP dialect was pushed.
>
> b. Review of barrier construct is in progress:
> https://reviews.llvm.org/D72962
> https://reviews.llvm.org/D72400
>
> I have tried to list below different topics of interest (to different
> people) around this work. Most of these are in the design phase (or very
> new) and multiple parties are interested with different sets of goals in
> mind.
>
> I.  Flang frontend and its integration
>
> II. Fortran representation in MLIR / FIR development
>
> III. OpenMP development for flang,  OpenMP builder in LLVM.
>
> IV. Loop Transformations in MLIR / LLVM with respect to OpenMP.
>
> It looks like the design has evolved over time and there is no one place
> which contains the latest design decisions that fits all the different
> pieces of the puzzle. I will try to deduce it from the above mentioned
> references. Please correct me if I am referring to anything which has
> changed.
>
> A. For most OpenMP design discussions, FIR examples are used (as seen in
> (2) and (3)). The MLIR examples mentioned in the design only talk about
> FIR dialect and LLVM dialect.
>
> This completely ignores the likes of standard, affine (where most loop
> transformations are supposed to happen) and loop dialects. I think it is
> critical to decouple the OpenMP dialect development in MLIR from the
> current flang / FIR effort. It would be useful if someone can mention these
> examples using existing dialects in MLIR and also how the different
> transformations / lowerings are planned.
>
> B. In latest RFC(3), it is mentioned that the initial OpenMP dialect
> version will be as follows,
>
>   omp.parallel {
>     omp.do {
>        fir.do %i = 0 to %ub3 : !fir.integer {
>         ...
>        }
>     }
>   }
>
> and then after the "LLVM conversion" it is converted as follows:
>
>   omp.parallel {
>     %ub3 =
>     omp.do %i = 0 to %ub3 : !llvm.integer {
>     ...
>     }
>   }
>
>
> a. Is it the same omp.do operation which now contains the bounds and
> induction variables of the loop after the LLVM conversion? If so, will the
> same operation have two different semantics during a single compilation?
>
> b. Will there be different lowerings for various loop operations from
> different dialects? loop.for and affine.for under omp operations would need
> different OpenMP / LLVM lowerings. Currently, both of them are lowered to
> the CFG based loops during the LLVM dialect conversion (which is much
> before the proposed OpenMP dialect lowering).
>
> There would be no standard way to represent OpenMP operations (especially
> the ones which involve loops) in MLIR. This would drastically complicate
> lowering.
>
> C. It is also not mentioned how clauses like firstprivate, shared,
> private, reduce, map, etc are lowered to OpenMP dialect. The example in
> the RFC contains FIR and LLVM types and nothing about std dialect types.
> Consider the below example:
>
> #pragma omp parallel for reduction(+:x)
> for (int i = 0; i < N; ++i)
>   x += a[i];
>
> How would the above be represented in OpenMP dialect, and what type would
> "x" be in MLIR?  It is not mentioned in the design as to how the various
> SSA values for various OpenMP clauses are passed around in OpenMP
> operations.
>
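For reference, the semantics that any dialect representation of the reduction clause has to preserve can be modelled as follows. This is a sequential sketch under stated assumptions: the "threads" are simulated one after another, the chunking is a simple static split chosen for illustration, and the function name is hypothetical. It says nothing about how the values would be passed as SSA operands.

```python
def parallel_sum(a, num_threads, x0=0):
    """Model of reduction(+:x): private partial sums, then one combine step."""
    n = len(a)
    chunk = (n + num_threads - 1) // num_threads    # static-like split (assumption)
    partials = []
    for t in range(num_threads):                    # each simulated thread
        x_private = 0                               # private copy starts at the identity of +
        for i in range(t * chunk, min(n, (t + 1) * chunk)):
            x_private += a[i]
        partials.append(x_private)
    x = x0                                          # original value of x takes part in the result
    for p in partials:                              # combine step at the end of the region
        x += p
    return x
```

Whatever type "x" ends up with in MLIR, the representation needs a place for the private copy, its initial value, and the combiner, since those three pieces are what the clause defines.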
> D. Because of (A), (B) and (C), it would be beneficial to have an
> omp.parallel_do operation which has semantics similar to other loop
> structures (may not be LoopLikeInterface) in MLIR. To me, it looks like
> having OpenMP operations based on standard MLIR types and operations
> (scalars and memrefs mainly) is the right way to go.
>
> Why not have omp.parallel_do operation with AffineMap based bounds, so as
> to decouple it from Value/Type similar to affine.for?
>
> 1. With the current design, the number of transformations / optimizations
> that one can write on OpenMP constructs would become limited as there can
> be any custom loop structure with custom operations / types inside it.
>
> 2. It would also be easier to transform the Loop nests containing OpenMP
> constructs if the body of the OpenMP operations is well defined (i.e., does
> not accept arbitrary loop structures). Having nested redundant "parallel",
> "target" and "do" regions seems unnecessary.
>
> 3. There would also be new sets of loop structures in new dialects when
> C/C++ is compiled to MLIR. It would complicate the number of possible
> combinations inside the OpenMP region.
>
> E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct
> lowering to LLVM IR ignoring all the advantages that MLIR provides. Being
> able to compile the code for heterogeneous hardware is one of the biggest
> advantages that MLIR brings to the table. That is being completely missed
> here. This also requires solving the problem of handling target information
> in MLIR. But that is a problem which needs to be solved anyway. Using GPU
> dialect also gives us an opportunity to represent offloading semantics in
> MLIR.
>
> Given the ability to represent multiple ModuleOps and the existence of GPU
> dialect, couldn't higher level optimizations on offloaded code be done at
> MLIR level? The proposed design would lead us to the same problems that we
> are currently facing in LLVM IR.
>
> Also, OpenMP codegen will automatically benefit from the GPU dialect based
> optimizations. For example, it would be way easier to hoist a memory
> reference out of GPU kernel in MLIR than in LLVM IR.
>
> Thanks,
>
> Vinay
>