[llvm-dev] [flang-dev] About OpenMP dialect in MLIR

Vinay Madhusudan via llvm-dev llvm-dev at lists.llvm.org
Tue Feb 18 08:20:44 PST 2020


Please find the reply inline below:

On Tue, Feb 18, 2020 at 8:02 AM Mehdi AMINI <joker.eph at gmail.com> wrote:

>
>
> On Mon, Feb 17, 2020 at 10:29 AM Vinay Madhusudan <vinay at compilertree.com>
> wrote:
>
>> Please find the reply inline below
>>
>> On Sun, Feb 16, 2020 at 12:59 AM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>>>
>>>
>>> On Sat, Feb 15, 2020 at 10:42 AM Vinay Madhusudan via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Reply to Kiran Chandramohan:
>>>>
>>>> > You are welcome to participate, provide feedback and criticism to
>>>> change the design as well as to contribute to the implementation.
>>>>
>>>> Thank you Kiran.
>>>>
>>>> > But the latest is what is there in the RFC in discourse.
>>>>
>>>> I have used this as reference for the response.
>>>>
>>>> > We did a study of a few constructs and clauses which was shared as
>>>> mails to flang-dev and the RFC. As we make progress and before
>>>> implementation, we will share further details.
>>>>
>>>> > “ Yes, parallel and flush would be the next two constructs that we
>>>> will do.” -- from a comment in latest RFC
>>>>
>>>> For the above-mentioned reasons, I will try to restrict my reply to how
>>>> the “parallel (do)” construct would be lowered.
>>>>
>>>> > If it is OK we can have further discussions in discourse as River
>>>> points out.
>>>>
>>>> Given that multiple components of the LLVM project, namely clang,
>>>> flang, MLIR and LLVM, are involved, llvm-dev is probably a better place,
>>>> with a much wider audience
>>>>
>>>
>>> Possibly wider, but maybe less focused on discussing MLIR dialect
>>> design. In particular there is an RFC thread for this particular dialect on
>>> Discourse, which is the canonical place to discuss its design.
>>>
>>>
>>>> , until it is clear how different components must interact.
>>>>
>>>
>>> They don't need to interact so closely; they are very loosely related:
>>> flang will use MLIR but clang won't (in the foreseeable future), and LLVM
>>> has many other frontends.
>>>
>>>
>>>>
>>>> > It is the review for translation to LLVM IR that is in progress.
>>>>
>>>> > “If we decide that the OpenMP construct (for e.g. collapse) can be
>>>> handled fully in MLIR and  that is the best place to do it (based on
>>>> experiments) then we will not use the OpenMP IRBuilder for these constructs.”
>>>> -- latest RFC in discourse
>>>>
>>>> If it is not finalized that the OpenMPIRBuilder will be used for all
>>>> the constructs, wouldn’t it be better to delay the submission of
>>>> “translation to LLVM IR” patch in MLIR? Lowering code will become
>>>> inconsistent if the OpenMPIRBuilder is used only for a few constructs and
>>>> not for others.
>>>>
>>>
>>>> Also, the patch does OpenMP dialect lowering *alongside* LLVM Dialect
>>>> to LLVM IR. This is different from most dialects which get directly lowered
>>>> to LLVM Dialect. I think lowering to LLVM Dialect would be a cleaner way if
>>>> OpenMPIRBuilder is not being considered for all constructs.
>>>>
>>>
>>>
>>> I don't disagree, but there is a lot of speculation here: your quote
>>> starts with "If we decide that the OpenMP construct (for e.g. collapse) can
>>> be handled fully in MLIR"; are you thinking that we need to first decide
>>> this once and for all before making progress on building this path?
>>> What disadvantages do you perceive to an approach where we would bring
>>> up this dialect using the OpenMPIRBuilders for exporting to LLVM IR until
>>> we gain enough experience? Do you think starting like this will make it
>>> significantly harder to transition away from the builders if this is what
>>> we want?
>>> It seemed to me like it wouldn't, and that's why I'm supportive of this
>>> path: the omp dialect design, implementation, and the
>>> transformation/analysis that will be performed there seem entirely
>>> disjoint from the LLVM lowering; I'd hope we can swap the LLVM lowering
>>> at a later time (if this is what we'd want).
>>>
>>>
>>>>
>>>>
>> The statement you quoted is from the RFC in discourse by Kiran. It is
>> actually unclear to whom you are referring here. I am assuming that it is
>> for him to answer.
>>
>
> No, I'm asking you. You quoted Kiran and concluded from this quote
>  "wouldn’t it be better to delay the submission [...]". I am questioning
> this aspect in particular when I wrote "are you thinking that we need to
> first decide this once and for all before making progress on building this
> path?"
> This question and the following ones are important to answer; it isn't
> clear to me that you did in your answer below. In particular, "Do you think
> starting like this will make it significantly harder to transition away
> from the builders if this is what we want?" is important: even if using the
> OpenMPIRBuilder would be suboptimal in the long term, how much of a problem
> would it be to replace later? It seems to me that it shouldn't limit
> anything, unless you plan to write optimizations on the LLVM Dialect itself.
>
>
Given that there are unresolved questions in this thread about the OpenMP
representation for basic constructs (including target constructs) in MLIR,
and about the OpenMPIRBuilder being the high-level *common* interface for
the Clang AST and optimized MLIR IR, I believe it would be a good idea to
wait until there is more clarity on OpenMP in MLIR.


> Best,
>
> --
> Mehdi
>
>
>>
>> The below details would cover some of your questions as well.
>>
>> About Clang / MLIR / LLVM being loosely related, and this discussion not
>> being relevant on llvm-dev:
>>
>> With the introduction of the OpenMPIRBuilder in MLIR (from this review:
>> https://reviews.llvm.org/D72962), Clang and MLIR would now have common
>> code for building OpenMP constructs. I do not think they are so loosely
>> related anymore. Note that the MLIR and Clang frontends for LLVM are very
>> different: Clang emits LLVM IR with almost no optimizations, while MLIR
>> already supports a considerable amount of optimization.
>>
>> The decision to use the OpenMPIRBuilder for MLIR was discussed in the
>> following flang-dev threads (please correct me if I am missing newer
>> discussions on these topics):
>>
>>
>>    1. [May 2019]
>>    http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html
>>    2. [June 2019]
>>    http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-June/000251.html
>>
>>
>> However, I could not find any conclusions for the concerns raised by Kiran:
>>
>>
>>    1. Early outlining (in MLIR) vs. late outlining (in LLVM)
>>    2. Handling of target constructs: high-level transformations for GPUs
>>    and CPUs (offloading in LLVM vs. MLIR?)
>>
>>
>> Kiran seems to suggest that early outlining (version 2) would be better (
>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000224.html
>> ). But currently, late outlining (version 1) has been implemented in LLVM (
>> https://github.com/llvm/llvm-project/blob/master/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
>> ).
>>
>> Early outlining in MLIR would have the following benefits, as suggested in
>> the thread:
>>
>>    1. Enables more optimization in MLIR (intra-procedural, because of
>>    regions).
>>    2. Offloading in MLIR (which is designed for heterogeneous hardware
>>    compilation support).
>>    3. Direct LLVM Dialect lowering of OpenMP operations (no LLVM IR
>>    lowering).
>>
>>
>> The MLIR google groups discussion (
>> https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw )
>> regarding the use of the OpenMPIRBuilder doesn’t seem to discuss the above
>> concerns, nor how the various design decisions in the OpenMPIRBuilder
>> affect MLIR in general.
>>
>> Also,
>>
>>  > “The point here is that we do not want to use MLIR just as a
>> pass-through layer because MLIR has a lot of strengths”
>>
>>   ....
>>
>>   > “The point here is that if we lower to LLVM dialect, we will not be
>> able to reuse OpenMP codegen & optimisation code from Clang/LLVM.”
>>
>>      --- by Kiran in
>> https://lists.llvm.org/pipermail/llvm-dev/2020-February/139181.html
>>
>> In his latest reply to this thread (quoted above), Kiran seems to suggest
>> that lowering to the LLVM Dialect (instead of LLVM IR) would restrict the
>> use of OpenMP optimization code from LLVM, and that MLIR would just be a
>> pass-through to the OpenMPIRBuilder.
>>
>
>> Because of the above reasons, it seems to me that the design
>> considerations of using the OpenMPIRBuilder for MLIR should also be
>> mentioned (and discussed) before committing the LLVM IR lowering part of
>> the OpenMP dialect in https://reviews.llvm.org/D72962
>>
>>>> Mehdi also seems to have the same suggestion: “I agree that having
>>>> dialect lowering would be cleaner” in https://reviews.llvm.org/D72962
>>>>
>>>
>>> Since you're calling me out: yes, it would be cleaner from a pure MLIR
>>> point of view; I don't think there is much disagreement on this (I think?).
>>> However, we already have the OpenMP builders available and they will
>>> continue to be maintained/evolved to support OpenMP in clang.
>>> Duplicating them entirely in MLIR for the sake of purity seems like a
>>> lack of pragmatism here, so I support the current approach with the
>>> current tradeoffs.
>>>
>>>
>>>>
>>>> > Yes, the design has mildly changed over time to incorporate
>>>> feedback.  But the latest is what is there in the RFC in discourse.
>>>>
>>>> The RFC fails to discuss the following (I have also mentioned some of
>>>> them in my reply to Johannes):
>>>>
>>>> > The proposed plan involves a) lowering F18 AST with OpenMP directly
>>>> to a mix of OpenMP and FIR dialects. b) converting this finally to a mix of
>>>> OpenMP and LLVM dialects.
>>>>
>>>> It is unclear from the RFC which other dialects are considered as
>>>> supported for the OpenMP dialect (std, affine, vector, loop, etc.) and
>>>> how they would be transformed, used and lowered on the way from FIR to
>>>> LLVM.
>>>>
>>>> It is important to list the various dialects / operations / types
>>>> supported for OpenMP, which is mainly defined for C, C++ and Fortran
>>>> programs; MLIR has a much wider scope.
>>>>
>>>> It wouldn’t add much value for the proposed OpenMP dialect to be in the
>>>> MLIR tree if it cannot support at least the relevant standard dialect types
>>>> / operations.
>>>>
>>>
>>> I agree, and I think this was something I called out as important in the
>>> RFC: "It seems that the dialect can be orthogonal to FIR and its type
>>> system, which is the most important thing to me to integrate MLIR (favor
>>> reusability across other frontends / compiler frameworks)".
>>> If you don't think that this is the case, then please raise this in the
>>> RFC!
>>> I think it is perfectly fair to ask for more examples from the author
>>> and to dig a bit deeper if you're unconvinced that the proposed modeling
>>> can be applicable outside of FIR. This is exactly why we ask such
>>> proposals to go through an RFC, by the way: to allow people like you to
>>> point at the blind spots and ask the right questions.
>>>
>>> Best,
>>>
>>> --
>>> Mehdi
>>>
>>>
>>>
>>>> > We would like to take advantage of the transformations in cases that
>>>> are possible. FIR loops will be converted to affine/loop dialect. So the
>>>> loop inside an omp.do can be in these dialects as clarified in the
>>>> discussion in discourse and also shown in slide 20 of the fosdem
>>>> presentation (links to both below).
>>>>
>>>>
>>>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
>>>>
>>>>
>>>> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf
>>>>
>>>> Although it is mentioned that affine.for / loop.for is used, the
>>>> following things are unclear:
>>>>
>>>> I am assuming that there will be lowering / conversion code in the f18
>>>> repo from fir.do to loop.for / affine.for. Is that the case? If so, I
>>>> think it is worth mentioning it in the “sequential code flow
>>>> representation” in the RFC.
>>>>
>>>> This raises the following questions.
>>>>
>>>>
>>>>
>>>>    1. Which types are supported? Standard dialect types and FIR types?
>>>>
>>>>
>>>> For example, what types are used for Fortran arrays inside OpenMP
>>>> regions? Is it std.memref, OR the Fortran array representation in the
>>>> FIR dialect (fir.array?), OR both? Note that Fortran has support for
>>>> column-major arrays, and std.memref supports custom memory layouts. What
>>>> custom layouts are supported?
>>>>
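>>>> As a sketch of the kind of thing being asked here (purely hypothetical
>>>> example; the layout-map syntax is approximate): a 2-D column-major
>>>> Fortran array could, in principle, be modelled as a memref with a
>>>> dimension-swapping layout map:
>>>>
>>>>   // #col_major maps logical (d0, d1) to storage (d1, d0), so the
>>>>   // leftmost subscript varies fastest, as in Fortran.
>>>>   #col_major = affine_map<(d0, d1) -> (d1, d0)>
>>>>
>>>>   func @f(%a : memref<10x20xf32, #col_major>) {
>>>>     ...
>>>>   }
>>>>
>>>> Whether such layout maps are what is intended for Fortran arrays here is
>>>> exactly what is unclear.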
>>>>
>>>> How would different non-scalar types in the standard dialect be lowered
>>>> to LLVM IR and passed to OpenMP runtime calls? Can you please elaborate
>>>> on this?
>>>>
>>>> The example provided in slide 20 of the fosdem presentation contains
>>>>
>>>> “loop.for %j = %lb2 to %ub2 : !integer {”
>>>>
>>>> But loop.for accepts operands of “index” type; it is not clear what type
>>>> “!integer” represents here.
>>>>
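>>>> For reference, a minimal well-formed loop.for would look like the
>>>> following sketch, with lower bound, upper bound and step all of “index”
>>>> type (operand names are made up):
>>>>
>>>>   %lb = constant 0 : index
>>>>   %ub = constant 10 : index
>>>>   %step = constant 1 : index
>>>>   loop.for %j = %lb to %ub step %step {
>>>>     ...
>>>>   }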
>>>>
>>>>    2. What are the different memory access operations which are
>>>>    supported inside the OpenMP region and lowered to proper OpenMP
>>>>    runtime calls in LLVM IR?
>>>>
>>>>
>>>> The possibilities are (a small sketch follows this list):
>>>>
>>>>    1. affine.load / affine.store
>>>>    2. std.load / std.store
>>>>    3. FIR dialect memory access operations.
>>>>
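>>>> A small sketch of the first two flavours for the same element access
>>>> (FIR omitted; values hypothetical):
>>>>
>>>>   // Affine dialect: subscripts must be affine expressions of loop
>>>>   // induction variables and symbols, which is what enables the affine
>>>>   // transformations.
>>>>   %v0 = affine.load %A[%i] : memref<100xf32>
>>>>   // Standard dialect: subscripts are arbitrary index values.
>>>>   %v1 = load %A[%j] : memref<100xf32>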
>>>>
>>>> > I must also point out that the question of where to do loop
>>>> transformations is a topic we have not fully converged on. See the
>>>> following thread for discussions.
>>>> http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html
>>>>
>>>> It looks like the placement (MLIR / LLVM) of various transformations
>>>> related to OpenMP has not been finalized, from what I could infer from
>>>> Johannes’s reply and the below text in the latest RFC in discourse:
>>>>
>>>> “So there exist some questions regarding where the optimisations
>>>> should be carried out. We will decide on which framework to choose only
>>>> after some experimentation.”
>>>>
>>>> > i) we need to keep the loops separate so as to take advantage of
>>>> transformations that other dialects like affine/loop would provide.
>>>>
>>>> 1) Keeping the loops separate from the OpenMP operations will expose
>>>> them to the “regular” transformation passes in MLIR inside the OpenMP
>>>> region. Most of these are invalid or inefficient for OpenMP operations.
>>>>
>>>> Examples (see the sketch after this list):
>>>>
>>>>    1. The constant propagation example mentioned by Johannes in this
>>>>    thread (omp task shared(x)).
>>>>    2. Loop (nest) transformations (permute / split / fuse / tile, etc.)
>>>>    will happen ignoring the surrounding OpenMP operations.
>>>>    3. Hoisting and sinking of various memory / SSA values inside the
>>>>    OpenMP region. This goes against the likes of the “map”,
>>>>    “firstprivate”, “shared”, etc. clauses and more.
>>>>
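>>>> A sketch of the constant-propagation hazard from example 1, in MLIR
>>>> terms (omp.task and omp.taskwait are hypothetical ops here, used only
>>>> for illustration):
>>>>
>>>>   %c5 = constant 5 : i32
>>>>   store %c5, %x[] : memref<i32>
>>>>   omp.task {
>>>>     // x is shared: this store may execute on another thread.
>>>>     %c3 = constant 3 : i32
>>>>     store %c3, %x[] : memref<i32>
>>>>   }
>>>>   omp.taskwait
>>>>   // A generic pass that forwards the store of 5 to this load would be
>>>>   // wrong: after the taskwait, the task has already stored 3.
>>>>   %v = load %x[] : memref<i32>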
>>>>
>>>> 2) Various loop operations (loop.for, affine.for, fir.do) have (or will
>>>> have) their own transformation / optimization passes, which differ from
>>>> one another.
>>>>
>>>> Examples:
>>>>
>>>>    1. AffineLoopInvariantCodeMotion.cpp is different from
>>>>    LoopInvariantCodeMotion.cpp.
>>>>    2. Other loop transformation passes for affine.for.
>>>>
>>>>
>>>> These loops also use different types and memory access operations in
>>>> general for transformations. For example, most (if not all) affine
>>>> dialect transformations work on affine.load and affine.store operations.
>>>>
>>>> Supporting different loop operations means that there would be *OpenMP
>>>> specific transformations* for each one of them, and it also requires a
>>>> way to exclude each of them from the existing transformations (when
>>>> nested in OpenMP constructs).
>>>>
>>>> There would be different lowerings for different loop operations as
>>>> well. For example, affine.for and loop.for would have to be lowered to
>>>> omp.do in different ways.
>>>>
>>>> From slide 20 of the fosdem presentation you mentioned, the LLVM +
>>>> OpenMP dialect representation is as follows:
>>>>
>>>> ------------------------------
>>>>
>>>> Mlir.region(…) {
>>>>
>>>>    omp.parallel  {
>>>>
>>>>      %ub3 = …
>>>>
>>>>      omp.do %i = 0 to %ub3 : !integer  {
>>>>
>>>>        ...
>>>>      }
>>>>
>>>>   }
>>>>
>>>> }
>>>>
>>>> -------------------------------
>>>>
>>>> Currently, the LLVM Dialect doesn’t contain a high-level loop
>>>> operation; it is all CFG-based.
>>>>
>>>> Will omp.do follow the same structure (SizedRegion<1>) as loop.for, or
>>>> will there be a CFG-based loop operation in the LLVM Dialect?
>>>>
>>>> Can you please show what the OpenMP + LLVM dialect would look like for
>>>> the below parallel do construct?
>>>>
>>>> integer :: i=1, k=10
>>>>
>>>> integer :: a(10), b(10), c(10)
>>>>
>>>> ...
>>>>
>>>>  !$omp parallel do
>>>>
>>>>   do i = 1, k
>>>>
>>>>     if (i .ne. 1) cycle
>>>>
>>>>     c(i) = a(i) + b(i)
>>>>
>>>>   end do
>>>>
>>>>   !$omp end parallel do
>>>>
>>>> print *,c
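>>>>
>>>> One speculative sketch of what this could become, assuming omp.do
>>>> permits a multi-block region so that “cycle” can map to a branch (all
>>>> syntax hypothetical):
>>>>
>>>>   omp.parallel {
>>>>     omp.do %i = %c1 to %k : !llvm.i32 {
>>>>       // "cycle" skips the rest of the iteration: branch to the latch.
>>>>       %cond = llvm.icmp "ne" %i, %c1 : !llvm.i32
>>>>       llvm.cond_br %cond, ^latch, ^body
>>>>     ^body:
>>>>       // c(i) = a(i) + b(i), via llvm.getelementptr / llvm.load /
>>>>       // llvm.store, elided here.
>>>>       llvm.br ^latch
>>>>     ^latch:
>>>>       omp.yield  // hypothetical region terminator
>>>>     }
>>>>   }
>>>>
>>>> Is something of this shape the intent, or would the loop already be in
>>>> CFG form before omp.do is lowered?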
>>>>
>>>
>>>> Thanks,
>>>>
>>>> Vinay
>>>>
>>>> On Fri, Feb 14, 2020 at 6:52 AM Kiran Chandramohan via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hello Vinay,
>>>>>
>>>>> Thanks for your mail about the OpenMP dialect in MLIR. Happy to know
>>>>> that you and several other groups are interested in the OpenMP dialect. At
>>>>> the outset, I must point out that the design is not set in stone and will
>>>>> change as we make progress. You are welcome to participate, provide
>>>>> feedback and criticism to change the design as well as to contribute to the
>>>>> implementation. I provide some clarifications and replies to your comments
>>>>> below. If it is OK we can have further discussions in discourse as River
>>>>> points out.
>>>>>
>>>>> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and
>>>>> clang frontends. Note that this proposal was before considering MLIR for
>>>>> FIR.
>>>>>
>>>>> A correction here. The proposal for OpenMPIRBuilder was made when MLIR
>>>>> was being considered for FIR.
>>>>> (i) Gary Klimowicz's minutes for the Flang call in April 2019 mention
>>>>> considering MLIR for FIR.
>>>>>
>>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-April/000194.html
>>>>> (ii) My reply to Johannes's proposal in May 2019 mentions MLIR for FIR.
>>>>>
>>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000220.html
>>>>>
>>>>> b. Review of barrier construct is in progress:
>>>>> https://reviews.llvm.org/D72962
>>>>>
>>>>> Minor correction here. The addition of the barrier construct was
>>>>> accepted and has landed (https://reviews.llvm.org/D72400). It is the
>>>>> review for translation to LLVM IR that is in progress.
>>>>>
>>>>> It looks like the design has evolved over time and there is no one
>>>>> place which contains the latest design decisions that fits all the
>>>>> different pieces of the puzzle. I will try to deduce it from the above
>>>>> mentioned references. Please correct me If I am referring to anything which
>>>>> has changed.
>>>>>
>>>>> Yes, the design has mildly changed over time to incorporate feedback.
>>>>> But the latest is what is there in the RFC in discourse.
>>>>>
>>>>> For most OpenMP design discussions, FIR examples are used (as seen in
>>>>> (2) and (3)). The MLIR examples mentioned in the design only talk about
>>>>> the FIR dialect and the LLVM dialect.
>>>>>
>>>>> Our initial concern was how all these pieces (FIR, LLVM Dialect,
>>>>> OpenMPIRBuilder, LLVM IR) would fit together. Hence you see the
>>>>> prominence of FIR and the LLVM dialect, and more information about
>>>>> lowering/translation than transformations/optimisations.
>>>>>
>>>>> This completely ignores the likes of standard, affine (where most loop
>>>>> transformations are supposed to happen) and loop dialects.
>>>>>
>>>>> Adding to the reply above. We would like to take advantage of the
>>>>> transformations in cases that are possible. FIR loops will be converted to
>>>>> affine/loop dialect. So the loop inside an omp.do can be in these dialects
>>>>> as clarified in the discussion in discourse and also shown in slide 20 of
>>>>> the fosdem presentation (links to both below).
>>>>>
>>>>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
>>>>>
>>>>> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf
>>>>>
>>>>> I must also point out that the question of where to do loop
>>>>> transformations is a topic we have not fully converged on. See the
>>>>> following thread for discussions.
>>>>> http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html
>>>>>
>>>>> Is it the same omp.do operation which now contains the bounds and
>>>>> induction variables of the loop after the LLVM conversion?
>>>>>
>>>>> The point here is that i) we need to keep the loops separate so as
>>>>> to take advantage of transformations that other dialects like affine/loop
>>>>> would provide. ii) We will need the loop information while lowering the
>>>>> OpenMP do operation. For implementation, if reusing the same operation (in
>>>>> different contexts) is difficult then we can add a new operation.
>>>>>
>>>>> It is also not mentioned how clauses like firstprivate, shared,
>>>>> private, reduce, map, etc are lowered to OpenMP dialect.
>>>>>
>>>>> Yes, it is not mentioned. We did a study of a few constructs and
>>>>> clauses which was shared as mails to flang-dev and the RFC. As we make
>>>>> progress and before implementation, we will share further details.
>>>>>
>>>>> it would be beneficial to have an omp.parallel_do operation which has
>>>>> semantics similar to other loop structures (may not be LoopLikeInterface)
>>>>> in MLIR.
>>>>>
>>>>> I am not against adding parallel_do if it can help with
>>>>> transformations or reduce the complexity of lowering. Please share the
>>>>> details in discourse as a reply to the RFC or a separate thread.
>>>>>
>>>>> it looks like having OpenMP operations based on standard MLIR types
>>>>> and operations (scalars and memrefs mainly) is the right way to go.
>>>>>
>>>>> This will definitely be the first version that we implement. But I do
>>>>> not understand why we should restrict ourselves to only the standard
>>>>> types and operations. To ease lowering and translation and to avoid
>>>>> adding OpenMP operations to other dialects, I believe the OpenMP dialect
>>>>> should also be able to coexist with other dialects like FIR and LLVM.
>>>>>
>>>>> E. Lowering of target constructs mentioned in ( 2(d) ) specifies
>>>>> direct lowering to LLVM IR ignoring all the advantages that MLIR provides.
>>>>>
>>>>> Also, OpenMP codegen will automatically benefit from the GPU dialect
>>>>> based optimizations. For example, it would be way easier to hoist a memory
>>>>> reference out of GPU kernel in MLIR than in LLVM IR.
>>>>>
>>>>> I might not have fully understood you here. But the dialect lives
>>>>> independently of the translation to LLVM IR. If there are optimisations
>>>>> (like the hoisting that you mention here), I believe they can be
>>>>> performed as transformation passes on the dialect. It is not ruled out.
>>>>>
>>>>> --Kiran
>>>>> ------------------------------
>>>>> From: flang-dev <flang-dev-bounces at lists.llvm.org> on behalf of
>>>>> Vinay Madhusudan via flang-dev <flang-dev at lists.llvm.org>
>>>>> Sent: 13 February 2020 16:33
>>>>> To: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>;
>>>>> flang-dev at lists.llvm.org <flang-dev at lists.llvm.org>
>>>>> Subject: [flang-dev] About OpenMP dialect in MLIR
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have a few questions / concerns regarding the design of the OpenMP
>>>>> dialect in MLIR that is currently being implemented, mainly for the f18
>>>>> compiler. Below, I summarize the current state of the various efforts in
>>>>> clang / f18 / MLIR / LLVM regarding this. Feel free to add to the list in
>>>>> case I have missed something.
>>>>>
>>>>> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and
>>>>> clang frontends. Note that this proposal was before considering MLIR for
>>>>> FIR.
>>>>>
>>>>> a. llvm-dev proposal :
>>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html
>>>>>
>>>>> b. Patches in review: https://reviews.llvm.org/D70290. This also
>>>>> includes the clang codegen changes.
>>>>>
>>>>> 2.  [July - September 2019] OpenMP dialect for MLIR was discussed /
>>>>> proposed with respect to the f18 compilation stack (keeping FIR in mind).
>>>>>
>>>>> a. flang-dev discussion link:
>>>>> https://lists.llvm.org/pipermail/flang-dev/2019-September/000020.html
>>>>>
>>>>> b. Design decisions captured in PPT:
>>>>> https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view
>>>>>
>>>>> c. MLIR google groups discussion:
>>>>> https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw
>>>>>
>>>>> d. Target constructs  design:
>>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html
>>>>>
>>>>> e. SIMD constructs design:
>>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html
>>>>>
>>>>> 3.  [Jan 2020] OpenMP dialect RFC in llvm discourse :
>>>>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397
>>>>>
>>>>> 4.  [Jan- Feb 2020] Implementation of OpenMP dialect in MLIR:
>>>>>
>>>>> a. The first patch which introduces the OpenMP dialect was pushed.
>>>>>
>>>>> b. Review of barrier construct is in progress:
>>>>> https://reviews.llvm.org/D72962
>>>>> https://reviews.llvm.org/D72400
>>>>>
>>>>> I have tried to list below different topics of interest (to different
>>>>> people) around this work. Most of these are in the design phase (or very
>>>>> new) and multiple parties are interested with different sets of goals in
>>>>> mind.
>>>>>
>>>>> I.  Flang frontend and its integration
>>>>>
>>>>> II. Fortran representation in MLIR / FIR development
>>>>>
>>>>> III. OpenMP development for flang,  OpenMP builder in LLVM.
>>>>>
>>>>> IV. Loop Transformations in MLIR / LLVM with respect to OpenMP.
>>>>>
>>>>> It looks like the design has evolved over time and there is no one
>>>>> place which contains the latest design decisions that fits all the
>>>>> different pieces of the puzzle. I will try to deduce it from the above
>>>>> mentioned references. Please correct me If I am referring to anything which
>>>>> has changed.
>>>>>
>>>>> A. For most OpenMP design discussions, FIR examples are used (as seen
>>>>> in (2) and (3)). The MLIR examples mentioned in the design only talk
>>>>> about the FIR dialect and the LLVM dialect.
>>>>>
>>>>> This completely ignores the likes of standard, affine (where most loop
>>>>> transformations are supposed to happen) and loop dialects. I think it is
>>>>> critical to decouple the OpenMP dialect development in MLIR from the
>>>>> current flang / FIR effort. It would be useful if someone could show
>>>>> these examples using existing dialects in MLIR, and also how the
>>>>> different transformations / lowerings are planned.
>>>>>
>>>>> B. In the latest RFC (3), it is mentioned that the initial OpenMP
>>>>> dialect version will be as follows,
>>>>>
>>>>>   omp.parallel {
>>>>>
>>>>>     omp.do {
>>>>>
>>>>>        fir.do %i = 0 to %ub3 : !fir.integer {
>>>>>
>>>>>         ...
>>>>>
>>>>>        }
>>>>>
>>>>>     }
>>>>>
>>>>>   }
>>>>>
>>>>> and then after the "LLVM conversion" it is converted as follows:
>>>>>
>>>>>   omp.parallel {
>>>>>
>>>>>     %ub3 = ...
>>>>>
>>>>>     omp.do %i = 0 to %ub3 : !llvm.integer {
>>>>>
>>>>>     ...
>>>>>
>>>>>     }
>>>>>
>>>>>   }
>>>>>
>>>>>
>>>>> a. Is it the same omp.do operation which now contains the bounds and
>>>>> induction variables of the loop after the LLVM conversion? If so, will the
>>>>> same operation have two different semantics during a single compilation?
>>>>>
>>>>> b. Will there be different lowerings for various loop operations from
>>>>> different dialects? loop.for and affine.for under omp operations would need
>>>>> different OpenMP / LLVM lowerings. Currently, both of them are
>>>>> lowered to CFG-based loops during the LLVM dialect conversion (which is
>>>>> much before the proposed OpenMP dialect lowering).
>>>>>
>>>>> There would be no standard way to represent OpenMP operations
>>>>> (especially the ones which involve loops) in MLIR. This would drastically
>>>>> complicate lowering.
>>>>>
>>>>> C. It is also not mentioned how clauses like firstprivate, shared,
>>>>> private, reduce, map, etc are lowered to OpenMP dialect. The example
>>>>> in the RFC contains FIR and LLVM types and nothing about std dialect types.
>>>>> Consider the below example:
>>>>>
>>>>> #pragma omp parallel for reduction(+:x)
>>>>>
>>>>> for (int i = 0; i < N; ++i)
>>>>>
>>>>>   x += a[i];
>>>>>
>>>>> How would the above be represented in the OpenMP dialect, and what type
>>>>> would “x” be in MLIR? It is not mentioned in the design how the various
>>>>> SSA values for the various OpenMP clauses are passed around in OpenMP
>>>>> operations.
>>>>>
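>>>>> Purely to make the question concrete, one hypothetical shape (not taken
>>>>> from any proposal): is the reduction variable an explicit SSA operand of
>>>>> the op, a captured memref, or an attribute?
>>>>>
>>>>>   // All syntax hypothetical.
>>>>>   omp.parallel_do reduction(@plus -> %x : memref<i32>) %i = %c0 to %N {
>>>>>     ...
>>>>>   }
>>>>>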
>>>>> D. Because of (A), (B) and (C), it would be beneficial to have an
>>>>> omp.parallel_do operation which has semantics similar to other loop
>>>>> structures (may not be LoopLikeInterface) in MLIR. To me, it looks like
>>>>> having OpenMP operations based on standard MLIR types and operations
>>>>> (scalars and memrefs mainly) is the right way to go.
>>>>>
>>>>> Why not have an omp.parallel_do operation with AffineMap-based bounds,
>>>>> so as to decouple it from Value/Type, similar to affine.for? (A sketch
>>>>> follows.)
>>>>>
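>>>>> A sketch of that idea (hypothetical syntax, modelled on affine.for’s
>>>>> map-based bounds):
>>>>>
>>>>>   // Bounds come from affine maps over symbols, as with affine.for, so
>>>>>   // the trip count is analyzable without chasing arbitrary SSA values.
>>>>>   omp.parallel_do %i = 0 to affine_map<()[s0] -> (s0)>()[%N] {
>>>>>     %v = affine.load %A[%i] : memref<?xf32>
>>>>>     ...
>>>>>   }
>>>>>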
>>>>> 1. With the current design, the number of transformations /
>>>>> optimizations that one can write on OpenMP constructs would become limited
>>>>> as there can be any custom loop structure with custom operations / types
>>>>> inside it.
>>>>>
>>>>> 2. It would also be easier to transform loop nests containing OpenMP
>>>>> constructs if the body of the OpenMP operations is well defined (i.e.,
>>>>> does not accept arbitrary loop structures). Having nested redundant
>>>>> “parallel”, “target” and “do” regions seems unnecessary.
>>>>>
>>>>> 3. There would also be new sets of loop structures in new dialects
>>>>> when C/C++ is compiled to MLIR. It would complicate the number of possible
>>>>> combinations inside the OpenMP region.
>>>>>
>>>>> E. Lowering of target constructs mentioned in ( 2(d) ) specifies
>>>>> direct lowering to LLVM IR ignoring all the advantages that MLIR provides.
>>>>> Being able to compile the code for heterogeneous hardware is one of the
>>>>> biggest advantages that MLIR brings to the table. That is being completely
>>>>> missed here. This also requires solving the problem of handling target
>>>>> information in MLIR. But that is a problem which needs to be solved anyway.
>>>>> Using GPU dialect also gives us an opportunity to represent offloading
>>>>> semantics in MLIR.
>>>>>
>>>>> Given the ability to represent multiple ModuleOps and the existence of
>>>>> the GPU dialect, couldn’t higher-level optimizations on offloaded code
>>>>> be done at the MLIR level? The proposed design would lead us to the same
>>>>> problems that we are currently facing in LLVM IR.
>>>>>
>>>>> Also, OpenMP codegen will automatically benefit from the GPU dialect
>>>>> based optimizations. For example, it would be way easier to hoist a memory
>>>>> reference out of GPU kernel in MLIR than in LLVM IR.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Vinay
>>>>>