[llvm-dev] [flang-dev] About OpenMP dialect in MLIR

River Riddle via llvm-dev llvm-dev at lists.llvm.org
Thu Feb 13 11:49:39 PST 2020


On Thu, Feb 13, 2020 at 10:18 AM Johannes Doerfert via flang-dev <
flang-dev at lists.llvm.org> wrote:

> Hi Vinay,
>
> Thanks for taking an interest and for the detailed discussion.
>
> To start, I will pick a few paragraphs from your email to clarify a couple
> of things that led to the current design or that might otherwise need
> clarification. We can talk about other points later as well.
>
> [
>   Side notes:
>     1) I'm not an MLIR person.
>     2) It seems unfortunate that we do not have a mlir-dev list.


MLIR uses Discourse: llvm.discourse.group.



> ]
>
>
> > 1. With the current design, the number of transformations / optimizations
> > that one can write on OpenMP constructs would become limited as there can
> > be any custom loop structure with custom operations / types inside it.
>
> OpenMP, as an input language, does not make many assumptions about the
> code inside of constructs*. So, inside a parallel region there can be
> almost anything the base language has to offer, both lexically and
> dynamically. Assuming otherwise is not going to work. Analyzing a "generic"
> OpenMP representation in order to determine if it can be represented as a
> more restricted "op" seems at least plausible. You will run into various
> issues, some mentioned explicitly below. For starters, you still have to
> generate proper OpenMP runtime calls, e.g., from your GPU dialect, even
> if it is "just" to make sure the OMPD/OMPT interfaces expose useful
> information.
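>
> To make this concrete, here is a rough sketch of what such a runtime call
> looks like for a plain parallel region. This is only an approximation of
> what Clang emits; the real ident_t type and the exact __kmpc_fork_call
> prototype live inside the runtime and are elided here, and the function
> names below are made up:
> ```
> /* "#pragma omp parallel { body(); }" is outlined into a separate function
>    and handed to libomp; OMPT/OMPD-based tools observe this call, which is
>    why it has to be emitted even when lowering through other dialects.    */
> struct ident;                              /* opaque source-location info  */
> extern void __kmpc_fork_call(struct ident *loc, int nargs,
>                              void (*outlined)(int *, int *), ...);
>                                            /* approximate prototype        */
>
> static void outlined_parallel_body(int *global_tid, int *bound_tid) {
>   /* ... the original region body, executed once by each thread ... */
> }
>
> void lowered_parallel_region(struct ident *loc) {
>   __kmpc_fork_call(loc, /*nargs=*/0, outlined_parallel_body);
> }
> ```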
>
>
> * I exclude the `omp loop` construct here as it is not even implemented
>   anywhere as far as I know.
>
>
> > 2. It would also be easier to transform the loop nests containing OpenMP
> > constructs if the body of the OpenMP operations is well defined (i.e.,
> > does not accept arbitrary loop structures). Having nested redundant
> > "parallel", "target" and "do" regions seems unnecessary.
>
> As mentioned above, you cannot start with the assumption OpenMP input is
> structured this way. You have to analyze it first. This is the same
> reason we cannot simply transform C/C++ `for loops` into `affine.for`
> without proper analysis of the loop body.
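>
> As a sketch (function and variable names made up), a C loop like the
> following cannot be turned into an `affine.for` without first proving
> facts about its body:
> ```
> /* The runtime-dependent step and the indirect access a[idx[i]] are not
>    affine, so a direct translation to affine.for / affine.load is invalid
>    without further analysis or rewriting.                                 */
> void scale(float *a, const int *idx, int n, int step) {
>   for (int i = 0; i < n; i += step)
>     a[idx[i]] *= 2.0f;
> }
> ```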
>
> Now, more concretely: nested parallel and target regions are not
> necessarily redundant, nor can/should we require the user not to have
> them. Nested parallelism can easily make sense, depending on the problem
> decomposition. Nested target will make a lot of sense with reverse
> offload, which is already in the standard, and it also should be allowed
> for the sake of a modular (user) code base.
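>
> For example (a minimal sketch of the nested-parallelism case, names made
> up), the following is conforming and can be a perfectly reasonable
> decomposition:
> ```
> void process_blocks(int nblocks) {
>   #pragma omp parallel for          /* outer team: one block per thread   */
>   for (int b = 0; b < nblocks; ++b) {
>     #pragma omp parallel            /* nested team working inside a block */
>     {
>       /* ... per-block work ... */
>     }
>   }
> }
> ```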
>
>
> > 3. There would also be new sets of loop structures in new dialects when
> > C/C++ is compiled to MLIR. It would multiply the number of possible
> > combinations inside the OpenMP region.
>
> Is anyone working on this? If so, what is the timeline? I personally was
> not expecting Clang to switch over to MLIR any time soon but I am happy
> if someone wants to correct me on this. I mention this only because it
> interacts with the arguments I will make below.
>
>
> > E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct
> > lowering to LLVM IR ignoring all the advantages that MLIR provides. Being
> > able to compile the code for heterogeneous hardware is one of the biggest
> > advantages that MLIR brings to the table. That is being completely missed
> > here. This also requires solving the problem of handling target
> information
> > in MLIR. But that is a problem which needs to be solved anyway. Using GPU
> > dialect also gives us an opportunity to represent offloading semantics in
> > MLIR.
>
> I'm unsure what the problem with "handling target information in MLIR" is
> but
> whatever design we end up with, we need to know about the target
> (triple) in all stages of the pipeline, even if it is just to pass it
> down.
>
>
> > Given the ability to represent multiple ModuleOps and the existence of
> GPU
> > dialect, couldn't higher level optimizations on offloaded code be done at
> > the MLIR level? The proposed design would lead us to the same problems that
> we
> > are currently facing in LLVM IR.
> >
> > Also, OpenMP codegen will automatically benefit from the GPU dialect
> based
> > optimizations. For example, it would be way easier to hoist a memory
> > reference out of a GPU kernel in MLIR than in LLVM IR.
>
> While I agree with the premise that you can potentially reuse MLIR
> transformations, it might not be as simple in practice.
>
> As mentioned above, you cannot assume much about OpenMP codes, almost
> nothing for a lot of application codes I have seen. Some examples:
>
> If you have a function call, or any synchronization event for that
> matter, located between two otherwise adjacent target regions (see
> below), you cannot assume the two target regions will be offloaded to
> the same device.
> ```
>   #pragma omp target
>   {}
>   foo();
>   #pragma omp target
>   {}
> ```
> Similarly, you cannot assume an `omp parallel` is allowed to be executed
> with more than a single thread, or that an `omp [parallel] for` does not
> have loop carried data-dependences, ...
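> Both of the following are conforming, for instance (a minimal sketch,
> names made up), so neither multi-threaded execution nor independent
> iterations can be assumed:
> ```
> void f(double *a, int n, int run_parallel) {
>   #pragma omp parallel if(run_parallel)  /* may execute with a single thread */
>   {
>     /* ... */
>   }
>
>   #pragma omp parallel for ordered  /* carried dependence, legal via "ordered" */
>   for (int i = 1; i < n; ++i) {
>     #pragma omp ordered
>     a[i] += a[i - 1];
>   }
> }
> ```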
> Data-sharing attributes are also something that has to be treated
> carefully:
> ```
> int x = 5;
> #pragma omp task
>   x = 3;
> print(x);
> ```
> This should print 5, not 3: x is firstprivate in the task by default, so
> the assignment only updates the task's private copy.
>
> I hope I convinced you that OpenMP is not trivially mappable to existing
> dialects without proper analysis. If not, please let me know why you
> expect it to be.
>
>
> Now when it comes to code analyses, LLVM-IR offers a variety of
> interesting features, ranging from a mature set of passes to the
> cross-language LTO capabilities. We are working on the missing parts,
> e.g., heterogeneous llvm::Modules as we speak. Simple OpenMP
> optimizations are already present in LLVM and interesting ones have been
> prototyped for a while now (let me know if you want to see more not-yet
> merged patches/optimizations). I also have papers, results, and
> talks that might be interesting here. Let me know if you need pointers
> to them.
>
>
> Cheers,
>   Johannes
>
>
>
> On 02/13, Vinay Madhusudan via llvm-dev wrote:
> > Hi,
> >
> > I have a few questions / concerns regarding the design of the OpenMP
> > dialect in MLIR that is currently being implemented, mainly for the f18
> > compiler.
> > Below, I summarize the current state of various efforts in clang / f18 /
> > MLIR / LLVM regarding this. Feel free to add to the list in case I have
> > missed something.
> >
> > 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and clang
> > frontends. Note that this proposal was before considering MLIR for FIR.
> >
> > a. llvm-dev proposal :
> >
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html
> >
> > b. Patches in review: https://reviews.llvm.org/D70290. This also
> includes
> > the clang codegen changes.
> >
> > 2.  [July - September 2019] OpenMP dialect for MLIR was discussed /
> > proposed with respect to the f18 compilation stack (keeping FIR in mind).
> >
> > a. flang-dev discussion link:
> > https://lists.llvm.org/pipermail/flang-dev/2019-September/000020.html
> >
> > b. Design decisions captured in PPT:
> > https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view
> >
> > c. MLIR google groups discussion:
> >
> https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw
> >
> > d. Target constructs  design:
> >
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html
> >
> > e. SIMD constructs design:
> >
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html
> >
> > 3.  [Jan 2020] OpenMP dialect RFC on LLVM Discourse:
> > https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397
> >
> > 4.  [Jan - Feb 2020] Implementation of OpenMP dialect in MLIR:
> >
> > a. The first patch which introduces the OpenMP dialect was pushed.
> >
> > b. Review of barrier construct is in progress:
> > https://reviews.llvm.org/D72962
> >
> > I have tried to list below different topics of interest (to different
> > people) around this work. Most of these are in the design phase (or very
> > new) and multiple parties are interested with different sets of goals in
> > mind.
> >
> > I.  Flang frontend and its integration
> >
> > II. Fortran representation in MLIR / FIR development
> >
> > III. OpenMP development for flang,  OpenMP builder in LLVM.
> >
> > IV. Loop Transformations in MLIR / LLVM with respect to OpenMP.
> >
> > It looks like the design has evolved over time and there is no one place
> > which contains the latest design decisions that fit all the different
> > pieces of the puzzle. I will try to deduce it from the above-mentioned
> > references. Please correct me if I am referring to anything which has
> > changed.
> >
> > A. For most OpenMP design discussions, FIR examples are used (as seen in
> > (2) and (3)). The MLIR examples mentioned in the design only talk about
> > the FIR and LLVM dialects.
> >
> > This completely ignores the likes of the standard, affine (where most loop
> > transformations are supposed to happen) and loop dialects. I think it is
> > critical to decouple the OpenMP dialect development in MLIR from the
> > current flang / FIR effort. It would be useful if someone could present
> > these examples using existing dialects in MLIR and also explain how the
> > different transformations / lowerings are planned.
> >
> > B. In the latest RFC (3), it is mentioned that the initial OpenMP dialect
> > version will be as follows:
> >
> >   omp.parallel {
> >     omp.do {
> >       fir.do %i = 0 to %ub3 : !fir.integer {
> >         ...
> >       }
> >     }
> >   }
> >
> > and then after the "LLVM conversion" it is converted as follows:
> >
> >   omp.parallel {
> >     %ub3 = ...
> >     omp.do %i = 0 to %ub3 : !llvm.integer {
> >       ...
> >     }
> >   }
> >
> >
> > a. Is it the same omp.do operation which now contains the bounds and
> > induction variables of the loop after the LLVM conversion? If so, will
> the
> > same operation have two different semantics during a single compilation?
> >
> > b. Will there be different lowerings for various loop operations from
> > different dialects? loop.for and affine.for under omp operations would
> need
> > different OpenMP / LLVM lowerings. Currently, both of them are lowered to
> > CFG-based loops during the LLVM dialect conversion (which happens much
> > earlier than the proposed OpenMP dialect lowering).
> >
> > There would be no standard way to represent OpenMP operations (especially
> > the ones which involve loops) in MLIR. This would drastically complicate
> > lowering.
> >
> > C. It is also not mentioned how clauses like firstprivate, shared,
> > private, reduction, map, etc. are lowered to the OpenMP dialect. The
> > example in the RFC contains FIR and LLVM types and nothing about std
> > dialect types. Consider the below example:
> >
> > #pragma omp parallel for reduction(+:x)
> > for (int i = 0; i < N; ++i)
> >   x += a[i];
> >
> > How would the above be represented in the OpenMP dialect, and what type
> > would "x" be in MLIR? It is not mentioned in the design how the various
> > SSA values for various OpenMP clauses are passed around in OpenMP
> > operations.
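> >
> > For reference, the reduction clause conceptually implies per-thread
> > privatization plus a combine step, all of which the dialect would have to
> > capture. A simplified hand-written equivalent (assuming "x" and "a[i]" are
> > ints for the sake of the sketch; this is not how the runtime actually
> > implements it) would be:
> >
> >   #pragma omp parallel
> >   {
> >     int x_priv = 0;            // private copy, initialized to the identity
> >     #pragma omp for
> >     for (int i = 0; i < N; ++i)
> >       x_priv += a[i];
> >     #pragma omp atomic
> >     x += x_priv;               // combine the partial sums into the shared x
> >   }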
> >
> > D. Because of (A), (B) and (C), it would be beneficial to have an
> > omp.parallel_do operation which has semantics similar to other loop
> > structures (may not be LoopLikeInterface) in MLIR. To me, it looks like
> > having OpenMP operations based on standard MLIR types and operations
> > (scalars and memrefs mainly) is the right way to go.
> >
> > Why not have an omp.parallel_do operation with AffineMap-based bounds, so
> > as to decouple it from Value/Type, similar to affine.for?
> >
> > 1. With the current design, the number of transformations / optimizations
> > that one can write on OpenMP constructs would become limited as there can
> > be any custom loop structure with custom operations / types inside it.
> >
> > 2. It would also be easier to transform the loop nests containing OpenMP
> > constructs if the body of the OpenMP operations is well defined (i.e.,
> > does not accept arbitrary loop structures). Having nested redundant
> > "parallel", "target" and "do" regions seems unnecessary.
> >
> > 3. There would also be new sets of loop structures in new dialects when
> > C/C++ is compiled to MLIR. It would multiply the number of possible
> > combinations inside the OpenMP region.
> >
> > E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct
> > lowering to LLVM IR ignoring all the advantages that MLIR provides. Being
> > able to compile the code for heterogeneous hardware is one of the biggest
> > advantages that MLIR brings to the table. That is being completely missed
> > here. This also requires solving the problem of handling target
> information
> > in MLIR. But that is a problem which needs to be solved anyway. Using GPU
> > dialect also gives us an opportunity to represent offloading semantics in
> > MLIR.
> >
> > Given the ability to represent multiple ModuleOps and the existence of
> GPU
> > dialect, couldn't higher level optimizations on offloaded code be done at
> > the MLIR level? The proposed design would lead us to the same problems that
> we
> > are currently facing in LLVM IR.
> >
> > Also, OpenMP codegen will automatically benefit from the GPU dialect
> based
> > optimizations. For example, it would be way easier to hoist a memory
> > reference out of a GPU kernel in MLIR than in LLVM IR.
-- 
Thank you,
River Riddle