[flang-dev] About OpenMP dialect in MLIR

Thu Feb 13 08:33:40 PST 2020

Hi,

I have a few questions / concerns regarding the design of the OpenMP dialect
in MLIR that is currently being implemented, mainly for the f18 compiler.
Below, I summarize the current state of various efforts in clang / f18 /
MLIR / LLVM regarding this. Feel free to add to the list in case I have
missed something.

1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and clang
frontends. Note that this proposal was before considering MLIR for FIR.

a. llvm-dev proposal :
http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html

b. Patches in review: https://reviews.llvm.org/D70290. This also includes
the clang codegen changes.

2.  [July - September 2019] OpenMP dialect for MLIR was discussed /
proposed with respect to the f18 compilation stack (keeping FIR in mind).

a. flang-dev discussion link:
https://lists.llvm.org/pipermail/flang-dev/2019-September/000020.html

b. Design decisions captured in PPT:
https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view

c. MLIR google groups discussion:
https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw

d. Target constructs design:
http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html

e. SIMD constructs design:
http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html

3.  [Jan 2020] OpenMP dialect RFC in llvm discourse :
https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397

4.  [Jan - Feb 2020] Implementation of OpenMP dialect in MLIR:

a. The first patch which introduces the OpenMP dialect was pushed.

b. Review of barrier construct is in progress:
https://reviews.llvm.org/D72962

I have tried to list below different topics of interest (to different
people) around this work. Most of these are in the design phase (or very
new) and multiple parties are interested with different sets of goals in
mind.

I.  Flang frontend and its integration

II. Fortran representation in MLIR / FIR development

III. OpenMP development for flang, OpenMP builder in LLVM.

IV. Loop Transformations in MLIR / LLVM with respect to OpenMP.

It looks like the design has evolved over time and there is no single place
that contains the latest design decisions and fits all the different
pieces of the puzzle together. I will try to deduce it from the references
listed above. Please correct me if I am referring to anything that has
changed.

A. For most OpenMP design discussions, FIR examples are used (as seen in
(2) and (3)). The MLIR examples mentioned in the design only talk about
the FIR dialect and the LLVM dialect.

This completely ignores the standard, affine (where most loop
transformations are supposed to happen) and loop dialects. I think it is
critical to decouple the OpenMP dialect development in MLIR from the
current flang / FIR effort. It would be useful if someone could show these
examples using existing dialects in MLIR, and also explain how the
different transformations / lowerings are planned.

B. In the latest RFC (3), it is mentioned that the initial OpenMP dialect
version will be as follows:

  omp.parallel {
    omp.do {
      fir.do %i = 0 to %ub3 : !fir.integer {
        ...
      }
    }
  }

and then after the "LLVM conversion" it is converted as follows:

  omp.parallel {
    %ub3 =
    omp.do %i = 0 to %ub3 : !llvm.integer {
    ...
    }
  }

a. Is it the same omp.do operation which now contains the bounds and
induction variables of the loop after the LLVM conversion? If so, will the
same operation have two different semantics during a single compilation?

b. Will there be different lowerings for the various loop operations from
different dialects? loop.for and affine.for under omp operations would need
different OpenMP / LLVM lowerings. Currently, both of them are lowered to
CFG-based loops during the LLVM dialect conversion (which happens well
before the proposed OpenMP dialect lowering).

There would be no standard way to represent OpenMP operations (especially
the ones which involve loops) in MLIR. This would drastically complicate
lowering.
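
To illustrate the concern in (b), here is a small C sketch (not taken from
the RFC; the function names are hypothetical) of the same loop in structured
form and in the branch-based form it has after a CFG lowering. In the second
form the bounds and the induction variable are no longer explicit operands
of a single loop construct, so a later OpenMP lowering would have to
rediscover them.

```c
/* Structured form: bounds and induction variable are explicit in the
   loop construct, analogous to loop.for / affine.for. */
long sum_structured(const long *a, int n) {
    long x = 0;
    for (int i = 0; i < n; ++i)
        x += a[i];
    return x;
}

/* CFG form: the same loop written with explicit blocks and branches,
   analogous to what remains after lowering to CFG-based loops. */
long sum_cfg(const long *a, int n) {
    long x = 0;
    int i = 0;
header:
    if (!(i < n))
        goto done;
    x += a[i];
    ++i;
    goto header;
done:
    return x;
}
```

Both functions compute the same result; only the first keeps the loop
structure that an omp.do-like operation would want to consume.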

C. It is also not mentioned how clauses like firstprivate, shared, private,
reduction, map, etc. are lowered to the OpenMP dialect. The example in the
RFC contains FIR and LLVM types and nothing about std dialect types.
Consider the example below:

#pragma omp parallel for reduction(+:x)
for (int i = 0; i < N; ++i)
  x += a[i];

How would the above be represented in the OpenMP dialect, and what type
would "x" be in MLIR? The design does not say how the SSA values for the
various OpenMP clauses are passed around in OpenMP operations.
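
As a point of reference for this question, the following C sketch spells out
by hand what reduction(+:x) asks for: each worker accumulates into a private
copy initialised to the identity of '+', and the private copies are then
combined into the original x. Note that x has to be reachable as a memory
location (a pointer here), not just as a plain value. The function name, the
chunking scheme and the sequential simulation of the workers are assumptions
made for illustration only; this is not how any existing runtime or the
proposed dialect implements it.

```c
#include <assert.h>

/* Hand-written equivalent of:
     #pragma omp parallel for reduction(+:x)
     for (int i = 0; i < n; ++i) x += a[i];
   Workers are simulated sequentially; only the data flow matters. */
void reduce_add(int *x, const int *a, int n, int nworkers) {
    assert(nworkers > 0 && n >= 0);
    int chunk = (n + nworkers - 1) / nworkers;  /* ceil(n / nworkers) */
    for (int w = 0; w < nworkers; ++w) {
        int lo = w * chunk;
        int hi = (lo + chunk < n) ? lo + chunk : n;
        int x_private = 0;                      /* identity of '+' */
        for (int i = lo; i < hi; ++i)
            x_private += a[i];
        *x += x_private;                        /* combine step */
    }
}
```

Whatever representation the dialect picks for "x" has to carry both pieces
of information: where the original variable lives and which operation
combines the private copies.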

D. Because of (A), (B) and (C), it would be beneficial to have an
omp.parallel_do operation which has semantics similar to other loop
structures (it may not implement LoopLikeInterface) in MLIR. To me, it
looks like having OpenMP operations based on standard MLIR types and
operations (scalars and memrefs mainly) is the right way to go.

Why not have an omp.parallel_do operation with AffineMap based bounds, so
as to decouple it from Value/Type, similar to affine.for?

1. With the current design, the number of transformations / optimizations
that one can write on OpenMP constructs would become limited as there can
be any custom loop structure with custom operations / types inside it.

2. It would also be easier to transform loop nests containing OpenMP
constructs if the body of the OpenMP operations is well defined (i.e., does
not accept arbitrary loop structures). Having nested redundant "parallel",
"target" and "do" regions seems unnecessary.

3. There would also be new sets of loop structures in new dialects when
C/C++ is compiled to MLIR. That would further increase the number of
possible combinations inside the OpenMP region.
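
The AffineMap based bounds suggested in (D) can be pictured with a small C
sketch. In affine.for, a multi-result upper-bound map is interpreted as the
minimum of its results (and a lower-bound map as the maximum), so the bound
is a closed-form function of a few symbols instead of an arbitrary value.
The struct and function names below are hypothetical, and the sketch is
restricted to expressions over a single symbol to stay short.

```c
#include <assert.h>

/* One affine expression over a single symbol s: coeff * s + constant. */
typedef struct {
    long coeff;
    long constant;
} AffineExpr1;

static long affine_eval(AffineExpr1 e, long s) {
    return e.coeff * s + e.constant;
}

/* Upper bound of a loop: the minimum over all results of the map,
   mirroring how affine.for treats a multi-result upper-bound map. */
long upper_bound(const AffineExpr1 *exprs, int count, long s) {
    assert(count > 0);
    long ub = affine_eval(exprs[0], s);
    for (int k = 1; k < count; ++k) {
        long v = affine_eval(exprs[k], s);
        if (v < ub)
            ub = v;
    }
    return ub;
}
```

For example, the map (s) -> (s, 128) gives a loop bound of min(s, 128),
which a transformation can reason about without inspecting how s was
computed.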

E. The lowering of target constructs mentioned in (2(d)) specifies direct
lowering to LLVM IR, ignoring all the advantages that MLIR provides. Being
able to compile code for heterogeneous hardware is one of the biggest
advantages that MLIR brings to the table, and it is completely missed
here. This also requires solving the problem of handling target information
in MLIR, but that is a problem which needs to be solved anyway. Using the
GPU dialect also gives us an opportunity to represent offloading semantics
in MLIR.

Given the ability to represent multiple ModuleOps and the existence of the
GPU dialect, couldn't higher level optimizations on offloaded code be done
at the MLIR level? The proposed design would lead us to the same problems
that we are currently facing in LLVM IR.

Also, OpenMP codegen would automatically benefit from GPU dialect based
optimizations. For example, it would be far easier to hoist a memory
reference out of a GPU kernel in MLIR than in LLVM IR.

Thanks,

Vinay