[llvm-dev] [RFC] Abstract Parallel IR Optimizations

Tue Jun 12 07:56:25 PDT 2018

Hi Johannes,

thanks a lot, all clear now!

Kind regards,
Roger

2018-06-12 16:23 GMT+02:00 Johannes Doerfert <jdoerfert at anl.gov>:
> Hi Roger,
>
> On 06/12, Roger Ferrer Ibáñez wrote:
>> apologies in advance if the following questions are silly or don't
>> make sense. I lack a bit of context here and I'm not sure I fully
>> understand your proposal.
>
> No worries, I'm glad if people ask questions!
>
>> Currently clang (and flang) are lowering OpenMP when building LLVM IR
>> (this is because LLVM IR can't express the parallel/concurrent
>> concepts of OpenMP so they have to be lowered first). So, can I assume
>> that your proposal starts off in a context where that lowering is not
>> happening anymore in the front end but it'd happen later in a LLVM IR
>> pass? If so, then you'd be assuming that there is already a way of
>> representing OpenMP constructs in the LLVM IR, is my understanding
>> correct here? I think that the Intel proposal [1] could be one way
>> (not necessarily the one) to do this (disregarding the fact that it is
>> tailored for OpenMP), does this still make sense?
>
> My proposal does _not_ assume we change clang in any way, though it
> also does not require it. However, the initial patch [1] will only work
> with the OpenMP lowering used by clang right now.
>
> The idea is as follows:
>
>   We have different representations of parallelism in the IR, for example
>   the KMP runtime library calls emitted by clang or the Intel parallel
>   IR you mentioned. For each of them we write a piece of code that (1)
>   extracts domain-specific information and (2) allows us to modify the
>   parallel representation. This is the only piece of code that has to be
>   adapted for each parallel representation we want to optimize. On top
>   of this are abstract interfaces that expose the information and
>   modification options to parallel optimization passes. The patch [1]
>   only contains the attribute annotator but we have more as explained in
>   the paper [0]. The analysis/optimization logic is part of these passes
>   and not aware of the underlying representation. We can consequently
>   use the same passes to optimize code that was lowered to use different
>   parallel runtime libraries (GOMP, KMP, Cilk runtime, TBB, ...) or into
>   a native parallel IR (of any shape). This is especially useful as the
>   native parallel IR might not always be usable. If that happens we have
>   to fall back to early outlining, that is, to runtime library calls
>   emitted by the front-end. Even if we at some point have a native parallel
>   representation that is always used, we can simply remove the
>   abstraction introduced by this approach but keep the
>   analysis/optimizations around.
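
The layering described above can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not code from the patch [1]:
every name here (ParallelRegion, ParallelRepresentation, ToyKMPBackend,
privatizeReadOnlyVars) is hypothetical, and the "backend" works on hard-coded
toy data rather than on LLVM IR. The point is only that the pass talks to the
abstract interface and never to a concrete representation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of one parallel region.
struct ParallelRegion {
  std::string Body;                  // e.g. name of the outlined function
  std::vector<std::string> Shared;   // variables shared with the region
  std::vector<std::string> Private;  // variables private to the region
};

// Abstract interface: the only part a new representation has to implement.
class ParallelRepresentation {
public:
  virtual ~ParallelRepresentation() = default;
  // (1) Extract domain-specific information.
  virtual std::vector<ParallelRegion> &regions() = 0;
  virtual bool isReadOnly(const ParallelRegion &R,
                          const std::string &Var) const = 0;
  // (2) Modify the parallel representation.
  virtual void privatize(ParallelRegion &R, const std::string &Var) = 0;
};

// Toy stand-in for a backend that would understand KMP runtime calls.
class ToyKMPBackend final : public ParallelRepresentation {
  std::vector<ParallelRegion> Regions;

public:
  ToyKMPBackend() { Regions.push_back({"outlined.0", {"n", "sum"}, {}}); }

  std::vector<ParallelRegion> &regions() override { return Regions; }

  // Assumption for this toy: only "n" is never written inside the region.
  bool isReadOnly(const ParallelRegion &,
                  const std::string &Var) const override {
    return Var == "n";
  }

  void privatize(ParallelRegion &R, const std::string &Var) override {
    auto It = std::find(R.Shared.begin(), R.Shared.end(), Var);
    assert(It != R.Shared.end() && "variable must currently be shared");
    R.Shared.erase(It);
    R.Private.push_back(Var);
  }
};

// Representation-agnostic "pass": it only uses the abstract interface.
unsigned privatizeReadOnlyVars(ParallelRepresentation &Rep) {
  unsigned NumPrivatized = 0;
  for (ParallelRegion &R : Rep.regions()) {
    // Copy the list because privatize() mutates R.Shared.
    const std::vector<std::string> Candidates = R.Shared;
    for (const std::string &Var : Candidates) {
      if (Rep.isReadOnly(R, Var)) {
        Rep.privatize(R, Var);
        ++NumPrivatized;
      }
    }
  }
  return NumPrivatized;
}
```

Supporting another runtime or a native parallel IR would, in this sketch, mean
adding one more subclass of ParallelRepresentation while leaving
privatizeReadOnlyVars untouched.
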
>
> [0] http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
> [1] https://reviews.llvm.org/D47300
>
>> If this is the case, and given that you explicitly state that this is
>> not a Parallel IR of any sort, is your suggestion to improve
>> optimisation of OpenMP code, based on a "side-car"/ancillary
>> representation built on top of the existing IR, which as I understand
>> should already be able to represent OpenMP? But then this looks a bit
>> redundant to me. So I'm pretty sure one of my assumptions is
>> incorrect. Unless your auxiliary representation is more an alternative
>> to the W-regions [1].
>>
>> Or, maybe I am completely wrong here: you didn't say anything about
>> the FE lowering, which would still happen, and then your proposal
>> builds on top of that. I don't think you meant that, given that your
>> proposal mentions KMP and GOMP (and the current lowering done by clang
>> targets only KMP).
>
> I'm not sure if these paragraphs are still relevant. Does the above
> "explanation" answer your questions already? If not, please continue
> asking!
>
> Cheers,
>   Johannes
>
>> Thank you very much,
>> Roger
>>
>> [1] https://dl.acm.org/citation.cfm?id=3148191
>>
>> 2018-06-07 12:25 GMT+02:00 Johannes Doerfert via llvm-dev
>> <llvm-dev at lists.llvm.org>:
>> > This is an RFC to add analyses and transformation passes into LLVM to
>> > optimize programs based on an abstract notion of a parallel region.
>> >
>> >   == this is _not_ a proposal to add a new encoding of parallelism ==
>> >
>> > We currently perform poorly when it comes to optimizations for parallel
>> > codes. In fact, parallelizing your loops might actually prevent various
>> > optimizations that would have been applied otherwise. One solution to
>> > this problem is to teach the compiler about the semantics of the
>> > parallel representation in use. While this sounds tedious at first, it turns
>> > out that we can perform key optimizations with reasonable implementation
>> > effort (and thereby also reasonable maintenance costs). However, we have
>> > various parallel representations that are already in use (KMPC,
>> > GOMP, CILK runtime, ...) or proposed (Tapir, IntelPIR, ...).
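
One way to picture why parallelizing a loop can block optimizations is the
following sketch. It assumes early outlining, where the loop body is moved
into a separate function that is invoked through a runtime call. The names
toyForkCall, Captured, and outlinedBody are hypothetical, and toyForkCall
simply runs the body once on the calling thread; a real runtime would be
opaque to the compiler and would run the body on several threads.

```cpp
// Sequential version: `scale` is a visible constant, so the compiler can
// fold it into the loop and keep `sum` in a register.
int sumSequential(const int *a, int n) {
  const int scale = 2;
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += scale * a[i];
  return sum;
}

// Toy stand-in for a runtime entry point (hypothetical).
using OutlinedFn = void (*)(void *);
void toyForkCall(OutlinedFn body, void *args) { body(args); }

// Values the region needs are passed indirectly, behind a void pointer.
struct Captured {
  const int *a;
  int n;
  const int *scale;
  int *sum;
};

// Outlined loop body: `scale` and `sum` are now accessed through pointers,
// so without knowledge of the runtime call their values are unknown here.
void outlinedBody(void *raw) {
  auto *c = static_cast<Captured *>(raw);
  for (int i = 0; i < c->n; ++i)
    *c->sum += *c->scale * c->a[i];
}

// "Parallelized" version after outlining.
int sumOutlined(const int *a, int n) {
  const int scale = 2;
  int sum = 0;
  Captured c{a, n, &scale, &sum};
  toyForkCall(outlinedBody, &c);
  return sum;
}
```

Both functions compute the same result, but in the outlined form the constant
and the accumulator escape into a call, which is the kind of information loss
that knowledge about the parallel representation is meant to recover.
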
>> >
>> > Our proposal seeks to introduce parallelism-specific optimizations for
>> > multiple representations while minimizing the implementation overhead.
>> > This is done through an abstract notion of a parallel region which hides
>> > the actual representation from the analysis and optimization passes. In
>> > the schema below, our current five optimizations (described in detail
>> > here [0]) are shown on the left, the abstract parallel IR interface is
>> > in the middle, and the representation-specific implementations are on
>> > the right.
>> >
>> >          Optimization          (A)nalysis/(T)ransformation         Impl.
>> >    ---------------------------------------------------------------------------
>> >      CodePlacementOpt \  /---> ParallelRegionInfo (A) ---------|-> KMPCImpl (A)
>> >        RegionExpander -\ |                                     |   GOMPImpl (A)
>> >    AttributeAnnotator -|-|---> ParallelCommunicationInfo (A) --/   ...
>> >    BarrierElimination -/ |
>> > VariablePrivatization /  \---> ParallelIR/Builder (T) -----------> KMPCImpl (T)
>> >
>> >
>> > In our setting, a parallel region can be an outlined function called
>> > through a runtime library but also a fork-join/attach-reattach region
>> > embedded in otherwise sequential code. The new passes will apply
>> > parallelism-specific optimizations to all of them (if
>> > applicable). There are various reasons why we believe this is a
>> > worthwhile effort that belongs in the LLVM codebase, including:
>> >
>> >   1) We improve the performance of parallel programs, today.
>> >   2) It serves as a meaningful baseline for future discussions on
>> >      (optimized) parallel representations.
>> >   3) It allows us to determine the pros and cons of the different schemes
>> >      when it comes to actual optimizations and inputs.
>> >   4) It helps to identify problems that might arise once we start to
>> >      transform parallel programs but _before_ we commit to a specific
>> >      representation.
>> >
>> > Our prototype for the OpenMP KMPC library (used by clang) already shows
>> > significant speedups for various benchmarks [0]. It also exposed a (to
>> > me) previously unknown problem between restrict/noalias pointers and
>> > (potential) barriers (see Section 3 in [0]).
>> >
>> > We are currently in the process of cleaning up the code, extending the
>> > support for OpenMP constructs, and adding a second implementation for
>> > embedded parallel regions. However, a first horizontal prototype
>> > implementation is already available for review [1].
>> >
>> > Inputs of any kind are welcome and reviewers are needed!
>> >
>> > Cheers,
>> >   Johannes
>> >
>> >
>> > [0] http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
>> > [1] https://reviews.llvm.org/D47300
>> >
>> >
>> > P.S.
>> >   Sorry if you received this message multiple times!
>> >
>>
>>
>>
>> --
>> Roger Ferrer Ibáñez
>
> --
>
> Johannes Doerfert
> PhD Student / Researcher
>
> Compiler Design Lab (Professor Hack) / Argonne National Laboratory
> Saarland Informatics Campus, Germany / Lemont, IL 60439, USA
> Building E1.3, Room 4.31
>
> Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de / jdoerfert at anl.gov
> Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert

-- 
Roger Ferrer Ibáñez