[llvm-dev] [RFC] Abstract Parallel IR Optimizations
Roger Ferrer Ibáñez via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 12 07:56:25 PDT 2018
thanks a lot, all clear now!
2018-06-12 16:23 GMT+02:00 Johannes Doerfert <jdoerfert at anl.gov>:
> Hi Roger,
> On 06/12, Roger Ferrer Ibáñez wrote:
>> apologies in advance if the following questions are silly or don't
>> make sense. I lack a bit of context here and I'm not sure I fully
>> understand your proposal.
> No worries, I'm glad if people ask questions!
>> Currently clang (and flang) lower OpenMP while building LLVM IR
>> (this is because LLVM IR can't express the parallel/concurrent
>> concepts of OpenMP, so they have to be lowered first). So, can I
>> assume that your proposal starts off in a context where that
>> lowering no longer happens in the front end but would happen later
>> in an LLVM IR pass? If so, then you'd be assuming that there is
>> already a way of representing OpenMP constructs in the LLVM IR. Is
>> my understanding correct here? I think that the Intel proposal
>> could be one way (not necessarily the one) to do this (disregarding
>> the fact that it is tailored for OpenMP). Does this still make sense?
> My proposal neither assumes nor requires any changes to clang.
> However, the initial patch will only work with the OpenMP lowering
> used by clang right now.
> The idea is as follows:
> We have different representations of parallelism in the IR, for
> example the KMP runtime library calls emitted by clang or the Intel
> parallel IR you mentioned. For each of them we write a piece of code
> that (1) extracts domain-specific information and (2) allows us to
> modify the parallel representation. This is the only piece of code
> that has to be adapted for each parallel representation we want to
> optimize. On top of this are abstract interfaces that expose the
> information and modification options to parallel optimization passes.
> The patch contains only the attribute annotator, but we have more, as
> explained in the paper. The analysis/optimization logic is part of
> these passes and not aware of the underlying representation. We can
> consequently use the same passes to optimize code that was lowered to
> use different parallel runtime libraries (GOMP, KMP, Cilk runtime,
> TBB, ...) or into a native parallel IR (of any shape). This is
> especially useful as the native parallel IR might not always be
> usable. If that happens we have to fall back to early outlining, that
> is, runtime library calls emitted by the front end. Even if we at
> some point have a native parallel representation that is always used,
> we can simply remove the abstraction introduced by this approach but
> keep the analyses/optimizations around.
>  http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
>  https://reviews.llvm.org/D47300
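> To give a rough idea of what that layering looks like in C++ (class
> and function names below are purely illustrative, they are not the
> interfaces in the patch):
>
>   #include "llvm/IR/Function.h"
>   #include "llvm/IR/InstIterator.h"
>   #include "llvm/IR/Instructions.h"
>   using namespace llvm;
>
>   // Abstract view of one parallel region. Optimization passes only ever
>   // talk to this interface, never to a concrete runtime encoding.
>   struct AbstractParallelRegion {
>     virtual ~AbstractParallelRegion() = default;
>     // The code that every thread of the region executes.
>     virtual Function *getParallelBody() const = 0;
>     // Conservative answer: may the region contain a barrier?
>     virtual bool mayContainBarrier() const = 0;
>   };
>
>   // Representation-specific piece: here the region was early-outlined
>   // by clang and is invoked through the KMP runtime's __kmpc_fork_call.
>   struct KMPCParallelRegion : AbstractParallelRegion {
>     CallInst *ForkCall; // the call to __kmpc_fork_call
>     explicit KMPCParallelRegion(CallInst *FC) : ForkCall(FC) {}
>
>     Function *getParallelBody() const override {
>       // By clang's convention the outlined "microtask" is an argument
>       // of the fork call (argument index simplified here).
>       return dyn_cast<Function>(
>           ForkCall->getArgOperand(2)->stripPointerCasts());
>     }
>
>     bool mayContainBarrier() const override {
>       // Simplified: scan the outlined body for __kmpc_barrier calls.
>       Function *Body = getParallelBody();
>       if (!Body)
>         return true; // unknown body, stay conservative
>       for (Instruction &I : instructions(*Body))
>         if (auto *CI = dyn_cast<CallInst>(&I))
>           if (Function *Callee = CI->getCalledFunction())
>             if (Callee->getName() == "__kmpc_barrier")
>               return true;
>       return false;
>     }
>   };
>
>   // A GOMPParallelRegion, TapirParallelRegion, ... would implement the
>   // same interface; the optimization passes stay unchanged.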
>> If this is the case, and given that you explicitly state that this
>> is not a Parallel IR of any sort, is your suggestion to improve
>> optimisation of OpenMP code based on a "side-car"/ancillary
>> representation built on top of the existing IR, which, as I
>> understand it, should already be able to represent OpenMP? But then
>> this looks a bit redundant to me, so I'm pretty sure one of my
>> assumptions is incorrect. Unless your auxiliary representation is
>> more of an alternative to the W-regions.
>> Or, maybe I am completely wrong here: you didn't say anything about
>> the FE lowering, which would still happen, and then your proposal
>> builds on top of that. I don't think you meant that, given that your
>> proposal mentions KMP and GOMP (and the current lowering done by
>> clang targets only KMP).
> I'm not sure if these paragraphs are still relevant. Does the above
> "explanation" answer your questions already? If not, please continue
> asking.
>> Thank you very much,
>>  https://dl.acm.org/citation.cfm?id=3148191
>> 2018-06-07 12:25 GMT+02:00 Johannes Doerfert via llvm-dev
>> <llvm-dev at lists.llvm.org>:
>> > This is an RFC to add analyses and transformation passes into LLVM to
>> > optimize programs based on an abstract notion of a parallel region.
>> > == this is _not_ a proposal to add a new encoding of parallelism ==
>> > We currently perform poorly when it comes to optimizations for parallel
>> > codes. In fact, parallelizing your loops might actually prevent various
>> > optimizations that would have been applied otherwise. One solution to
>> > this problem is to teach the compiler about the semantics of the used
>> > parallel representation. While this sounds tedious at first, it turns
>> > out that we can perform key optimizations with reasonable implementation
>> > effort (and thereby also reasonable maintenance costs). However, we have
>> > various parallel representations that are already in use (KMPC,
>> > GOMP, CILK runtime, ...) or proposed (Tapir, IntelPIR, ...).
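>> > To make the second sentence above concrete, here is a tiny,
>> > hand-made example (mine, not one from the paper) of the kind of
>> > optimization we currently lose:
>> >
>> >   #include <cmath>
>> >
>> >   // The factor std::log(x) is loop-invariant. In the sequential
>> >   // version the compiler can compute it once in front of the loop.
>> >   // With the pragma, clang outlines the body into a separate
>> >   // function invoked through the OpenMP runtime; the computation can
>> >   // then at best be hoisted to the start of each thread's copy of
>> >   // the body. Moving it in front of the fork, so it is evaluated
>> >   // once in total, requires knowing what the runtime call means.
>> >   void scale(double *a, int n, double x) {
>> >   #pragma omp parallel for
>> >     for (int i = 0; i < n; ++i)
>> >       a[i] *= std::log(x);
>> >   }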
>> > Our proposal seeks to introduce parallelism specific optimizations for
>> > multiple representations while minimizing the implementation overhead.
>> > This is done through an abstract notion of a parallel region which hides
>> > the actual representation from the analysis and optimization passes. In
>> > the schema below, our current five optimizations (described in
>> > detail here) are shown on the left, the abstract parallel IR
>> > interface is in the middle, and the representation-specific
>> > implementations are on the right.
>> >   Optimization            (A)nalysis/(T)ransformation              Impl.
>> > -------------------------------------------------------------------------------
>> > CodePlacementOpt      \    /---> ParallelRegionInfo (A) ---------|-> KMPCImpl (A)
>> > RegionExpander       -\    |                                     |   GOMPImpl (A)
>> > AttributeAnnotator   -|----|---> ParallelCommunicationInfo (A) --/       ...
>> > BarrierElimination   -/    |
>> > VariablePrivatization /    \---> ParallelIR/Builder (T) ----------> KMPCImpl (T)
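>> > Read left to right: a pass like BarrierElimination only ever
>> > consumes the two abstract analyses in the middle column. Very
>> > roughly, and with illustrative stand-in types rather than the real
>> > interfaces, such a pass could be phrased like this:
>> >
>> >   #include <vector>
>> >
>> >   // Illustrative stand-ins for the middle column; the real analyses
>> >   // are populated by KMPCImpl, GOMPImpl, ... on the right-hand side.
>> >   struct Barrier { bool Dead = false; };
>> >   struct ParallelRegion { std::vector<Barrier *> Barriers; };
>> >
>> >   struct ParallelRegionInfo {
>> >     std::vector<ParallelRegion *> Regions;
>> >   };
>> >
>> >   struct ParallelCommunicationInfo {
>> >     // Conservative query: may any thread communicate across this
>> >     // barrier?
>> >     bool hasCommunicationAcross(const ParallelRegion &,
>> >                                 const Barrier &) const {
>> >       return true; // a real implementation asks the representation impl
>> >     }
>> >   };
>> >
>> >   // Representation-agnostic barrier elimination: it never mentions
>> >   // __kmpc_* or GOMP_* calls, only the abstract analyses above.
>> >   static bool eliminateDeadBarriers(ParallelRegionInfo &PRI,
>> >                                     ParallelCommunicationInfo &PCI) {
>> >     bool Changed = false;
>> >     for (ParallelRegion *PR : PRI.Regions)
>> >       for (Barrier *B : PR->Barriers)
>> >         if (!PCI.hasCommunicationAcross(*PR, *B)) {
>> >           B->Dead = true; // the real pass would erase the barrier call
>> >           Changed = true;
>> >         }
>> >     return Changed;
>> >   }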
>> > In our setting, a parallel region can be an outlined function called
>> > through a runtime library, but also a fork-join/attach-reattach
>> > region embedded in otherwise sequential code (a sketch of the
>> > outlined form follows the list below). The new optimizations will
>> > apply to all of them (where applicable). There are various reasons
>> > why we believe this is a worthwhile effort that belongs in the LLVM
>> > codebase, including:
>> > 1) We improve the performance of parallel programs, today.
>> > 2) It serves as a meaningful baseline for future discussions on
>> > (optimized) parallel representations.
>> > 3) It allows us to determine the pros and cons of the different
>> > schemes when it comes to actual optimizations and inputs.
>> > 4) It helps to identify problems that might arise once we start to
>> > transform parallel programs but _before_ we commit to a specific
>> > representation.
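>> > As a concrete picture of the first form, the outlined function
>> > called through a runtime library, here is (again hand-waved) roughly
>> > what early outlining of the earlier "scale" loop amounts to. The
>> > stand-in fork function below is not the real __kmpc_fork_call, which
>> > takes additional bookkeeping arguments:
>> >
>> >   #include <cmath>
>> >
>> >   // Stand-in for the runtime's fork entry point: the real one spawns
>> >   // the thread team and hands the arguments to every thread's
>> >   // invocation of the outlined body; here we just call the body once.
>> >   static void forkCallStandIn(void (*Body)(double *, int, double),
>> >                               double *A, int N, double X) {
>> >     Body(A, N, X);
>> >   }
>> >
>> >   // Roughly the outlined body the front end produces for the earlier
>> >   // "scale" example (scheduling/chunking of the iterations omitted).
>> >   static void scaleOutlinedBody(double *A, int N, double X) {
>> >     for (int I = 0; I < N; ++I)
>> >       A[I] *= std::log(X);
>> >   }
>> >
>> >   // What remains of the original function after early outlining: the
>> >   // parallel region is hidden behind an opaque runtime call, which is
>> >   // exactly what the representation-specific implementations have to
>> >   // see through.
>> >   void scaleOutlined(double *A, int N, double X) {
>> >     forkCallStandIn(scaleOutlinedBody, A, N, X);
>> >   }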
>> > Our prototypes for the OpenMP KMPC library (used by clang) already
>> > show significant speedups for various benchmarks. They also exposed
>> > a (to me) previously unknown problem between restrict/noalias
>> > pointers and (potential) barriers (see Section 3 of the paper).
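>> > For what it's worth, my reading of the kind of pattern where that
>> > interaction shows up (the precise discussion is in Section 3 of the
>> > paper; the snippet is mine):
>> >
>> >   #include <omp.h>
>> >
>> >   // Threads communicate through a restrict-qualified pointer across
>> >   // a barrier: each thread writes its own slot and reads a
>> >   // neighbour's slot after the barrier. The load after the barrier
>> >   // must not be hoisted above it, even though, within this one
>> >   // thread, nothing in between writes through A. How much of the
>> >   // restrict (noalias) fact may be propagated into the outlined body
>> >   // while still exploiting the barrier's semantics is the subtle part.
>> >   void exchange(int *__restrict__ A) {
>> >   #pragma omp parallel
>> >     {
>> >       int Tid = omp_get_thread_num();
>> >       int NumThreads = omp_get_num_threads();
>> >       A[Tid] = Tid;                              // write my slot
>> >   #pragma omp barrier                            // all writes visible
>> >       int Neighbour = A[(Tid + 1) % NumThreads]; // read another slot
>> >       (void)Neighbour;
>> >     }
>> >   }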
>> > We are currently in the process of cleaning up the code, extending
>> > the support for OpenMP constructs, and adding a second implementation
>> > for embedded parallel regions. However, a first horizontal prototype
>> > implementation is already available for review.
>> > Input of any kind is welcome and reviewers are needed!
>> > Cheers,
>> > Johannes
>> >  http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
>> >  https://reviews.llvm.org/D47300
>> > P.S.
>> > Sorry if you received this message multiple times!
>> Roger Ferrer Ibáñez
> Johannes Doerfert
> PhD Student / Researcher
> Compiler Design Lab (Professor Hack) / Argonne National Laboratory
> Saarland Informatics Campus, Germany / Lemont, IL 60439, USA
> Building E1.3, Room 4.31
> Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de / jdoerfert at anl.gov
> Fax. +49 (0)681 302-3065 : http://www.cdl.uni-saarland.de/people/doerfert
Roger Ferrer Ibáñez