[LLVMdev] LLVM Parallel IR

Hal Finkel hfinkel at anl.gov
Tue Mar 10 05:45:46 PDT 2015


----- Original Message -----
> From: "Kevin Streit" <streit at mailbox.org>
> To: "Renato Golin" <renato.golin at linaro.org>, "Tobias Grosser" <tgrosser at inf.ethz.ch>
> Cc: "William Moses" <wmoses at csail.mit.edu>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, March 10, 2015 3:36:10 AM
> Subject: Re: [LLVMdev] LLVM Parallel IR
> 
> > On March 9, 2015 at 6:52 PM Renato Golin <renato.golin at linaro.org>
> > wrote:
> > 
> > 
> > On 9 March 2015 at 17:30, Tobias Grosser <tgrosser at inf.ethz.ch>
> > wrote:
> > > If my memories are right, one of the critical issues (besides other
> > > engineering considerations) was that parallelism metadata in LLVM is
> > > optional and can always be dropped. However, for OpenMP it is
> > > sometimes incorrect to execute sequentially a loop that has been
> > > marked parallel in the source code.
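> > >
> > > For illustration, the parallel-loop annotations in question look
> > > roughly like this (from memory; the exact syntax varies between LLVM
> > > versions), and nothing prevents a pass from simply dropping the
> > > metadata:
> > >
> > >   for.body:
> > >     %v = load i32, i32* %src, !llvm.mem.parallel_loop_access !0
> > >     store i32 %v, i32* %dst, !llvm.mem.parallel_loop_access !0
> > >     br i1 %done, label %for.end, label %for.body, !llvm.loop !0
> > >
> > >   !0 = !{!0}                 ; self-referential loop id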
> > 
> > Exactly. The fact that metadata goes stale quickly is not a flaw, but a
> > design decision. If the producing pass is close enough to the consuming
> > one, you should be ok. If not, then proving legality might be tricky
> > and time consuming. The problem with OpenMP and other vectorizer
> > pragmas is that the metadata is introduced by the front-end, and it's a
> > long way down the pipeline for it to be consumed. Having said that, it
> > works ok if you keep your code simple.
> 
> I know that this was a long discussion and that the "breakability" of
> parallel loop info is the result of a design decision. And I also
> believe that this is a good approach as long as parallelism is not part
> of the contract with the user (i.e., the programmer placing explicit
> parallelism annotations, or the language designer introducing
> parallelism into the semantics of the language). Tobias already
> mentioned problems with breaking OpenMP semantics. Similarly, different
> forms of parallelism, like task parallelism, could certainly be
> represented using metadata, say by extracting the task code to a
> function and annotating the call as being spawned for parallel
> execution. Again, optimizations could break it, violating a possible
> contract with the user.
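>
> A rough sketch of what such an annotation might look like (the metadata
> name and shape here are purely made up for illustration):
>
>   ; task body extracted into its own function; the call is annotated as
>   ; being spawned for parallel execution
>   call void @task_body(i8* %env), !parallel.spawn !1
>
>   !1 = !{!"spawn"}              ; any pass may still silently drop this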
> 
> Alternatively, we could introduce intrinsics, which is what I currently
> do. This forbids certain optimizations, like moving potential memory
> accesses into or out of "parallel code sections", and therefore does not
> break parallelism as often. The headache this approach causes me is that
> basic analyses like dominance and reachability are broken in that
> setting: a value computed in one parallel task, followed by another
> parallel task in the CFG, does not actually dominate or even reach the
> second task, even though the sequential CFG says it does. This of course
> influences the precision and correctness of optimizations such as
> redundant code elimination or GVN.
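>
> For example (the intrinsic names below are purely illustrative):
>
>   call void @llvm.parallel.task.begin()    ; hypothetical intrinsic
>   %x = add i32 %a, %b                      ; computed in the first task
>   call void @llvm.parallel.task.end()
>
>   call void @llvm.parallel.task.begin()
>   %y = mul i32 %x, 2   ; the CFG says %x dominates this use, but if the
>                        ; two tasks may run concurrently, %x is not
>                        ; actually available in the second task
>   call void @llvm.parallel.task.end()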
> 
> > I'd be interested in knowing what in the IR cannot be accomplished in
> > terms of metadata for parallelization, and what new constructs would
> > need to be added to the IR in order to do that. If there is benefit for
> > your project at the same time as for OpenMP and our internal vectorizer
> > annotation, changing the IR wouldn't be impossible. We have done the
> > same for exception handling...
> 
> I understand that parallelism is a very invasive concept and introducing
> it into a so far "sequential" IR will cause severe breakage and
> headaches. But I am afraid that if we accept parallelism as a
> first-class citizen, then I would prefer making it a core part of the
> IR. One possibility to do this gradually might be to have a separate,
> parallel IR, say PIR, that is lowered to regular IR at some point
> (however this point is chosen). Existing optimizations could then be
> gradually moved from the regular IR phase to the PIR phase where
> appropriate and useful. Nevertheless, I do not propose doing such a
> thing in LLVM right now. I think this might be an option for a (bigger)
> research project at first.
>  
> I'd be happy to hear further thoughts about that.

Part of the issue here is that the relative benefit of new IR features versus enhancing LLVM (TLI, BasicAA, etc.) to understand more about the semantics of the runtime calls is not clear. To play devil's advocate:

 - Currently, runtime calls interfere with optimizations such as LICM because of the conservative answer BasicAA must provide about unknown external functions. However, BasicAA already has knowledge of certain external functions, and some knowledge of these runtime calls could be added as well (see the first sketch after this list).

 - We'd like to perform some optimizations, such as duplicate barrier removal. However, an optimization can recognize consecutive calls to a barrier runtime library function pretty easily (see the second sketch below) -- we don't need special IR features for that, perhaps only a good abstraction layer if we want to support different runtimes.
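
For the first point, a minimal sketch of the kind of code involved (the runtime-call name is made up; any unknown external call behaves the same way):

  loop:
    %v = load i32, i32* @g            ; loop-invariant load, but LICM cannot
                                      ; hoist it because BasicAA must assume
                                      ; the call below may write to @g
    call void @parallel_rt_helper()   ; hypothetical opaque runtime call
    br i1 %cond, label %loop, label %exit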
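
For the second point, the pattern is just two back-to-back calls to the runtime's barrier entry point; with the OpenMP runtime it looks roughly like this (argument details approximate):

  call void @__kmpc_barrier(%ident_t* @loc, i32 %tid)
  call void @__kmpc_barrier(%ident_t* @loc, i32 %tid)  ; nothing is accessed
                                      ; between the two calls, so a pass that
                                      ; understands the call's semantics can
                                      ; remove the second barrier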

 -Hal

> 
> Cheers,
> 
> ---
> 
> Kevin Streit
> Neugäßchen 2
> 66111 Saarbrücken
> 
> Tel. +49 (0)151 23003245
> streit at mailbox.org · http://www.kevinstreit.de
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
