[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)

Tue Oct 2 13:00:27 PDT 2012

On Tue, 2 Oct 2012 14:28:25 +0000
"Adve, Vikram Sadanand" <vadve at illinois.edu> wrote:

> Hal, Andrey, Alexey,
> 
> From the LLVM design viewpoint, there is a fundamental problem with
> both Hal's approach and the Intel approach: both are quite
> language-specific.  OpenMP is a particular parallel language, with
> particular constructs (e.g., parallel regions) and semantics.  LLVM
> is a language-neutral IR and infrastructure and OpenMP-specific
> concepts should not creep into it. 

This is a matter of perspective. One could also argue that the LLVM IR
should be target neutral. Nevertheless, we have target-specific
intrinsics. Similarly, there is a legitimate case to be made for
producing code that targets existing OpenMP runtime ABIs. The most
natural way to do this is for some of the ABI's semantics, and thus some
of OpenMP's language semantics, to leak into the IR level. Otherwise,
one ends up playing too much of a double-translation game.

Consider, for example, an OpenMP loop with runtime scheduling. This
implies a certain interaction with the OpenMP runtime library (and so is
explicitly OpenMP-specific). Nevertheless, we don't want to lower the
parallelization too early because we'd like to perform loop analysis
and transformations (like LICM) first. The only way to do this properly
seems to be to push some of the OpenMP-specific nature of the loop into
the IR. This is not necessarily bad.
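To make the loop case concrete, here is a minimal C sketch (the function and parameter names are hypothetical, not part of any proposal). schedule(runtime) means the schedule is chosen by the OpenMP runtime when the loop starts, so the generated code has to call into the runtime library. At the same time, the loop contains an invariant expression that we would like LICM to hoist before the loop body is outlined for the runtime:

```c
/* Hypothetical example of an OpenMP loop with runtime scheduling.
 * schedule(runtime) defers the choice of schedule to the OpenMP runtime
 * (the run-sched-var ICV, e.g. set via OMP_SCHEDULE), so the compiler must
 * emit runtime-library calls to hand out iterations. The product
 * scale * offset does not depend on i; ideally LICM hoists it before the
 * loop body is outlined. Without -fopenmp the pragma is ignored and the
 * loop runs serially with the same result. */
void scale_add(double *out, const double *in, long n,
               double scale, double offset)
{
#pragma omp parallel for schedule(runtime)
    for (long i = 0; i < n; ++i) {
        double bias = scale * offset; /* loop-invariant: a LICM candidate */
        out[i] = in[i] * scale + bias;
    }
}
```

Running the same binary with OMP_SCHEDULE="dynamic,4" versus OMP_SCHEDULE="static" changes how iterations are distributed without recompilation, which is why this runtime interaction cannot simply be resolved at compile time.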

> I've included an excerpt from
> Hal's proposal below, which shows what I mean: the design is couched
> in terms of OpenMP parallel regions.  Other parallel languages, e.g.,
> Cilk, have no such notion.

The approach that I proposed was certainly inspired by OpenMP, and
designed to fully support OpenMP, but was not limited to it. As a
practical matter, OpenMP includes both loop-based parallelism and
task-based parallelism, which is a pretty broad foundation for
supporting parallelism in general.

I looked at the Cilk documentation when writing my proposal. Is there a
reason why Cilk's semantics cannot be mapped onto the proposed support
for parallel tasks?
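As a rough illustration of what I have in mind, here is a sketch (not taken from either proposal; the names and the cutoff value are hypothetical) of the classic Cilk fib example written with OpenMP tasks. cilk_spawn corresponds roughly to "omp task" and cilk_sync to "omp taskwait", although the two runtimes differ in scheduling details such as work-first (continuation) stealing in Cilk:

```c
/* Hypothetical sketch: Cilk-style fork/join expressed with OpenMP tasks.
 * Without -fopenmp the pragmas are ignored and everything runs serially. */
static long fib_serial(int n)
{
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

long fib_tasks(int n)
{
    long x, y;
    if (n < 20)                 /* hypothetical cutoff: too small to spawn */
        return fib_serial(n);
#pragma omp task shared(x)      /* Cilk: x = cilk_spawn fib(n - 1); */
    x = fib_tasks(n - 1);
    y = fib_tasks(n - 2);       /* continuation runs in the current task */
#pragma omp taskwait            /* Cilk: cilk_sync; */
    return x + y;
}

long fib_parallel(int n)
{
    long result;
#pragma omp parallel
#pragma omp single              /* one thread creates the root task */
    result = fib_tasks(n);
    return result;
}
```

The shared(x) clause is needed because a local variable referenced in a task is firstprivate by default, so without it the spawned task's write to x would be lost.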

>  The latest Intel proposal is at least as
> OpenMP-specific.
> 
> I do agree with the general goal of trying to support parallel
> programming languages in a more first-class manner in LLVM than we do
> today.  But the right approach for that is to be as language-neutral
> as possible.  For example, any parallelism primitives in the LLVM IR
> and any LLVM- or machine-level optimizations that apply to parallel
> code should be applicable not only to OpenMP but also to languages
> like Cilk, Java/C#, and several others.  I think "libraries" like
> Intel's TBB should be supported, too: they have a (reasonably)
> well-defined semantics, just like languages do, and are becoming widely
> used.  
> 
> I also do not think LLVM metadata is the way to represent the
> primitives, because they are simply too fragile.   But you don't have
> to argue that one with me :-), others have argued this already.

I've never argued that the mechanism is not fragile. However, I do
think that, with a proper design, it is possible to use the existing
metadata infrastructure (with some minimal changes, for example, to
inhibit inlining). I am not committed to a metadata-based approach, but
I think such an approach is workable.

>  You
> really need more first class, language-neutral, LLVM mechanisms for
> parallelism.  I'm not pretending I know how to do this, though there
> are papers on the subject, including one from an Intel team (Pillar:
> A Parallel Implementation Language, LCPC 2007).

I'll look at the paper, thanks for the reference! The problem is not
just in supporting parallelism in general, the problem is specifically
in supporting OpenMP, with its mix of language semantics, runtime
semantics, and the interaction of the two, while not inhibiting
optimization.

Thanks again,
Hal

> 
> --Vikram
> Professor, Computer Science
> University of Illinois at Urbana-Champaign
> http://llvm.org/~vadve
> 
> > To mark this function as a parallel region, a module-level
> > 'parallel' metadata entry is created. The call site(s) of this
> > function are marked with this metadata. The metadata has entries:
> >  - The string "region"
> >  - A reference to the parallel-region function
> >  - If applicable, a list of metadata references specifying
> > special-handling child regions (parallel loops and
> > serialized/critical regions)
> > 
> > If the special-handling region metadata is no longer referenced by
> > code within the parallel region, then the region has become
> > invalid, and will be removed (meaning all parallelization metadata
> > will be removed) by the ParallelizationCleanup. The same is true
> > for all other cross-referenced metadata below.
> > 
> > Note that parallel regions can be nested.
> > 
> > As a quick example, something like:
> > int main() {
> >   int a;
> > #pragma omp parallel firstprivate(a) 
> >   do_something(a);
> >   ...
> > }
> > 
> > becomes something like:
> > 
> > define private void @parreg(i32 %a) {
> > entry:
> >   call void @do_something(i32 %a)
> >   ret void
> > }
> > 
> > define i32 @main() {
> > entry:
> >   ...
> >   call void @parreg(i32 %a), !parallel !0
> >   ...
> > 
> > !0 = metadata !{ metadata !"region", void (i32)* @parreg }
> > 
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory