[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)

Mahesha HS mahesha.llvm at gmail.com
Tue Oct 2 14:39:54 PDT 2012


I am not an optimizer person, but I wonder whether we could solve the
problems being discussed in this mail chain by introducing a middle-end
between the front-end and LLVM. We might introduce a GCC GIMPLE-like IR
(or any suitable new IR) at that level: the front-end would produce this
new IR, the middle-end would consume it, perform the parallelization and
subsequent optimizations, and then generate LLVM IR for LLVM to take
forward. Going through the middle-end could be made optional depending
on the requirements, which means the front-end would need the ability to
emit either the new IR for the middle-end or LLVM IR directly for LLVM.

--
mahesha


On Wed, Oct 3, 2012 at 1:30 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> On Tue, 2 Oct 2012 14:28:25 +0000
> "Adve, Vikram Sadanand" <vadve at illinois.edu> wrote:
>
> > Hal, Andrey, Alexey,
> >
> > From the LLVM design viewpoint, there is a fundamental problem with
> > both Hal's approach and the Intel approach: both are quite
> > language-specific.  OpenMP is a particular parallel language, with
> > particular constructs (e.g., parallel regions) and semantics.  LLVM
> > is a language-neutral IR and infrastructure and OpenMP-specific
> > concepts should not creep into it.
>
> This is a matter of perspective. One could also argue that the LLVM IR
> should be target neutral. Nevertheless, we have target-specific
> intrinsics. Similarly, there is a legitimate case to be made for
> producing code that targets existing OpenMP runtime ABIs. The most
> natural way to do this is for some of the ABI's semantics, and thus some
> of OpenMP's language semantics, to leak into the IR level. Otherwise,
> one ends up playing too much of a double-translation game.
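>
> As a concrete (hypothetical) sketch of what "targeting an existing
> OpenMP runtime ABI" means, consider how a front end might lower
> "#pragma omp parallel" against libgomp-style entry points; the names
> and signatures below are only illustrative, and other runtimes use a
> different ABI:
>
>   /* Outlining sketch for "#pragma omp parallel" (illustrative only). */
>   extern void do_something(int);        /* hypothetical region body   */
>   extern void GOMP_parallel_start(void (*fn)(void *), void *data,
>                                   unsigned num_threads);
>   extern void GOMP_parallel_end(void);
>
>   struct omp_data { int a; };           /* captured firstprivate state */
>
>   static void parreg(void *arg) {       /* outlined region body */
>     struct omp_data *d = arg;
>     do_something(d->a);
>   }
>
>   void run(int a) {
>     struct omp_data d = { a };
>     GOMP_parallel_start(parreg, &d, 0); /* 0: runtime picks team size   */
>     parreg(&d);                         /* master thread runs body too  */
>     GOMP_parallel_end();
>   }
>
> Emitting this form directly from the front end bakes the ABI in very
> early; keeping the region abstract until later means some of these ABI
> semantics have to be representable at the IR level.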
>
> Consider, for example, an OpenMP loop with runtime scheduling. This
> implies a certain interaction with the OpenMP runtime library (and so is
> explicitly OpenMP-specific). Nevertheless, we don't want to lower the
> parallelization too early because we'd like to perform loop analysis
> and transformations (like LICM) first. The only way to do this properly
> seems to be to push some of the OpenMP-specific nature of the loop into
> the IR. This is not necessarily bad.
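>
> A minimal (made-up) illustration: in the loop below, scale * scale is
> loop-invariant, and LICM can hoist it easily only while the loop is
> still an ordinary loop in the IR, i.e. before the body is outlined
> into a runtime callback for the schedule(runtime) dispatch:
>
>   /* Hypothetical runtime-scheduled loop with a hoistable subexpression. */
>   void scale_add(int n, double *x, const double *y, double scale) {
>   #pragma omp parallel for schedule(runtime)
>     for (int i = 0; i < n; ++i)
>       x[i] = x[i] + (scale * scale) * y[i];  /* scale*scale is invariant */
>   }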
>
> > I've included an excerpt from
> > Hal's proposal below, which shows what I mean: the  design is couched
> > in terms of OpenMP parallel regions.  Other parallel languages, e.g.,
> > Cilk, have no such notion.
>
> The approach that I proposed was certainly inspired by OpenMP, and
> designed to fully support OpenMP, but was not limited to it. As a
> practical matter, OpenMP includes both loop-based parallelism and
> task-based parallelism, which is a pretty broad foundation for
> supporting parallelism in general.
>
> I looked at the Cilk documentation when writing my proposal. Is there a
> reason why Cilk's semantics cannot be mapped onto the proposed support
> for parallel tasks?
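>
> As a rough sketch (and only a sketch; the scheduling guarantees
> differ), a Cilk spawn/sync pair seems to correspond to the same
> task-create/task-join structure as an OpenMP task:
>
>   /* Hypothetical example: the same recursion with Cilk keywords
>      and with OpenMP tasks.                                        */
>   #include <cilk/cilk.h>
>
>   int fib_cilk(int n) {
>     if (n < 2) return n;
>     int a = cilk_spawn fib_cilk(n - 1);   /* spawn child task */
>     int b = fib_cilk(n - 2);
>     cilk_sync;                            /* join */
>     return a + b;
>   }
>
>   int fib_omp(int n) {                    /* assumes an enclosing
>                                              parallel/single region */
>     if (n < 2) return n;
>     int a, b;
>   #pragma omp task shared(a)
>     a = fib_omp(n - 1);                   /* child task */
>     b = fib_omp(n - 2);
>   #pragma omp taskwait                    /* join */
>     return a + b;
>   }
>
> If both of these can be expressed as the same kind of IR-level
> parallel task, that would suggest the proposed representation is not
> OpenMP-specific in any essential way.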
>
> >  The latest Intel proposal is at least as
> > OpenMP-specific.
> >
> > I do agree with the general goal of trying to support parallel
> > programming languages in a more first-class manner in LLVM than we do
> > today.  But the right approach for that is to be as language-neutral
> > as possible.  For example, any parallelism primitives in the LLVM IR
> > and any LLVM- or machine-level optimizations that apply to parallel
> > code should be applicable not only to OpenMP but also to languages
> > like Cilk, Java/C#, and several others.  I think "libraries" like
> > Intel's TBB should be supported, too: they have a (reasonably)
> > well-defined semantics, just like languages do, and are becoming widely
> > used.
> >
> > I also do not think LLVM metadata is the way to represent the
> > primitives, because they are simply too fragile.   But you don't have
> > to argue that one with me :-), others have argued this already.
>
> I've never argued that the mechanism is not fragile. However, I do
> think that, with a proper design, it is possible to use the existing
> metadata infrastructure (with some minimal changes, for example, to
> inhibit inlining). I am not committed to a metadata-based approach, but
> I think such an approach is workable.
>
> >  You
> > really need more first class, language-neutral, LLVM mechanisms for
> > parallelism.  I'm not pretending I know how to do this, though there
> > are papers on the subject, including one from an Intel team (Pillar:
> > A Parallel Implementation Language, LCPC 2007).
>
> I'll look at the paper, thanks for the reference! The problem is not
> just in supporting parallelism in general, the problem is specifically
> in supporting OpenMP, with its mix of language semantics, runtime
> semantics, and the interaction of the two, while not inhibiting
> optimization.
>
> Thanks again,
> Hal
>
> >
> > --Vikram
> > Professor, Computer Science
> > University of Illinois at Urbana-Champaign
> > http://llvm.org/~vadve
> >
> >
> >
> >
> > > To mark this function as a parallel region, a module-level
> > > 'parallel' metadata entry is created. The call site(s) of this
> > > function are marked with this metadata. The metadata has entries:
> > >  - The string "region"
> > >  - A reference to the parallel-region function
> > >  - If applicable, a list of metadata references specifying
> > > special-handling child regions (parallel loops and
> > > serialized/critical regions)
> > >
> > > If the special-handling region metadata is no longer referenced by
> > > code within the parallel region, then the region has become
> > > invalid, and will be removed (meaning all parallelization metadata
> > > will be removed) by the ParallelizationCleanup. The same is true
> > > for all other cross-referenced metadata below.
> > >
> > > Note that parallel regions can be nested.
> > >
> > > As a quick example, something like:
> > > int main() {
> > >   int a;
> > > #pragma omp parallel firstprivate(a)
> > >   do_something(a);
> > >   ...
> > > }
> > >
> > > becomes something like:
> > >
> > > define private void @parreg(i32 %a) {
> > > entry:
> > >   call void @do_something(i32 %a)
> > >   ret void
> > > }
> > >
> > > define i32 @main() {
> > > entry:
> > > ...
> > > call void @parreg(i32 %a), !parallel !0
> > > ...
> > >
> > > !0 = metadata !{ metadata !"region", @parreg }
> > >
> >
> >
> > --Vikram
> > Professor, Computer Science
> > University of Illinois at Urbana-Champaign
> > http://llvm.org/~vadve
> >
> >
> >
> >
> >
> >
>
>
>
> --
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>



-- 
mahesha