I am not an optimizer guy, but I wonder whether the problems discussed in this thread could be solved by introducing a middle-end between the front-end and LLVM. We might need a GCC GIMPLE-like IR (or any other suitable new IR) in the middle-end, so that the front-end produces this new IR and the middle-end consumes it, performs the parallelization and subsequent optimizations, and generates LLVM IR for LLVM to take forward. Going through the middle-end could be made optional, depending on the requirements. This means the front-end must be able to produce either the new IR for the middle-end or LLVM IR directly for LLVM.<div>

<br></div><div>--</div><div>mahesha</div><div><br><br><div class="gmail_quote">On Wed, Oct 3, 2012 at 1:30 AM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue, 2 Oct 2012 14:28:25 +0000<br>

<div class="im">"Adve, Vikram Sadanand" <<a href="mailto:vadve@illinois.edu">vadve@illinois.edu</a>> wrote:<br>

<br>

</div><div class="im">> Hal, Andrey, Alexey,<br>

><br>

> From the LLVM design viewpoint, there is a fundamental problem with<br>

> both Hal's approach and the Intel approach: both are quite<br>

> language-specific.  OpenMP is a particular parallel language, with<br>

> particular constructs (e.g., parallel regions) and semantics.  LLVM<br>

> is a language-neutral IR and infrastructure and OpenMP-specific<br>

> concepts should not creep into it.<br>

<br>

</div>This is a matter of perspective. One could also argue that the LLVM IR<br>

should be target neutral. Nevertheless, we have target-specific<br>

intrinsics. Similarly, there is a legitimate case to be made for<br>

producing code that targets existing OpenMP runtime ABIs. The most<br>

natural way to do this is for some of the ABI's semantics, and thus some<br>

of OpenMP's language semantics, to leak into the IR level. Otherwise,<br>

one ends up playing too much of a double-translation game.<br>

<br>

Consider, for example, an OpenMP loop with runtime scheduling. This<br>

implies a certain interaction with the OpenMP runtime library (and so is<br>

explicitly OpenMP-specific). Nevertheless, we don't want to lower the<br>

parallelization too early because we'd like to perform loop analysis<br>

and transformations (like LICM) first. The only way to do this properly<br>

seems to be to push some of the OpenMP-specific nature of the loop into<br>

the IR. This is not necessarily bad.<br>
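<br>
To make the tension concrete, here is a minimal sketch in C of the kind of loop described above. The function name scale_add and its parameters are hypothetical and not taken from any proposal in this thread. The product a * b + 1.0 is loop-invariant, so an LICM-style pass would want to hoist it, but if the loop body has already been outlined into a runtime-library call, that opportunity may be harder to see.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, for illustration only. */
void scale_add(double *out, const double *in, size_t n, double a, double b)
{
    assert(out != NULL && in != NULL);

    /* schedule(runtime): the iteration-to-thread mapping is chosen by the
       OpenMP runtime at execution time (for example via OMP_SCHEDULE),
       so this loop is tied to the OpenMP runtime library. */
    #pragma omp parallel for schedule(runtime)
    for (long i = 0; i < (long)n; ++i) {
        /* (a * b + 1.0) does not depend on i; it is the kind of
           loop-invariant expression LICM would hoist out of the loop. */
        out[i] = in[i] * (a * b + 1.0);
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.<br>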

<div class="im"><br>

> I've included an excerpt from<br>

> Hal's proposal below, which shows what I mean: the  design is couched<br>

> in terms of OpenMP parallel regions.  Other parallel languages, e.g,<br>

> Cilk, have no such notion.<br>

<br>

</div>The approach that I proposed was certainly inspired by OpenMP, and<br>

designed to fully support OpenMP, but was not limited to it. As a<br>

practical matter, OpenMP includes both loop-based parallelism and<br>

task-based parallelism, which is a pretty broad foundation for<br>

supporting parallelism in general.<br>
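<br>
As a rough illustration of the two styles mentioned above, the sketch below shows one loop-based and one task-based OpenMP construct in C. The function names sum_squares, fib, and parallel_fib are hypothetical and chosen only for this example; they are not part of the proposal.<br>

```c
#include <assert.h>

/* Loop-based parallelism: iterations are divided among threads and the
   partial sums are combined by the reduction clause. */
long sum_squares(int n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += (long)i * i;
    return sum;
}

/* Task-based parallelism: each recursive call may run as its own task. */
static long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return x + y;
}

long parallel_fib(int n)
{
    long result = 0;
    assert(n >= 0);
    #pragma omp parallel
    {
        /* One thread creates the root of the task tree; the rest of the
           team picks up the spawned tasks. */
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

Without OpenMP enabled the pragmas are ignored and both functions compute the same values serially.<br>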

<br>

I looked at the Cilk documentation when writing my proposal. Is there a<br>

reason why Cilk's semantics cannot be mapped onto the proposed support<br>

for parallel tasks?<br>

<div class="im"><br>

>  The latest Intel proposal is at least as<br>

> OpenMP-specific.<br>

><br>

> I do agree with the general goal of trying to support parallel<br>

> programming languages in a more first-class manner in LLVM than we do<br>

> today.  But the right approach for that is to be as language-neutral<br>

> as possible.  For example, any parallelism primitives in the LLVM IR<br>

> and any LLVM- or machine-level optimizations that apply to parallel<br>

> code should be applicable not only to OpenMP but also to languages<br>

> like Cilk, Java/C#, and several others.  I think "libraries" like<br>

> Intel's TBB should be supported, too: they have a (reasonably)<br>

> well-defined semantics, just like languages do, and are becoming widely<br>

> used.<br>

><br>

> I also do not think LLVM metadata is the way to represent the<br>

> primitives, because they are simply too fragile.   But you don't have<br>

> to argue that one with me :-), others have argued this already.<br>

<br>

</div>I've never argued that the mechanism is not fragile. However, I do<br>

think that, with a proper design, it is possible to use the existing<br>

metadata infrastructure (with some minimal changes, for example, to<br>

inhibit inlining). I am not committed to a metadata-based approach, but<br>

I think such an approach is workable.<br>

<div class="im"><br>

>  You<br>

> really need more first class, language-neutral, LLVM mechanisms for<br>

> parallelism.  I'm not pretending I know how to do this, though there<br>

> are papers on the subject, including one from an Intel team (Pillar:<br>

> A Parallel Implementation Language, LCPC 2007).<br>

<br>

</div>I'll look at the paper, thanks for the reference! The problem is not<br>

just in supporting parallelism in general, the problem is specifically<br>

in supporting OpenMP, with its mix of language semantics, runtime<br>

semantics, and the interaction of the two, while not inhibiting<br>

optimization.<br>

<br>

Thanks again,<br>

Hal<br>

<div class="HOEnZb"><div class="h5"><br>

><br>

> --Vikram<br>

> Professor, Computer Science<br>

> University of Illinois at Urbana-Champaign<br>

> <a href="http://llvm.org/~vadve" target="_blank">http://llvm.org/~vadve</a><br>

><br>

><br>

><br>

><br>

> > To mark this function as a parallel region, a module-level<br>

> > 'parallel' metadata entry is created. The call site(s) of this<br>

> > function are marked with this metadata. The metadata has entries:<br>

> >  - The string "region"<br>

> >  - A reference to the parallel-region function<br>

> >  - If applicable, a list of metadata references specifying<br>

> > special-handling child regions (parallel loops and<br>

> > serialized/critical regions)<br>

> ><br>

> > If the special-handling region metadata is no longer referenced by<br>

> > code within the parallel region, then the region has become<br>

> > invalid, and will be removed (meaning all parallelization metadata<br>

> > will be removed) by the ParallelizationCleanup. The same is true<br>

> > for all other cross-referenced metadata below.<br>

> ><br>

> > Note that parallel regions can be nested.<br>

> ><br>

> > As a quick example, something like:<br>

> > int main() {<br>

> >   int a;<br>

> > #pragma omp parallel firstprivate(a)<br>

> >   do_something(a);<br>

> >   ...<br>

> > }<br>

> ><br>

> > becomes something like:<br>

> ><br>

> > define private void @parreg(i32 %a) {<br>

> > entry:<br>

> >   call void @do_something(i32 %a)<br>

> >   ret void<br>

> > }<br>

> ><br>

> > define i32 @main() {<br>

> > entry:<br>

> > ...<br>

> > call void @parreg(i32 %a) !parallel !0<br>

> > ...<br>

> ><br>

> > !0 = metadata !{ metadata !"region", @parreg }<br>

> ><br>

><br>

><br>


><br>

><br>

><br>

><br>

><br>

><br>

<br>

<br>

<br>

</div></div><div class="im HOEnZb">--<br>

Hal Finkel<br>

Postdoctoral Appointee<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</div><div class="HOEnZb"><div class="h5">

</div></div></blockquote></div><br>

</div>