[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)

Tue Oct 2 19:29:46 PDT 2012

On Mon, 01 Oct 2012 22:56:50 -0700
Chris Lattner <clattner at apple.com> wrote:

> 
> On Oct 1, 2012, at 10:37 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > On Mon, 01 Oct 2012 21:26:54 -0700
> > Chris Lattner <clattner at apple.com> wrote:
> > 
> >> 
> >> On Oct 1, 2012, at 6:16 PM, greened at obbligato.org wrote:
> >> 
> >>> Sanjoy Das <sanjoy at playingwithpointers.com> writes:
> >>> 
> >>>> In short, I propose an intrinsic-based approach which hinges on
> >>>> the concept of a "parallel map".  The immediate effect of using
> >>>> intrinsics is that we no longer have to worry about missing
> >>>> metadata.  Moreover, we are still free to lower the intrinsics in
> >>>> a variety of ways -- including vectorizing them or lowering them
> >>>> to calls to an actual openmp backend.
> >>> 
> >>> I'll re-ask here since this is in its own thread.
> >>> 
> >>> Why can't we just make ordinary function calls to runtime
> >>> routines?
> >> 
> >> I agree.  I can't imagine any practical way that a metadata-based
> >> approach could be preserved by optimizers.
> > 
> > Regarding the metadata approach, it depends on what you mean by
> > preserved. The trick is to make sure that transformations that don't
> > understand the metadata can't cause miscompiles. The specific scheme
> > that I proposed used a combination of procedurization and
> > cross-referencing metadata such that invalidated parallel metadata
> > can be detected and the entire enclosing parallel region can be
> > dropped.
> > 
> > The proposal from Intel, which uses intrinsics more heavily, has
> > other advantages, but will require more modifications to existing
> > passes to realize its potential optimization benefits.
> 
> My comment was mostly in response to the Intel proposal, which
> effectively translates OpenMP pragmas directly into llvm intrinsics +
> metadata.  I can't imagine a way to make this work *correctly*
> without massive changes to the optimizer.

Also, I should mention that Sanjoy's recommendation, which is to move
the parallelization state into an analysis pass, might make sense here.
If not all intermediate passes preserve the analysis, then the state
will be lost, and no parallelization will occur. In the context of
OpenMP, where parallelization is essentially optional, I think this
should be fine.
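
As a rough illustration of that fallback behavior, here is a minimal
sketch in Python. It is a toy model, not LLVM code: ToyPass,
ToyPassManager, the preserves_parallel_info flag, and the pass names
are all hypothetical and only stand in for the idea of an analysis
that is dropped whenever a pass does not declare it preserved.

```python
# Toy model only: none of these names are LLVM APIs.
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class ToyPass:
    name: str
    preserves_parallel_info: bool  # hypothetical flag a pass would declare


@dataclass
class ToyPassManager:
    # Cached analysis result: the loops the frontend marked as parallel.
    parallel_info: Optional[FrozenSet[str]]

    def run(self, passes):
        for p in passes:
            if not p.preserves_parallel_info:
                # The pass may have restructured the code, so the cached
                # state can no longer be trusted; drop it entirely.
                self.parallel_info = None

    def lower(self, loop):
        # Parallelize only if the analysis survived the whole pipeline.
        if self.parallel_info is not None and loop in self.parallel_info:
            return "parallel"
        return "serial"


def compile_loop(passes, loop="loop0"):
    pm = ToyPassManager(parallel_info=frozenset({loop}))
    pm.run(passes)
    return pm.lower(loop)
```

In this model, a pipeline in which every pass preserves the analysis
lowers the loop as parallel, while a single non-preserving pass makes
it fall back to serial code. The result is slower but still correct,
which is the property that matters for OpenMP.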

In any case, if we mark the intrinsics as having unknown side effects
then they'll serve as barriers for code motion. I *think* that this
would also inhibit loop restructuring (or could be made to do so) so
loop annotations could be kept properly associated with the intended
code, but this would need to be checked.
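
The code-motion part of that claim can be sketched with a toy model in
Python. This is not how LLVM represents instructions: Inst,
hoist_as_far_as_possible, and the par.begin/par.end marker names are
hypothetical, data dependencies are ignored, and the sketch says
nothing about loop restructuring, which would still need to be
checked.

```python
# Toy model only: these are not LLVM data structures.
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    name: str
    # True for a call the optimizer must assume can read or write anything.
    unknown_side_effects: bool = False


def hoist_as_far_as_possible(block, name):
    """Move the named instruction toward the top of the block, stopping
    just below the nearest instruction with unknown side effects."""
    insts = list(block)
    i = next(k for k, inst in enumerate(insts) if inst.name == name)
    if insts[i].unknown_side_effects:
        return insts  # barriers themselves are never moved
    while i > 0 and not insts[i - 1].unknown_side_effects:
        insts[i - 1], insts[i] = insts[i], insts[i - 1]
        i -= 1
    return insts


def names(block):
    return [inst.name for inst in block]


# Region markers modeled as calls with unknown side effects.
marked = [Inst("a"), Inst("par.begin", True), Inst("body"),
          Inst("par.end", True), Inst("b")]
# The same block, but with markers the optimizer is free to move across.
unmarked = [Inst(inst.name) for inst in marked]
```

With the markers flagged, hoisting "body" leaves it between par.begin
and par.end; without the flag, the same hoist moves it to the top of
the block and out of the region it was meant to belong to.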

 -Hal

> 
> -Chris

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory