[cfe-dev] [LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)

Hal Finkel hfinkel at anl.gov
Tue Oct 2 21:11:10 PDT 2012


On Tue, 02 Oct 2012 19:52:37 -0700
Mehdi AMINI <mehdi.amini at silkan.com> wrote:

> Hi,
> 
> On 02/10/2012 19:29, Hal Finkel wrote:
> > On Mon, 01 Oct 2012 22:56:50 -0700
> > Chris Lattner <clattner at apple.com> wrote:
> >> On Oct 1, 2012, at 10:37 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >>> On Mon, 01 Oct 2012 21:26:54 -0700
> >>> Chris Lattner <clattner at apple.com> wrote:
> >>>> On Oct 1, 2012, at 6:16 PM, greened at obbligato.org wrote:
> >>>>> Sanjoy Das <sanjoy at playingwithpointers.com> writes:
> >>>>>
> >>>>>> In short, I propose an intrinsic-based approach which hinges on
> >>>>>> the concept of a "parallel map".  The immediate effect of using
> >>>>>> intrinsics is that we no longer have to worry about missing
> >>>>>> metadata.  Moreover, we are still free to lower the intrinsics
> >>>>>> in a variety of ways -- including vectorizing them or lowering
> >>>>>> them to calls to an actual openmp backend.
> >>>>>
> >>>>> I'll re-ask here since this is in its own thread.
> >>>>>
> >>>>> Why can't we just make ordinary function calls to runtime
> >>>>> routines?
> >>>>
> >>>> I agree.  I can't imagine any practical way that a metadata-based
> >>>> approach could be preserved by optimizers.
> >>>
> >>> Regarding the metadata approach, it depends on what you mean by
> >>> preserved. The trick is to make sure that transformations that
> >>> don't understand the metadata can't cause miscompiles. The
> >>> specific scheme that I proposed used a combination of
> >>> procedurization and cross-referencing metadata such that
> >>> invalidated parallel metadata can be detected and the entire
> >>> enclosing parallel region can be dropped.
> >>>
> >>> The proposal from Intel, which uses intrinsics more heavily, has
> >>> other advantages, but will require more modifications to existing
> >>> passes to realize its potential optimization benefits.
> >>
> >> My comment was mostly in response to the Intel proposal, which
> >> effectively translates OpenMP pragmas directly into llvm
> >> intrinsics + metadata.  I can't imagine a way to make this work
> >> *correctly* without massive changes to the optimizer.
> >
> > Also, I should mention that Sanjoy's recommendation, which is to
> > move the parallelization state into an analysis pass, might make
> > sense here. If not all intermediate passes preserve the analysis,
> > then the state will be lost, and no parallelization will occur. In
> > the context of OpenMP, where parallelization is essentially
> > optional, I think this should be fine.
> 
> 
> What do you mean by "parallelization is essentially optional"?
> David already answered this today (on llvmdev@):
> 
> "Actually, it is perfectly possible to have a program with OpenMP
> directives that is NOT valid when those directives are ignored.  In
> other words, it's possible to write a legal OMP program that relies on
> parallelism to function correctly.".
> 
> Just think about task-based producer/consumer code, for example.

Thank you for pointing this out again. I believe the real issue is
whether OpenMP allows the user to force a parallel region to execute
with a specified number of threads. In my experience the answer has
been no: even the num_threads clause is just a suggestion to the
runtime. Looking at the specification, however, it seems that this
depends on the state of the 'dynamic adjustment' runtime variable. If
it is set to false, then the num_threads clause must be followed (so
long as certain other conditions, all of which can be probed using the
runtime library, also hold).
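
To make the num_threads point concrete, here is a minimal sketch in C.
It disables dynamic adjustment with omp_set_dynamic(0), requests a team
size through the num_threads clause, and reports the size the runtime
actually granted. The function name granted_team_size is made up for
illustration, and the _OPENMP guard is only there so the file also
builds (and runs serially) when OpenMP support is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: request `requested` threads with dynamic
 * adjustment disabled and return the team size actually granted.
 * Without OpenMP support the region simply runs on one thread. */
int granted_team_size(int requested)
{
    int granted = 1;

    assert(requested >= 1);
#ifdef _OPENMP
    /* With dynamic adjustment off, the runtime should honor the
     * num_threads request, subject to its other limits. */
    omp_set_dynamic(0);
#pragma omp parallel num_threads(requested)
    {
        /* One thread records the team size; the implicit barrier at
         * the end of the single construct publishes it to the rest. */
#pragma omp single
        granted = omp_get_num_threads();
    }
#else
    (void)requested; /* pragma ignored: serial execution */
#endif
    return granted;
}
```

Comparing granted_team_size(4) against 4 shows whether the request was
honored on a given system. Limits that can still cap the team size can
be queried with runtime routines such as omp_get_thread_limit().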

Nevertheless, the num_threads clause takes a dynamic input, and so must
be encoded using an intrinsic call (this is true in all current
proposals). The fact that this intrinsic cannot be safely dropped adds
an error condition for which we must check. Requiring a compilation
error whenever we would otherwise drop the parallel region associated
with a num_threads clause seems practical (and, like other ICEs, would
indicate something that we need to fix).
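
The check described above can be modeled as a small decision function.
This is a hypothetical sketch in C, not LLVM code: the names
region_action and resolve_region, and the two boolean inputs, are
assumptions made purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a parallel region after optimization. */
enum region_action {
    KEEP_PARALLEL,   /* parallel information survived intact */
    DROP_TO_SERIAL,  /* information invalidated: run the region serially */
    COMPILE_ERROR    /* dropping would be unsafe: report an error */
};

/* metadata_valid:  the parallelization state was preserved by all
 *                  intermediate passes.
 * has_num_threads: the region carries a num_threads request that
 *                  cannot be silently discarded. */
enum region_action resolve_region(bool metadata_valid, bool has_num_threads)
{
    if (metadata_valid)
        return KEEP_PARALLEL;
    if (has_num_threads)
        return COMPILE_ERROR;
    return DROP_TO_SERIAL;
}
```

Only the combination of invalidated state and a num_threads request
produces the error; every other region can still fall back to serial
execution.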

 -Hal




-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory



