[LLVMdev] [RFC] OpenMP Representation in LLVM IR

Hal Finkel hfinkel at anl.gov
Sat Sep 29 11:21:40 PDT 2012


On Sat, 29 Sep 2012 21:16:21 +0400
Andrey Bokhanko <andreybokhanko at gmail.com> wrote:

> Hal,
> 
> Thank you for the reply!
> 
> > As you may know, this is the third such proposal over the past two
> > months, one by me
> > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052472.html)
> > and the other, based somewhat on mine, by Sanjoy
> > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-September/053798.html)
> 
> Yes, I was aware of your proposal. I hesitated to make any comments or
> criticism -- as I am, obviously, biased.
> 
> In my opinion, the two most important differences between our
> proposals are:
> 
> 1) Your design employs explicit procedurization done in the
> front-end, while our design allows both early (right after the
> front-end) and late (later, in the back-end) procedurization.

Yes.

> 2) You aim to provide a general support for all (or at least most)
> parallel standards, while our aim is more modest -- just OpenMP.

To be fair, my proposal was also fairly OpenMP specific.

> 
> Please see discussion of 1) in "Function Outlining" section of our
> proposal.
> 
> As for 2), there are many arguments one might use in favor of a more
> general or a more specialized solution. What is easier to implement?

I feel that my proposal is easier to implement because it is safer:
thanks to the procedurization and the cross-referencing of the
metadata, a pass that does not know about the parallelization metadata
and drops it will cause parallel regions to be lost, but should not
otherwise miscompile the code; and with inlining, most optimization
opportunities are preserved.

I agree that your proposal allows more optimization opportunities to be
preserved. On the other hand, it will require more auditing of existing
code, and new infrastructure just to make sure the new intrinsics don't
interfere with existing optimizations. I trust that you have sufficient
resources to do these things, and that being the case, I don't object.

> What is better for LLVM IR development? Are we sure what we see as
> necessary and sufficient today would be suitable for future parallel
> standards 

I guarantee that the answer is no ;) -- but there are a number of
current standards that can be considered.

>-- given all the developments happening in this area as we
> speak? Whatever one answers, it would be quite subjective. My personal
> preference is for simplest and most focused solution -- but then again
> this is subjective.
> 
> > In order for your proposal to work well, there will be a lot of
> > infrastructure work required (more than with my proposal); many
> > passes will need to be made explicitly aware of how they can, or
> > can't, reorder things with respect to the parallelization
> > intrinsics; loop restructuring may require special care, etc. How
> > this is done depends in part on where the state information is
> > stored: Do we keep the parallelization information in the
> > intrinsics during mid-level optimization, or do we move its state
> > into an analysis pass? In any case, I don't object to this approach
> > so long as we have a good plan for how this work will be done.
> 
> No -- only passes that run before procedurization should be aware
> of these intrinsics.

This answer is fairly ambiguous because you haven't explained exactly
when this will happen. I assume that it will happen fairly late. For
some things, like atomics lowering, we may want to wait until just
prior to code generation to allow late customization by target-specific
code.

> 
> I agree that it is not so easy to make optimizations "thread-aware".
> But the problem is essentially the same, no matter how a parallel
> extension is manifested in the IR.
> 
> > When we discussed this earlier this year, there seemed to be some
> > consensus that we wanted to avoid, to the extent possible,
> > introducing OpenMP-specific intrinsics into the LLVM layer. Rather,
> > we should define some kind of parallelization API (in the form of
> > metadata, intrinsics, etc.) onto which OpenMP can naturally map
> > along with other paradigms. There is interest in supporting
> > OpenACC, for example, which will require data copying clauses, and
> > it would make sense to share as much of the infrastructure as
> > possible with OpenMP. Are you interested in providing Cilk support
> > as well? We probably don't want to have NxM slightly-different ways
> > of expressing 'this is a parallel region'. There are obviously
> > cases in which things need to be specific to the interface (like
> > runtime loop scheduling in OpenMP which implies a specific
> > interaction with the runtime library), but such cases may be the
> > exception rather than the rule.
> >
> > We don't need 'omp' in the intrinsic names and also 'OMP_' on all of
> > the string specifiers. Maybe, to my previous point, we could call
> > the intrinsics 'parallel' and use 'OMP_' only when something is
> > really OpenMP-specific?
> 
> As I said before, our aim was quite simple -- OpenMP support only.

Fair enough, but that does not explain why, even with a restricted
scope, we need to repeat 'omp' in both the intrinsic name and its
associated metadata.

As far as I can tell, what you've proposed is a fairly generic way to
pass pragma-type information from the frontend to the backend. Going
through all of the effort to implement that only to arbitrarily
restrict it to OpenMP pragmas seems silly. Having this capability would
be great, and we could use it for other things. For example, I'd like
to have a '#pragma unroll(n)' for loops. If we have a generic way to
pass such contextual pragmas to the backend, it would make supporting
such extensions much easier.

> 
> Can the design be extended to allow more general form of parallel
> extensions support? Probably... but this is definitely more than what
> we intended.
> 
> > You don't seem to want to map thread-private variables onto the
> > existing TLS support. Why?
> 
> Because we don't employ explicit procedurization. What happens after
> procedurization (including how thread-private variables are
> manifested in the IR) is heavily dependent on the OpenMP runtime
> library one relies upon, and is out of the scope of our proposal.

I thought that thread-private variables in OpenMP could be declared
only at global scope. This makes them map cleanly to existing TLS
support, and I don't see how the intrinsics will work in this case
(because you can't call intrinsics at global scope). That having been
said, I recommend that we introduce a new 'omp' TLS mode so that the
implementation is free to choose the most-appropriate lowering.

Thanks again,
Hal

> 
> Yours,
> Andrey Bokhanko
> ---
> Software Engineer
> Intel Compiler Team
> Intel Corp.



-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
