[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM

Wed Aug 15 09:28:35 PDT 2012

On Wed, 15 Aug 2012 10:04:34 +0000
"Raghavendra, Prakash" <Prakash.Raghavendra at amd.com> wrote:

> 
> Hi Hal
> 
> I was also looking at providing such a support in LLVM for capturing
> (both explicit and implicit) parallelism in LLVM. We had an initial
> discussion around this and your proposal comes at the right time. We
> support such an initiative. We can work together to get this support
> implemented in LLVM.

Great!

> 
> But, I have a slightly different view. I think today parallelism does
> not necessarily mean OpenMP or SIMD, we are in the area of
> heterogeneous computing. I agree that your primary target was
> thread-based parallelism, but I think we could extend this while we
> capture the parallelism in the program.

I don't think that we have a different view, but my experience with
heterogeneous systems is limited, and while I've played around with
OpenACC and OpenCL some, I don't feel qualified to design an LLVM
support API for those standards. I don't feel that I really understand
the use cases well enough. My hope is that others will chime in with
ideas on how to best support those models.

I think that the largest difference between shared-memory parallelism
(as in OpenMP) and the parallelism targeted by OpenACC, etc. is the
memory model. With OpenACC, IIRC, there is an assumption that the
accelerator memory is separate and specific data-copying directives are
necessary. Furthermore, with asynchronous-completion support, these
data copies are not optional. We could certainly add data-copying
intrinsics for this, but the underlying problem is code assumptions
about the data copies. I'm not sure how to deal with this.

> 
> My idea is to capture parallelism with the way you have said using
> 'metadata'. I agree to record the parallel regions in the metadata
> (as given by the user). However, we could also give placeholders to
> record any additional information that the compiler writer needs like
> number of threads, scheduling parameters, chunk size, etc etc which
> are specific perhaps to OpenMP.

I agree, although I think that some of those parameters are generic
enough to apply to different parallelization mechanisms. They might also
be ignored by some mechanisms for which they're irrelevant. I think that
making the metadata modular is a good idea. Instead of taking a fixed
list of things, for example, we may want to encode name/value pairs.
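
To make the name/value idea concrete, here is a minimal sketch in Python
(not LLVM IR, and not part of the proposal). The key names and the
lower_region helper are hypothetical, chosen only for illustration. It
shows a lowering mechanism consuming the pairs it understands and
ignoring the rest.

```python
# Hypothetical sketch: parallel-region metadata as open-ended name/value pairs.
# None of these key names come from the proposal; they are illustrative only.

def lower_region(metadata, understood_keys):
    """Split a region's name/value pairs into those a lowering mechanism
    understands and the names of those it ignores."""
    used = {k: v for k, v in metadata.items() if k in understood_keys}
    ignored = sorted(k for k in metadata if k not in understood_keys)
    return used, ignored


region = {"num_threads": 8, "schedule": "static", "chunk_size": 4}

# A thread-based lowering might understand all three keys ...
omp_used, omp_ignored = lower_region(
    region, {"num_threads", "schedule", "chunk_size"}
)

# ... while a hypothetical SIMD lowering might only care about the chunk size.
simd_used, simd_ignored = lower_region(region, {"chunk_size"})
```

Because the set of names is open-ended, a new mechanism could add its own
keys without changing what existing consumers see.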

> 
> The point is that the same parallel loop could be targeted by another
> standard to accelerators today (like GPUs) using another standard
> OpenACC. We may get a new standard to capture and target a
> different kind of parallel device, which could look quite different,
> and has to be specifically targeted.

Yes. We just need to make sure that we fully capture the semantics of
the standards that we're targeting. My idea was to start with OpenMP,
and make sure that we could fully capture its semantics, and then move
on from there.

> 
> Since we are at the intermediate layer, we could be independent of
> both user level standards like OpenMP, OpenACC, OpenCL, Cilk+, C++AMP
> etc and at the same time, keep enough information at this stage so
> that the compiler could generate efficient backend code for the
> target device.

Yes, this is, to the extent possible, what I'd like.

> So, my suggestion is to keep all this relevant
> information as 'tags' for metadata and it is up to the backend to use
> or throw away the information. As you said, if the backend ignores it,
> there should not be any harm to the correctness of the final code.
> 
> Second point I wanted to make was on the intrinsics. I am not sure
> why we need these intrinsics at the LLVM level. I am not sure why we
> would need conditional constructs for expressing parallelism. These
> could be calls directly to the runtime library at the code generation
> level.

These are necessary because of technical requirements; specifically,
metadata variable references do not count as 'uses', and so were
runtime expressions not referenced by an intrinsic, those variables
would be deleted as dead code. In OpenMP, expressions which reference
local variables can appear in the pragmas (such as those which specify
the number of threads), and we need to make sure those expressions are
not removed prior to lowering. I believe that OpenACC has similar
clauses to support.
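
The effect can be illustrated with a toy model of dead-code elimination,
written in Python. This is a sketch under simplifying assumptions, not
LLVM's actual pass: instructions are dictionary entries, only operands
count as uses, and the instruction name "parallel.param" is a made-up
stand-in for an intrinsic call.

```python
# Toy model of dead-code elimination; NOT LLVM's actual implementation.
# Each instruction maps: name -> (operand names, has_side_effects).

def dce(instructions):
    """Keep only instructions reachable, via operands, from side-effecting ones."""
    live = set()
    worklist = [name for name, (_, side) in instructions.items() if side]
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        # Operands that are not instructions (e.g. function arguments) are skipped.
        worklist.extend(op for op in instructions[name][0] if op in instructions)
    return {name: instructions[name] for name in instructions if name in live}


# "nthreads" is computed at run time but referenced only from metadata.
metadata_only = {
    "nthreads": (["n"], False),  # e.g. nthreads = n / 2
    "store": (["x"], True),      # unrelated side-effecting instruction
}
region_metadata = {"num_threads": "nthreads"}  # metadata reference: not a use

# Same code, plus a hypothetical intrinsic call taking "nthreads" as an operand.
with_intrinsic = dict(metadata_only)
with_intrinsic["parallel.param"] = (["nthreads"], True)

after_plain = dce(metadata_only)
after_intrinsic = dce(with_intrinsic)
```

In the first case the metadata is left pointing at a value that no longer
exists; in the second, the operand reference from the intrinsic keeps the
expression alive until lowering.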

That having been said, I'm certainly open to more generic intrinsics.

> 
> Again, this is a very good initiative and we would like to support such
> an effort in LLVM ASAP.

I am very happy to hear you say that.

 -Hal

> 
> Prakash Raghavendra
> AMD, Bangalore
> Email: Prakash.raghavendra at amd.com
> Phone: +91-80-3323 0753
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory