[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)

Tue Oct 2 22:26:11 PDT 2012

On Oct 2, 2012, at 3:09 AM, Andrey Bokhanko <andreybokhanko at gmail.com> wrote:
> Chris,
> 
>> My comment was mostly in response to the Intel proposal, which effectively translates OpenMP pragmas directly into llvm intrinsics + metadata.  I can't imagine a way to make this work *correctly* without massive changes to the optimizer.
> 
> There are three ways to make this work correctly:
> 
> 1) Ignore OpenMP-related intrinsics and associated metadata.  Least
> effort, least benefit (no OpenMP support). 

This is trivially true, but the entire point of supporting OpenMP in the IR would be to have some sort of late "procedurization" pass that actually exposes the parallelism through some runtime.  Saying that we could just ignore this is silly: if we wanted to ignore OpenMP, we could do that in the frontend with far less complexity.  In fact, we're already done! ;-)

> 2) Make procedurization (including all runtime calls -- no intrinsics
> left after this step) at the very start of LLVM optimizer. No changes
> to optimizations, but no opportunity to optimize parallel code. As
> cheap and easy as one can do to support OpenMP. This might be a good
> choice for initial implementation.
> 
> 3) Do some carefully chosen optimizations before procedurization. Do
> heavy lifting (like loop restructuring optimizations) after
> procedurization. Some effort, a lot of benefit. This is essentially
> what is described in [Tian05] (referenced in our proposal).

I think you're missing the point here.  The whole idea of LLVM IR is that it doesn't have various "forms" that are valid at different points in the optimizer.  Even very late lowering passes like strength reduction are pure IR to IR passes that do not introduce special forms.  This is in stark contrast to other compilers (e.g. Open64) which have several levels of lowering.

My whole objection comes from the (possibly incorrect, I am not an OpenMP expert!) idea that there are only two reasonable implementation approaches:

1. Early procedurization (e.g. in the frontend that produces LLVM IR).  This is very easy for the optimizer to preserve, and correctness is trivial, but you lose some (theoretical?) optimization benefits by doing procedurization early.

2. Late procedurization where the IR has explicit parallelism constructs and all optimizers preserve its correctness requirements (this is your #4).  While this is possible in theory, I'm skeptical that this could make sense, and your proposal certainly isn't the right way to do it.
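
To make approach 1 concrete, here is a minimal sketch of roughly what early procedurization amounts to for a simple parallel loop. It is an illustration under stated assumptions, not how any particular frontend or OpenMP runtime does it: `hypothetical_parallel_for`, `SaxpyContext`, and `saxpy_outlined` are made-up names, and the stand-in runtime runs the chunks serially where a real one would hand them to worker threads. The point is only that the loop body becomes an ordinary function plus a runtime call before the optimizer ever sees it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Variables captured by the parallel region, packed into one struct
// (hypothetical layout chosen for this sketch).
struct SaxpyContext {
    float a;
    const float *x;
    float *y;
};

// Outlined loop body: handles iterations [begin, end) of the original loop.
static void saxpy_outlined(std::size_t begin, std::size_t end, void *raw) {
    auto *ctx = static_cast<SaxpyContext *>(raw);
    for (std::size_t i = begin; i < end; ++i)
        ctx->y[i] = ctx->a * ctx->x[i] + ctx->y[i];
}

using OutlinedFn = void (*)(std::size_t, std::size_t, void *);

// Hypothetical runtime entry point. This stand-in splits [0, n) into
// chunks and runs them one after another; a real runtime would
// dispatch each chunk to a worker thread instead.
static void hypothetical_parallel_for(std::size_t n, std::size_t chunks,
                                      OutlinedFn fn, void *ctx) {
    assert(fn != nullptr);
    if (chunks == 0) chunks = 1;
    const std::size_t step = (n + chunks - 1) / chunks;  // ceiling division
    for (std::size_t begin = 0; begin < n; begin += step)
        fn(begin, std::min(begin + step, n), ctx);
}

// What the frontend would emit in place of a loop annotated with
// "#pragma omp parallel for": pack the captures, call the runtime.
void saxpy(float a, const std::vector<float> &x, std::vector<float> &y) {
    SaxpyContext ctx{a, x.data(), y.data()};
    hypothetical_parallel_for(std::min(x.size(), y.size()), 4,
                              saxpy_outlined, &ctx);
}
```

After this transformation the optimizer sees nothing but plain functions and an opaque call, which is why correctness is trivial; the cost is that it can no longer reason across the boundary between `saxpy` and `saxpy_outlined`.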

> 4) Make all optimizations thread-aware. Best approach in theory, no
> compilers exist that go as far.

It's not clear to me exactly what sorts of optimizations late procedurization is attempting to allow.  I understand that this is the design that the Intel compiler uses, and you are motivated to make LLVM fit that model.  However, the technical benefits of this design are not clear to me, and I also understand that late procedurization has been a continuous source of subtle correctness bugs that are still being found even though the product is mature.  This is exactly the sort of thing that I want to avoid in LLVM.

-Chris