[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)

Andrey Bokhanko andreybokhanko at gmail.com
Wed Oct 3 00:56:34 PDT 2012


Chris,

> I think you're missing the point here.  The whole idea of LLVM IR is that it doesn't have various "forms" that are valid at different points in the optimizer.  Even very late lowering passes like strength reduction are pure IR to IR passes that do not introduce special forms.  This is in stark contrast to other compilers (e.g. Open64) which have several levels of lowering.

Well, at some point the compiler *has* to insert runtime library calls.
This is true for all proposals, both existing and potential ones. Do you
mean that runtime calls must be inserted either strictly before the LLVM
optimizer or strictly after it -- and nowhere else? More on this later.
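
To make concrete what I mean by "inserting runtime library calls", here
is a minimal C sketch of procedurization. The runtime entry point
(rt_parallel_fork) is a made-up name standing in for whatever the real
OpenMP runtime provides; the point is only the shape of the rewrite: the
body of the parallel region is outlined into its own function, and the
pragma is replaced by a single call into the runtime.

  #include <stdio.h>

  /* What the user wrote:
   *   #pragma omp parallel
   *   { printf("hello from thread %d\n", omp_get_thread_num()); }
   */

  /* The region body, outlined into its own function. */
  static void outlined_region(int thread_id, void *shared) {
    (void)shared;           /* captured shared variables would go here */
    printf("hello from thread %d\n", thread_id);
  }

  /* Hypothetical runtime entry point: forks a team of threads and runs
     the outlined function on each of them.  Modeled with a serial loop
     here so the sketch compiles and runs without any OpenMP runtime. */
  static void rt_parallel_fork(void (*fn)(int, void *), void *shared,
                               int nthreads) {
    for (int t = 0; t < nthreads; ++t)
      fn(t, shared);
  }

  int main(void) {
    /* This call is what the compiler emits in place of the pragma. */
    rt_parallel_fork(outlined_region, /*shared=*/0, /*nthreads=*/4);
    return 0;
  }

The only question in this discussion is *where* in the pipeline this
rewrite happens -- in the frontend, at the very start of the LLVM
optimizer, or later.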

As for treating IR with and without OpenMP intrinsics as separate
forms, this is a matter of personal taste and design choice, I guess.
Does strength reduction (which replaces multiplications with additions)
transform IR into another "form"?
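
Just for the sake of the analogy, here is roughly what strength
reduction does (function names are mine, for illustration only). Both
the input and the output are ordinary code; the transformed version is
not a separate "form" that later passes have to know about.

  /* Before: a multiplication on every iteration. */
  void scale_before(int *dst, int n) {
    for (int i = 0; i < n; ++i)
      dst[i] = i * 4;
  }

  /* After strength reduction: the multiply becomes a running addition. */
  void scale_after(int *dst, int n) {
    int v = 0;
    for (int i = 0; i < n; ++i) {
      dst[i] = v;
      v += 4;
    }
  }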

> My whole objection comes from the (possibly incorrect, I am not an OpenMP expert!) idea that there are only two reasonable implementation approaches:
>
> 1. Early procedurization (e.g. in the frontend that produces LLVM IR).  This is very easy to preserve and correctness is trivial, but you lose some (theoretical?) optimization benefits by doing procedurization early.
>
> 2. Late procedurization where the IR has explicit parallelism constructs and all optimizers preserve its correctness requirements (this is your #4).  While this is possible in theory, I'm skeptical that this could make sense, and your proposal certainly isn't the right way to do it.

I understand your point... and respectfully disagree with it.

You are basically saying that it is all or nothing: either *no*
optimizations on parallel code (runtime calls inserted before the LLVM
optimizer), or *all* optimizations working on parallel code (calls
inserted after the LLVM optimizer). In the former case we lose *all*
optimizations, not just some. As for the latter, I share your
skepticism -- doubly so.

> I understand that this is the design that the Intel compiler uses, and you are motivated to make LLVM fit that model.

Yes and yes.

And one more thing: "the proof is in the pudding", or so they say. The
Intel Compiler (which, as you correctly noted, uses essentially the same
design) is the metaphorical "pudding" that proves the viability and the
performance potential of the approach we proposed.

> I also understand that late procedurization has been a continuous source of subtle correctness bugs that are still being found even though the product is mature.

Hmmm... One would have to analyze Intel Compiler bug statistics to make
this assertion, but it is certainly not my impression.

Yours,
Andrey
---
Software Engineer
Intel Compiler Team
Intel Corp.

On Wed, Oct 3, 2012 at 9:26 AM, Chris Lattner <clattner at apple.com> wrote:
> On Oct 2, 2012, at 3:09 AM, Andrey Bokhanko <andreybokhanko at gmail.com> wrote:
>> Chris,
>>
>>> My comment was mostly in response to the Intel proposal, which effectively translates OpenMP pragmas directly into llvm intrinsics + metadata.  I can't imagine a way to make this work *correctly* without massive changes to the optimizer.
>>
>> There are three ways to make this work correctly:
>>
>> 1) Ignore OpenMP-related intrinsics and associated metadata.  Least
>> effort, least benefit (no OpenMP support).
>
> This is trivially true, but the entire point of supporting OpenMP in the IR would be to have some sort of late "procedurization" pass that actually exposes the parallelism through some runtime.  Saying that we could just ignore this is silly: if we wanted to ignore OpenMP, we can do that in the frontend with far less complexity.  In fact, we're already done! ;-)
>
>> 2) Perform procedurization (including all runtime calls -- no
>> intrinsics left after this step) at the very start of the LLVM
>> optimizer. No changes to optimizations, but no opportunity to optimize
>> parallel code. As cheap and easy as OpenMP support can get. This might
>> be a good choice for an initial implementation.
>>
>> 3) Do some carefully chosen optimizations before procedurization. Do
>> the heavy lifting (like loop restructuring optimizations) after
>> procedurization. Some effort, a lot of benefit. This is essentially
>> what is described in [Tian05] (referenced in our proposal).
>
> I think you're missing the point here.  The whole idea of LLVM IR is that it doesn't have various "forms" that are valid at different points in the optimizer.  Even very late lowering passes like strength reduction are pure IR to IR passes that do not introduce special forms.  This is in stark contrast to other compilers (e.g. Open64) which have several levels of lowering.
>
> My whole objection comes from the (possibly incorrect, I am not an OpenMP expert!) idea that there are only two reasonable implementation approaches:
>
> 1. Early procedurization (e.g. in the frontend that produces LLVM IR).  This is very easy to preserve and correctness is trivial, but you lose some (theoretical?) optimization benefits by doing procedurization early.
>
> 2. Late procedurization where the IR has explicit parallelism constructs and all optimizers preserve its correctness requirements (this is your #4).  While this is possible in theory, I'm skeptical that this could make sense, and your proposal certainly isn't the right way to do it.
>
>> 4) Make all optimizations thread-aware. The best approach in theory;
>> no compilers exist that go that far.
>
> It's not clear to me exactly what sorts of optimizations that late procedurization is attempting to allow.  I understand that this is the design that the Intel compiler uses, and you are motivated to make LLVM fit that model.  However, the technical benefits of this design are not clear to me, and I also understand that late procedurization has been a continuous source of subtle correctness bugs that are still being found even though the product is mature.  This is exactly the sort of thing that I want to avoid in LLVM.
>
> -Chris



