[LLVMdev] RFC: [PATCH] parallel loop metadata

Mon Feb 4 10:41:01 PST 2013

----- Original Message -----
> From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Nadav Rotem" <nrotem at apple.com>, "Tobias Grosser" <tobias at grosser.es>, "CVS Commit Messages for LLVM repository"
> <llvm-commits at cs.uiuc.edu>, "Sebastian Pop" <spop at codeaurora.org>, "LLVM Developers Mailing List"
> <llvmdev at cs.uiuc.edu>
> Sent: Monday, February 4, 2013 12:18:09 PM
> Subject: Re: [LLVMdev] RFC: [PATCH] parallel loop metadata
> 
> Hello all,
> 
> Thanks for the comments. Attached is a new version with
> Tobias' and Sebastian's (final?) comments addressed. Any
> further comments are appreciated.
> 
> Nadav suggested a request for comments in llvmdev before committing
> it.
> 
> In order to describe the current idea of the parallel loop metadata,
> I think it's easiest to just copy-paste the documentation I wrote for
> this patch so one can inline-reply to it with the possible comments:

Perhaps, although it might be helpful to provide a little bit of context. Essentially, we should say upfront that 'parallel' here does not imply any particular set of parallel-execution semantics. Rather, this is simply a statement about the lack of inter-iteration dependencies.

> 
> +'``llvm.loop``'
> +^^^^^^^^^^^^^^^
> +
> +It is sometimes useful to attach information to loop constructs.
> Currently,
> +loop metadata is implemented as metadata attached to the branch
> instruction
> +in the loop latch block. This type of metadata refer to a metadata
> node that is
> +guaranteed to be separate for each loop. The loop-level metadata is
> prefixed
> +with ``llvm.loop``.
> +
> +The loop identifier metadata is implemented using a metadata that
> refers to
> +itself as follows:
> +
> +.. code-block:: llvm
> +    !0 = metadata !{ metadata !0 }
> +
> +'``llvm.loop.parallel``' Metadata
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +This loop metadata can be used to communicate that a loop should be
> considered
> +a parallel loop. The semantics of parallel loops in this case is the
> one
> +with the strongest cross-iteration instruction ordering freedom: the
> +iterations in the loop can be considered completely independent of
> each
> +other (also known as embarrassingly parallel loops).
> +
> +This metadata can originate from a programming language with
> parallel loop
> +constructs. In such a case it is completely the programmer's
> responsibility
> +to ensure the instructions from the different iterations of the loop
> can be
> +executed in an arbitrary order, in parallel, or intertwined. No
> loop-carried
> +dependency checking at all must be expected from the compiler.
> +
> +In order to fulfill the LLVM requirement for metadata to be safely
> ignored,
> +it is important to ensure that a parallel loop is converted to
> +a sequential loop in case an optimization (agnostic of the parallel
> loop
> +semantics) converts the loop back to such. This happens when new
> memory
> +accesses that do not fulfill the requirement of free ordering across
> iterations
> +are added to the loop. Therefore, this metadata is required, but not
> +sufficient, to consider the loop at hand a parallel loop. For a loop
> +to be parallel,  all its memory accessing instructions need to be
> +marked with the ``llvm.mem.parallel_loop_access`` metadata that
> refer
> +to the same loop identifier metadata that identify the loop at hand.
> +
> +'``llvm.mem``'
> +^^^^^^^^^^^^^^^
> +
> +Metadata types used to annotate memory accesses with information
> helpful
> +for optimizations are prefixed with ``llvm.mem``.
> +
> +'``llvm.mem.parallel_loop_access``' Metadata
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +For a loop to be parallel, in addition to using
> +the ``llvm.loop.parallel`` metadata to mark the loop latch branch
> instruction,
> +also all of the memory accessing instructions in the loop body need
> to be
> +marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
> +is at least one memory accessing instruction not marked with the
> metadata,
> +the loop, despite it possibly using the ``llvm.loop.parallel``
> metadata,
> +must be considered a sequential loop. This causes parallel loops to
> be
> +converted to sequential loops due to optimization passes that are
> unaware of
> +the parallel semantics and that insert new memory instructions to
> the loop
> +body.
> +
> +Example of a loop that is considered parallel due to its correct use
> of
> +both ``llvm.loop.parallel`` and ``llvm.mem.parallel_loop_access``
> +metadata types that refer to the same loop identifier metadata.
> +
> +.. code-block:: llvm
> +
> +   for.body:
> +   ...
> +   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access
> !0
> +   ...
> +   store i32 %0, i32* %arrayidx4, align 4,
> !llvm.mem.parallel_loop_access !0
> +   ...
> +   br i1 %exitcond, label %for.end, label %for.body,
> !llvm.loop.parallel !0
> +
> +   for.end:
> +   ...
> +   !0 = metadata !{ metadata !0 }
> +
> +It is also possible to have nested parallel loops. In that case the
> +memory accesses refer to a list of loop identifier metadata nodes
> instead of
> +the loop identifier metadata node directly:
> +
> +.. code-block:: llvm
> +
> +   outer.for.body:
> +   ...
> +
> +   inner.for.body:
> +   ...
> +   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access
> !0
> +   ...
> +   store i32 %0, i32* %arrayidx4, align 4,
> !llvm.mem.parallel_loop_access !0
> +   ...
> +   br i1 %exitcond, label %inner.for.end, label %inner.for.body,
> !llvm.loop.parallel !1
> +
> +   inner.for.end:
> +   ...
> +   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access
> !0
> +   ...
> +   store i32 %0, i32* %arrayidx4, align 4,
> !llvm.mem.parallel_loop_access !0
> +   ...
> +   br i1 %exitcond, label %outer.for.end, label %outer.for.body,
> !llvm.loop.parallel !2
> +
> +   outer.for.end:                                          ; preds =
> %for.body
> +   ...
> +   !0 = metadata !{ metadata !1, metadata !2 } ; a list of parallel
> loop
> identifiers
> +   !1 = metadata !{ metadata !1 } ; an identifier for the inner
> parallel loop
> +   !2 = metadata !{ metadata !2 } ; an identifier for the outer
> parallel loop
> 
> 
> Nadav Rotem wrote:
> >> Pekka,
> >>
> >> Can you please write llvm-dev and describe your parallel metadata
> >> ? This
> >> is a major change to LLVM and it has the potential to affect every
> >> single
> >> pass in LLVM.  It should be reviewed by the community before we
> >> discuss
> >> the patch.  Similar proposals were repelled in the past (remember
> >> the
> >> OpenMP "propeller" discussion ?).
> 
> I tried to participate in the beginning and then the discussion
> somehow
> died (and I got busy with something else).
> 
> IIRC the conclusion for OpenMP was that the thread libraries should
> be
> called directly in the frontend to implement thread level parallelism
> for OpenMP programs, but was the destiny of the generic "parallelism
> info"
> metadata also decided to be a dead end? I didn't get that part.
> 
> Hal Finkel then replied:
> > I agree with Nadav that community input on this proposal would be
> > good.
> > There are a number of passes that may need to be made aware of this
> > metadata, and some consensus and planning is likely necessary for
> > this to
> > be successful. Nevertheless, this seems necessary for enabling
> > generally
> > useful features (like #pragma ivdep/parallel, etc.) and is worth
> > the
> > effort.
> 
> I agree. However, I think the proposed metadata is now well
> "ignorable" (thanks to the Renato Golin's suggestion to mark the mem
> accesses too). Sequential execution (to which it falls back to) is
> a legal execution ordering of a fully parallel loop.
> 
> As soon as someone points out what is the true difference (basically
> the set of guaranteed analysis done by the compiler) between "ivdep"
> and
> "parallel" then we can add a new metadata type, e.g., the
> llvm.loop.ignore_assumed_deps to support that. Anyways, my desire is
> to start
> from something as it's easier to build on existing foundations.

As various people have expressed, "the set of guaranteed analysis done by the compiler" is the null set, and so this should suffice for these pragmas as well. Nevertheless, it is my opinion that we should run the dependence analysis anyway, if for no other reason than to warn the user of potential problems (but this can be considered an enhancement, not a prerequisite to the initial implementation).

> 
> > To clarify history, the reason that the metadata-based OpenMP
> > schemes died
> > was not due to the "propeller" arguments (which said that
> > parallelization
> > semantics are just not a good fit for LLVM), but rather due to
> > fundamental
> > correctness issues specific to OpenMP. Specifically, there are
> > cases where
> > it would not be legal to drop metadata that produced parallel
> > regions, and
> > a fundamental design principle of metadata is that dropping it must
> > not
> > cause miscompilation. It was decided to move ahead with frontend
> > lowering
> > under the assumption that the backend would either be provided
> > with, or
> > could itself deduce, the information necessary for later
> > optimization where
> > appropriate. This kind of metadata fits into that general plan.
> 
> OK, this was my view of how the OpenMP discussion ended as well. What
> were the correctness issues in case of OpenMP, BTW?

Off of the top of my head, there were at least two issues:

1. num_threads could be specified, and this was not legally ignorable. The problem is that various runtime parameters can be queried using the runtime library, and those can be used to determine a legal number of threads to use. As a result, it is not legal to simply drop the parallel region.

2. There were similar issues with runtime scheduling; interactions between the runtime library and the pragmas could make dropping the pragmas illegal.

This proposal has no predefined associated semantics and it just used for dependency determination, and thus does not have these issues.

Thanks again,
Hal

> 
> Thanks,
> --
> Pekka
>