[LLVMdev] RFC: [PATCH] parallel loop metadata

Mon Feb 4 10:18:09 PST 2013

Hello all,

Thanks for the comments. Attached is a new version with
Tobias' and Sebastian's (final?) comments addressed. Any
further comments are appreciated.

Nadav suggested a request for comments in llvmdev before committing it.

In order to describe the current idea of the parallel loop metadata,
I think it's easiest to just copy-paste the documentation I wrote for
this patch so one can inline-reply to it with the possible comments:

+'``llvm.loop``'
+^^^^^^^^^^^^^^^
+
+It is sometimes useful to attach information to loop constructs. Currently,
+loop metadata is implemented as metadata attached to the branch instruction
+in the loop latch block. This type of metadata refer to a metadata node that is
+guaranteed to be separate for each loop. The loop-level metadata is prefixed
+with ``llvm.loop``.
+
+The loop identifier metadata is implemented using a metadata that refers to
+itself as follows:
+
+.. code-block:: llvm
+    !0 = metadata !{ metadata !0 }
+
+'``llvm.loop.parallel``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This loop metadata can be used to communicate that a loop should be considered
+a parallel loop. The semantics of parallel loops in this case is the one
+with the strongest cross-iteration instruction ordering freedom: the
+iterations in the loop can be considered completely independent of each
+other (also known as embarrassingly parallel loops).
+
+This metadata can originate from a programming language with parallel loop
+constructs. In such a case it is completely the programmer's responsibility
+to ensure the instructions from the different iterations of the loop can be
+executed in an arbitrary order, in parallel, or intertwined. No loop-carried
+dependency checking at all must be expected from the compiler.
+
+In order to fulfill the LLVM requirement for metadata to be safely ignored,
+it is important to ensure that a parallel loop is converted to
+a sequential loop in case an optimization (agnostic of the parallel loop
+semantics) converts the loop back to such. This happens when new memory
+accesses that do not fulfill the requirement of free ordering across iterations
+are added to the loop. Therefore, this metadata is required, but not
+sufficient, to consider the loop at hand a parallel loop. For a loop
+to be parallel,  all its memory accessing instructions need to be
+marked with the ``llvm.mem.parallel_loop_access`` metadata that refer
+to the same loop identifier metadata that identify the loop at hand.
+
+'``llvm.mem``'
+^^^^^^^^^^^^^^^
+
+Metadata types used to annotate memory accesses with information helpful
+for optimizations are prefixed with ``llvm.mem``.
+
+'``llvm.mem.parallel_loop_access``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For a loop to be parallel, in addition to using
+the ``llvm.loop.parallel`` metadata to mark the loop latch branch instruction,
+also all of the memory accessing instructions in the loop body need to be
+marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
+is at least one memory accessing instruction not marked with the metadata,
+the loop, despite it possibly using the ``llvm.loop.parallel`` metadata,
+must be considered a sequential loop. This causes parallel loops to be
+converted to sequential loops due to optimization passes that are unaware of
+the parallel semantics and that insert new memory instructions to the loop
+body.
+
+Example of a loop that is considered parallel due to its correct use of
+both ``llvm.loop.parallel`` and ``llvm.mem.parallel_loop_access``
+metadata types that refer to the same loop identifier metadata.
+
+.. code-block:: llvm
+
+   for.body:
+   ...
+   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+   ...
+   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+   ...
+   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !0
+
+   for.end:
+   ...
+   !0 = metadata !{ metadata !0 }
+
+It is also possible to have nested parallel loops. In that case the
+memory accesses refer to a list of loop identifier metadata nodes instead of
+the loop identifier metadata node directly:
+
+.. code-block:: llvm
+
+   outer.for.body:
+   ...
+
+   inner.for.body:
+   ...
+   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+   ...
+   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+   ...
+   br i1 %exitcond, label %inner.for.end, label %inner.for.body, 
!llvm.loop.parallel !1
+
+   inner.for.end:
+   ...
+   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+   ...
+   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+   ...
+   br i1 %exitcond, label %outer.for.end, label %outer.for.body, 
!llvm.loop.parallel !2
+
+   outer.for.end:                                          ; preds = %for.body
+   ...
+   !0 = metadata !{ metadata !1, metadata !2 } ; a list of parallel loop 
identifiers
+   !1 = metadata !{ metadata !1 } ; an identifier for the inner parallel loop
+   !2 = metadata !{ metadata !2 } ; an identifier for the outer parallel loop

Nadav Rotem wrote:
>> Pekka,
>>
>> Can you please write llvm-dev and describe your parallel metadata ? This
>> is a major change to LLVM and it has the potential to affect every single
>> pass in LLVM.  It should be reviewed by the community before we discuss
>> the patch.  Similar proposals were repelled in the past (remember the
>> OpenMP "propeller" discussion ?).

I tried to participate in the beginning and then the discussion somehow
died (and I got busy with something else).

IIRC the conclusion for OpenMP was that the thread libraries should be
called directly in the frontend to implement thread level parallelism
for OpenMP programs, but was the destiny of the generic "parallelism info"
metadata also decided to be a dead end? I didn't get that part.

Hal Finkel then replied:
> I agree with Nadav that community input on this proposal would be good.
> There are a number of passes that may need to be made aware of this
> metadata, and some consensus and planning is likely necessary for this to
> be successful. Nevertheless, this seems necessary for enabling generally
> useful features (like #pragma ivdep/parallel, etc.) and is worth the
> effort.

I agree. However, I think the proposed metadata is now well
"ignorable" (thanks to the Renato Golin's suggestion to mark the mem
accesses too). Sequential execution (to which it falls back to) is
a legal execution ordering of a fully parallel loop.

As soon as someone points out what is the true difference (basically
the set of guaranteed analysis done by the compiler) between "ivdep" and 
"parallel" then we can add a new metadata type, e.g., the 
llvm.loop.ignore_assumed_deps to support that. Anyways, my desire is to start
from something as it's easier to build on existing foundations.

> To clarify history, the reason that the metadata-based OpenMP schemes died
> was not due to the "propeller" arguments (which said that parallelization
> semantics are just not a good fit for LLVM), but rather due to fundamental
> correctness issues specific to OpenMP. Specifically, there are cases where
> it would not be legal to drop metadata that produced parallel regions, and
> a fundamental design principle of metadata is that dropping it must not
> cause miscompilation. It was decided to move ahead with frontend lowering
> under the assumption that the backend would either be provided with, or
> could itself deduce, the information necessary for later optimization where
> appropriate. This kind of metadata fits into that general plan.

OK, this was my view of how the OpenMP discussion ended as well. What
were the correctness issues in case of OpenMP, BTW?

Thanks,
-- 
Pekka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-3.3-parallel-loop-metadata.patch
Type: text/x-patch
Size: 18027 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130204/60bd1f39/attachment.bin>