[LLVMdev] [PATCH] parallel loop metadata

Mon Feb 4 07:42:52 PST 2013

On 02/04/2013 02:23 PM, Pekka Jääskeläinen wrote:
> Hi,
>
> Here's an updated patch with the remarks from Tobias and Sebastian
> addressed.

Great.

> On 02/04/2013 11:30 AM, Pekka Jääskeläinen wrote:
>> branch itself). That should do (if it works), unless you have a better
>> idea for this.
>
> I ended up using the trick TBAA uses to create metadata that won't
> get merged (thus remains to be referred to only by one loop's metadata)
> by using a metadata that refers to itself.
>
> It also now supports nested parallel loops via referring to a metadata that
> points to all the loop identifier metadatas the memory accessing
> instruction is inside of.

Nice.

I looked at the patch. It looks very nice and is from my side good to 
go. I just added some minor comments that you may want to address before 
committing.

Tobi

P.S.: Please give Sebastian and Nadav also a couple of days to comment.

> Index: llvm/include/llvm/Analysis/LoopInfo.h
> ===================================================================
> --- llvm.orig/include/llvm/Analysis/LoopInfo.h	2013-02-04 11:35:13.036349427 +0200
> +++ llvm/include/llvm/Analysis/LoopInfo.h	2013-02-04 11:37:29.428349395 +0200
> @@ -381,6 +381,20 @@
>     /// isSafeToClone - Return true if the loop body is safe to clone in practice.
>     bool isSafeToClone() const;
>
> +  /// Returns true if the loop is parallel.
> +  ///
> +  /// A parallel loop can be assumed to not contain any dependencies between
> +  /// iterations by the compiler. That is, any loop-carried dependency checking
> +  /// can be skipped completely when parallelizing the loop on the target
> +  /// machine. Thus, if the parallel loop information originates from the
> +  /// programmer, e.g. via the OpenMP parallel for pragma, it is the
> +  /// programmer's responsibility to ensure the are no loop-carried
                                                there are

> +  /// dependencies. The final execution order of the instructions across
> +  /// iterations is not guaranteed, thus, the end result might or might
'might or might', that sounds incorrect

> +  /// implement actual concurrent execution of instructions across multiple
> +  /// iterations.
> +  bool isParallel() const;
> +
>     /// hasDedicatedExits - Return true if no exit block for the loop
>     /// has a predecessor that is outside the loop.
>     bool hasDedicatedExits() const;
> Index: llvm/lib/Analysis/LoopInfo.cpp
> ===================================================================
> --- llvm.orig/lib/Analysis/LoopInfo.cpp	2013-02-04 11:35:13.096349325 +0200
> +++ llvm/lib/Analysis/LoopInfo.cpp	2013-02-04 15:11:11.039288169 +0200
> @@ -24,6 +24,7 @@
>   #include "llvm/Assembly/Writer.h"
>   #include "llvm/IR/Constants.h"
>   #include "llvm/IR/Instructions.h"
> +#include "llvm/IR/Metadata.h"
>   #include "llvm/Support/CFG.h"
>   #include "llvm/Support/CommandLine.h"
>   #include "llvm/Support/Debug.h"
> @@ -233,6 +234,51 @@
>     return true;
>   }
>
> +
> +bool Loop::isParallel() const {
> +
> +  BasicBlock *latch = getLoopLatch();
> +  if (latch == NULL ||
> +      latch->getTerminator()->getMetadata("llvm.loop.parallel") == NULL)
> +    return false;

You can probably move the second part of the condition later.

if (latch == NULL)
   return false;

MDNode *desiredLoopIdMetadata =
     latch->getTerminator()->getMetadata("llvm.loop.parallel");

if (!desiredLoopIdMetadata)
   return false;

> +
> +  MDNode *desiredLoopIdMetadata =
> +      latch->getTerminator()->getMetadata("llvm.loop.parallel");
> +
> +  // The loop branch contains the parallel loop metadata. In order to ensure
> +  // that any parallel-loop-unaware optimization pass hasn't added loop-carried
> +  // dependencies (thus converted the loop back to a sequential loop), check
> +  // that all the memory instructions in the loop contain parallelism metadata
> +  // that point to the same unique "loop id metadata" the loop branch does.
> +  for (block_iterator BB = block_begin(), BE = block_end(); BB != BE; ++BB) {
> +    for (BasicBlock::iterator II = (*BB)->begin(), EE = (*BB)->end();
> +         II != EE; II++) {
> +
> +      if (!II->mayReadOrWriteMemory()) continue;

clang-format would put the single statement on the next line.

> +
> +      if (II->getMetadata("llvm.mem.parallel_loop_access") == NULL) {
> +        return false;
> +      }

No '{' or '}' for single statement loops.

> +      // The memory instruction can refer to the loop identifier metadata
> +      // directly or indirectly through another list metadata (in case of
> +      // nested parallel loops). The loop identifier metadata refers to
> +      // itself so we can check both cases with the same routine.
> +      MDNode *loopIdMD =
> +          dyn_cast<MDNode>(II->getMetadata("llvm.mem.parallel_loop_access"));
> +      bool loopIdMDFound = false;
> +      for (unsigned i = 0; i < loopIdMD->getNumOperands(); ++i) {

We normally move fuction calls out of the exit condition:

      for (unsigned i = 0, e = loopIdMD->getNumOperands(); i < e; ++i)

> +          if (loopIdMD->getOperand(i) == desiredLoopIdMetadata) {
> +              loopIdMDFound = true;
> +              break;
> +          }
> +      }
> +      if (!loopIdMDFound) return false;

I would put this on the next line

> Index: llvm/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ llvm/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll	2013-02-04 14:01:29.556598687 +0200
> @@ -0,0 +1,39 @@
> +; RUN: opt < %s -reg2mem -loop-vectorize -force-vector-unroll=1 -force-vector-width=4 -dce -instcombine -S | FileCheck %s

Instead of running '-reg2mem', I would just store the output of reg2mem 
in this test case. This keeps the input that we test unaffected from 
future changes to reg2mem. You can still keep the comment, that the 
input was generated or models reg2mem.

> Index: llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll	2013-02-04 15:10:59.318952299 +0200
> @@ -0,0 +1,77 @@
> +; RUN: opt < %s  -loop-vectorize -force-vector-unroll=1 -force-vector-width=4 -dce -instcombine -S | FileCheck %s
> +
> +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
> +target triple = "x86_64-unknown-linux-gnu"
> +
> +; A tricky loop:
> +;
> +; void loop(int *a, int *b) {
> +;    for (int i = 0; i < 512; ++i) {
> +;        a[a[i]] = b[i];
> +;        a[i] = b[i+1];
> +;    }
> +;}

Nice.

> +for.end:                                          ; preds = %for.body
> +  ret void
> +}
> +
> +; The same loop with parallel loop metadata added to the loop branch
> +; and the memory instructions.

Nice.

Could you also add a test case where the metadata of the loads/stores 
references the wrong parallel loop metadata.

Thanks again,

Tobi