[LLVMdev] Parallel Loop Metadata

Fri Feb 8 01:56:41 PST 2013

On 02/08/2013 06:35 AM, Nadav Rotem wrote:
> Hi Tobi,
>
> Thanks for reviewing the proposal. I imagine that it may also affect your parallelization work in Polly.

Sure. I am interested in using it.

>> I am not sure if I am able to follow your reasoning. How could the -loop-vectorizer detect parallelism violations? I had the feeling that we introduce the llvm.loop metadata for the case where we want to inform the loop vectorizer that it can assume the absence of dependences even though it cannot prove their absence statically. Do you possibly mean that the -loop-vectorizer should in some way detect if the llvm.loop.parallel metadata is still correct?
>
> Yes, the loop vectorizer can detect the kind of violations of the "llvm.loop.parallel" metadata that we are worried about.

OK. I see. Most of the references that -mem2reg introduces obviously
destroy parallelism, and I can see that the loop vectorizer could
detect these obvious violations and refuse to parallelize. However, the
question remains whether the loop vectorizer can (and should) detect all
possible violations. I am still concerned that this is not possible in
general. Here is another piece of code (plus a transformation) where it
is a lot harder (impossible?) to detect the violation:

// b is always bigger than 100
#parallel
for (int i = 0; i < 100; i++) {
S1:   A[i % b] += i;
}

Depending on the value of 'b', the loop is either parallel or not. It is
impossible for the -loop-vectorizer to reason about this, but with
the additional information the user has, the user can easily annotate
the loop so that the loop vectorizer can optimize it.
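
To make the dependence on 'b' concrete, here is a minimal sketch in C.
The helper has_write_conflict() is a hypothetical name of my own, not
anything that exists in LLVM. It simply replays the index expression
i % b for n iterations and reports whether two different iterations
touch the same array element, which is exactly the situation in which
the loop stops being parallel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not an LLVM API): returns true if two distinct
 * iterations i in [0, n) access the same element A[i % b]. */
bool has_write_conflict(int n, int b) {
    assert(n >= 0 && b > 0);

    /* touched[k] records whether some earlier iteration accessed A[k]. */
    bool *touched = calloc((size_t)b, sizeof *touched);
    assert(touched != NULL);

    bool conflict = false;
    for (int i = 0; i < n && !conflict; i++) {
        int idx = i % b;
        if (touched[idx])
            conflict = true;   /* a second iteration hits the same element */
        touched[idx] = true;
    }

    free(touched);
    return conflict;
}
```

For the loop above, has_write_conflict(100, b) is false for any b of at
least 100 and true for any smaller positive b. The check needs the
runtime value of 'b', which is the information the compiler does not
have and the user does.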

Now suppose we have a new instrumentation pass that collects information
and stores it in a buffer with 'Size' elements. The instrumentation
pass just adds a couple of additional instructions, which do not
change the sequential behavior of the program.

int *B = get_buffer();
int Size = get_buffer_size();

// b is always bigger than 100
#parallel
for (int i = 0; i < 100; i++) {
S1:   A[i % b] += i;
S2:   B[i % Size] += i;
}

However, depending on the value of Size, the parallel execution of
the updated loop may no longer be legal. Without further information
we have to assume that the loop is no longer parallel.
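
As a rough illustration of how S2 can break things, the following C
sketch compares the sequential loop with an emulated vectorized loop.
Everything here is an assumption made for the example: the vector
width VF = 4, the helper names, and the emulation itself (all lanes of
a chunk load and add before any lane stores), which is only a simple
model and not what the loop vectorizer actually emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* N matches the trip count of the loop above; VF and MAX_SIZE are
 * assumptions made only for this sketch. */
enum { N = 100, VF = 4, MAX_SIZE = 256 };

/* Reference semantics: iterations execute one after another. */
static void run_sequential(int *B, int size) {
    for (int i = 0; i < N; i++)
        B[i % size] += i;
}

/* Emulated VF-wide execution of S2: every lane of a chunk loads and
 * adds before any lane stores, so lanes that alias lose updates. */
static void run_vectorized(int *B, int size) {
    for (int i = 0; i < N; i += VF) {
        int lane[VF];
        for (int l = 0; l < VF; l++)      /* gather and add */
            lane[l] = B[(i + l) % size] + (i + l);
        for (int l = 0; l < VF; l++)      /* scatter */
            B[(i + l) % size] = lane[l];
    }
}

/* Returns true if both execution orders leave the same buffer contents. */
bool vectorized_matches_sequential(int size) {
    assert(size > 0 && size <= MAX_SIZE);
    int seq[MAX_SIZE] = {0};
    int vec[MAX_SIZE] = {0};
    run_sequential(seq, size);
    run_vectorized(vec, size);
    return memcmp(seq, vec, sizeof seq) == 0;
}
```

With Size = 100 the two versions agree, while with Size = 2 two lanes of
the same chunk update the same element and the results differ. Note
that this model only exposes conflicts inside one vector chunk; running
all iterations fully in parallel would require all 100 indices to be
distinct.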

For this case, will we require the loop-vectorizer to detect the 
outdated metadata? In case we do, how could this work?

Or do we require the instrumentation pass to reason about the
llvm.loop.parallel metadata and remove it in case it gets invalidated?

Or can we just assume that such an instrumentation pass can or will 
never exist?

Cheers
Tobi

P.S.: I know that due to the modulo, we will not be able to prove
stride-one access and vectorization may not be profitable. The example
was written to demonstrate legality issues. We can assume surrounding
instructions that would make vectorization profitable.