[LLVMdev] Parallel Loop Metadata

Nadav Rotem nrotem at apple.com
Mon Feb 11 13:31:17 PST 2013


Now that we have a better understanding of the proposal for using per-instruction metadata, I think that we need to revisit the "single metadata" approach (Pekka's original suggestion).

Reg2mem is indeed a problem, but the loop vectorizer can solve this in more than one way (detect or fix). The example pass that you mentioned below (the instrumentation pass) can be taught to handle the parallelism pragmas.

Can you think of other passes that we will need to modify?

On Feb 8, 2013, at 1:56 AM, Tobias Grosser <tobias at grosser.es> wrote:

> On 02/08/2013 06:35 AM, Nadav Rotem wrote:
>> Hi Tobi,
>> 
>> Thanks for reviewing the proposal. I imagine that it may also affect your parallelization work in Polly.
> 
> Sure. I am interested in using it.
> 
>>> I am not sure if I am able to follow your reasoning. How could the -loop-vectorizer detect parallelism violations? I had the feeling that we introduce the llvm.loop metadata for the case where we want to inform the loop vectorizer that it can assume the absence of dependences even though it cannot prove their absence statically. Do you possibly mean that the -loop-vectorizer should in some way detect whether the llvm.loop.parallel metadata is still correct?
>> 
>> Yes, the loop vectorizer can detect the kind of violations of the "llvm.loop.parallel" metadata that we are worried about.
> 
> OK. I see. Most of the memory references that -reg2mem introduces obviously
> destroy parallelism, and I can see that the loop vectorizer could detect these obvious violations and refuse to vectorize. However, the
> question remains whether the loop vectorizer can (and should) detect all possible violations. I am still concerned that this is not
> possible in general. Here is another piece of code (+ transformation), where it is
> a lot harder (impossible?) to detect the violation:
> 
> // b is always bigger than 100
> #parallel
> for (int i = 0; i < 100; i++) {
> S1:   A[i % b] += i;
> }
> 
> Depending on the value of 'b', the loop is either parallel or not. It is impossible for the -loop-vectorizer to reason about this, but with
> the additional information the user has, he can easily annotate the loop so that the loop vectorizer can optimize it.
> 
> Now suppose we have a new instrumentation pass that collects information
> and uses a buffer with 'Size' elements for this. The instrumentation pass just adds a couple of instructions, which do not
> change the sequential behavior of the program.
> 
> int *B = get_buffer();
> int Size = get_buffer_size();
> 
> // b is always bigger than 100
> #parallel
> for (int i = 0; i < 100; i++) {
> S1:   A[i % b] += i;
> S2:   B[i % Size] += i;
> }
> 
> However, depending on the value of Size, the parallel execution of
> the updated loop may no longer be legal. Without further information, we have to assume that the loop is no longer parallel.
> 
> For this case, will we require the loop-vectorizer to detect the outdated metadata? In case we do, how could this work?
> 
> Or do we require the instrumentation pass to reason about the llvm.loop.parallel metadata and remove it in case it gets invalidated?
> 
> Or can we just assume that such an instrumentation pass can or will never exist?
> 
> Cheers
> Tobi
> 
> P.S.: I know that due to the modulo, we will not be able to prove stride-one access, and vectorization may not be profitable. The example was written to demonstrate legality issues. We can assume surrounding
> instructions that would make vectorization profitable.



