[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

Mon Feb 18 20:18:22 PST 2013

----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev"
> <llvmdev at cs.uiuc.edu>
> Sent: Monday, February 18, 2013 6:31:39 PM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
> 
> > 
> > Okay. If you'll update your local BBVectorize patches, then we can
> > pull them upstream. Then we'll just need to update the unroller.
> 
> 
> If I understand this thread correctly, you want to enable
> vectorization by telling the BB vectorizer that different operations
> are independent. I understand your motivation and I agree that this
> is indeed one way to do vectorization.  However, I don't completely
> understand something.  If we already have the information that
> consecutive iterations of the loops are independent, then the loop
> vectorizer should already vectorize the loop.  Also, at the moment
> we unroll loops before BB Vectorization. Can you think of cases
> where the unrolling can help BB-vectorization? I think that it can
> only help in cases that are easily detected by the loop vectorizer.

I think this is more a question of profitability than of ability. If we mark all loop iterations as independent, then the loop vectorizer could vectorize them, but it might not find it profitable to do so. Nevertheless, it might be profitable to partially vectorize the loop, and the unroll+bb-vectorize approach can catch those cases. Note that the use of this metadata on loads is not just to vectorize those particular loads, but also to provide a means of proving the independence of their users.

In any case, I really want this iteration-independence metadata after unrolling to assist with instruction scheduling on in-order cores (the enable-aa-sched-mi option). So long as we have it, BBVectorize might as well support it, but allowing the instruction scheduler to hide the load latencies is really my key use case.
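
As a rough illustration of the scheduling case, here is a source-level sketch in C. The kernel and the function names (axpy_rolled, axpy_unrolled2) are hypothetical and not taken from any patch in this thread. The second function shows, by hand, the ordering a scheduler could choose after unrolling by two if it knows the iterations are independent: the loads for iteration i+1 are issued before the store from iteration i. Without that knowledge, the load of b[i + 1] cannot be moved above the store to a[i], because the two pointers might alias.

```c
#include <stddef.h>

/* Hypothetical kernel: iteration i reads a[i] and b[i] and writes a[i].
   Assumption: the caller guarantees there are no cross-iteration
   dependences (for example, b[i + 1] never refers to the same memory
   as a[i]). */
void axpy_rolled(float *a, const float *b, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] + k * b[i];
}

/* The same loop unrolled by two and reordered by hand. Both iterations'
   loads come first, so their latencies can overlap on an in-order core.
   This reordering is only valid under the independence assumption above. */
void axpy_unrolled2(float *a, const float *b, float k, size_t n) {
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        float a0 = a[i];
        float b0 = b[i];
        float a1 = a[i + 1];   /* hoisted above the store to a[i] */
        float b1 = b[i + 1];   /* legal only if b[i + 1] does not alias a[i] */
        a[i]     = a0 + k * b0;
        a[i + 1] = a1 + k * b1;
    }
    if (i < n)                 /* remainder iteration when n is odd */
        a[i] = a[i] + k * b[i];
}
```

When a and b do not overlap, the two functions produce the same results. The point of carrying the metadata through unrolling is that the scheduler could make this reordering itself rather than relying on alias analysis to prove it.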

Thanks again,
Hal

> 
> Thanks,
> Nadav
>