[llvm-commits] [llvm] r171436 - in /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp LoopVectorize.h

Wed Jan 2 18:39:59 PST 2013

On Wed, Jan 2, 2013 at 6:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Nadav Rotem" <nrotem at apple.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: llvm-commits at cs.uiuc.edu
>> Sent: Wednesday, January 2, 2013 7:55:33 PM
>> Subject: Re: [llvm-commits] [llvm] r171436 - in /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp
>> LoopVectorize.h
>>
>> On Jan 2, 2013, at 5:12 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>> Interesting. Can you please explain your motivation for doing this?
>>
>> Hi Hal!
>>
>> The loop vectorizer can now generate multiple vectors for each scalar
>> instruction. You are right that we could have used the loop unroller
>> for some cases. Basically we could have duplicated the loop basic
>> block and added a new kind of alias analysis to tell the scheduler
>> that memory operations from consecutive iterations do not alias.
>
> We might want to do this anyway to help the instruction scheduler, but that's another story.
>
>> However, this approach would fail for code such as this one:
>>
>> for (int i = 0; i < n; ++i)
>>   sum += A[i];
>>
>> The 'sum' variable is a reduction variable. In order to increase ILP
>> we'd like to have two variables that accumulate the content of A.
>> The LoopVectorizer has all of the information and infrastructure to
>> allow the partial unrolling of loops.
>> Maybe the name 'unrolling' is misleading. We can think of it as wider
>> vectors that are somehow split to legal register sizes.
>
> Okay, I understand, thanks! The loop unroller would just create one large dependency chain, but to increase ILP, we need several chains. On the other hand, would it make more sense to teach the unroller to split reduction dependency chains than to embed this functionality in the vectorizer? It seems like this transformation would be useful even in cases where we are not actually vectorizing. Conversely, if the vectorizer is, for specialized cases, a better unroller than the unroller, then maybe we should specifically make sure it can be used that way.

This transformation is basically orthogonal to anything the current
LLVM IR loop unroller pass knows how to do: unlike the vectorizer, the
unroller always executes all the loop iterations in the same order
they ran before the unrolling.
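
To make the distinction concrete, here is a minimal C sketch, assuming integer elements; the function names sum_unrolled and sum_two_chains are hypothetical and are not taken from either pass. The first is conventional unrolling by 2: iterations still execute in their original order and every add waits on the previous one, so there is a single dependency chain. The second splits the reduction into two independent accumulators and combines them after the loop, which is the reordering described above. With floating-point elements the two versions can round differently, so a compiler may only split the chain when reassociation is permitted.

```c
/* Conventional unrolling by 2 (hypothetical example): iterations run in
   the original order and each add depends on the previous one, so the
   loop carries a single dependency chain through 'sum'. */
long sum_unrolled(const int *A, int n) {
  long sum = 0;
  int i = 0;
  for (; i + 1 < n; i += 2) {
    sum += A[i];
    sum += A[i + 1];
  }
  for (; i < n; ++i) /* leftover iteration when n is odd */
    sum += A[i];
  return sum;
}

/* Reduction split into two chains (hypothetical example): the adds into
   sum0 and sum1 are independent, so they can overlap in the pipeline.
   This reassociates the sum, which is exact for integer elements. */
long sum_two_chains(const int *A, int n) {
  long sum0 = 0, sum1 = 0;
  int i = 0;
  for (; i + 1 < n; i += 2) {
    sum0 += A[i];     /* even-indexed elements */
    sum1 += A[i + 1]; /* odd-indexed elements */
  }
  for (; i < n; ++i) /* leftover iteration when n is odd */
    sum0 += A[i];
  return sum0 + sum1; /* combine the partial sums after the loop */
}
```

For example, with A = {1, 2, 3, 4, 5, 6, 7} and n = 7, both functions return 28; only the order in which the additions happen differs.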

>>
>> The next step would be to write code that calculates the register
>> pressure in order to estimate the profitability of this
>> transformation.
>
> Sounds good. We may need something like this for the regular unroller as well.

Do we?  I mean, if we can't vectorize a loop, the only reason to
unroll it at the IR level is if the IR subsequently simplifies, and
that doesn't really depend on register pressure.  We can easily
perform simple unrolling at the MachineFunction level, and we have
much better information at that point.  (I'm using the term
"vectorize" loosely here to mean loops where we can perform
vectorization-style unrolling, even if there aren't any vector
instructions involved.)

-Eli