[llvm-commits] [llvm] r171436 - in /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp LoopVectorize.h

Thu Jan 3 10:35:45 PST 2013

On Jan 2, 2013, at 8:59 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>> The next step would be to write code that calculates the register
>>>>>> pressure in order to estimate the profitability of this
>>>>>> transformation.
>>>>> 
>>>>> Sounds good. We may need something like this for the regular
>>>>> unroller as well.
>>>> 
>>>> Do we?  I mean, if we can't vectorize a loop, the only reason to
>>>> unroll it at the IR level is if the IR subsequently simplifies, and
>>>> that doesn't really depend on register pressure.  We can easily
>>>> perform simple unrolling at the MachineFunction level, and we have
>>>> much better information at that point.
>>> 
>>> Do we have anything that does that?
>> 
>> All the analysis infrastructure is there, but there isn't an actual
>> unroller at the moment as far as I know.
> 
> We have MachineLoopInfo, but that does not give us any way to determine trip counts, etc. Hexagon and PowerPC have "hardware loops" passes which optimize loop branching, and those passes need to use target-specific knowledge to analyze the comparison and increment instructions to extract the trip counts. Am I missing something?

I don't think you're missing anything. You're right that conceptually we may unroll for various reasons independent of vectorization. And given that we've decided to implement more target lowering at the IR level, it makes sense to do partial unrolling here too. In fact, I think it's important to do all loop restructuring that may involve splitting the loop into pre/remainder loops in a single IR pass that runs very late (later than full unrolling).

Sum-reduction is a good example of an IR-level optimization that drives partial unrolling. Currently the loop vectorizer has the information needed for sum-reduction, so it's done here for convenience. So the loop vectorizer is no longer just a vectorizer--fine with me. Partial unrolling shouldn't necessarily run in the same pass as full unrolling anyway; that was also just a convenience. Over time we'll certainly add more IR-lowering loop transformations and more heuristics for partial unrolling. These transformations need to play nicely together and should share infrastructure. We haven't solved that software engineering challenge, but conceptually I think we all agree on the goal.
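
To make the sum-reduction case concrete, here is a rough source-level sketch of what partially unrolling a reduction with a remainder loop amounts to. This is not the vectorizer's code: the function name sum_unrolled and the unroll factor of 4 are made up for illustration, and the real transformation operates on IR, not on C++.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only (not LLVM code): an integer sum reduction,
// partially unrolled by an assumed factor of 4.
long long sum_unrolled(const std::vector<int>& a) {
    const std::size_t n = a.size();
    // Largest multiple of 4 that is <= n; the main loop covers [0, main_end).
    const std::size_t main_end = n - (n % 4);

    // Four independent accumulators break the single serial dependence
    // chain on the sum, so the adds can overlap.
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (std::size_t i = 0; i < main_end; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    // Combine the partial sums once, after the main loop.
    long long sum = (s0 + s1) + (s2 + s3);

    // Remainder loop: the 0 to 3 elements the unrolled loop did not cover.
    for (std::size_t i = main_end; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

Each extra accumulator stays live across the whole loop, which is where the register pressure question comes in. The reassociation is exact for integers; for floating point it can change the result.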

Andy