[LLVMdev] Trip count and Loop Vectorizer
aschwaighofer at apple.com
Fri Sep 27 12:56:20 PDT 2013
We have a framework for simple heuristics in place:
1.) We vectorize loops with unknown trip count.
2.) You can find our partial unroll heuristic in selectUnrollFactor. It probably needs tuning.
// We unroll the loop in order to expose ILP and reduce the loop overhead.
// There are many micro-architectural considerations that we can't predict
// at this level. For example frontend pressure (on decode or fetch) due to
// code size, or the number and capabilities of the execution ports.
// We use the following heuristics to select the unroll factor:
// 1. If the code has reductions then we unroll in order to break the cross
// iteration dependency.
// 2. If the loop is really small then we unroll in order to reduce the loop
// overhead.
<<< This is the heuristic that works against your example.
// 3. We don't unroll if we think that we will spill registers to memory due
// to the increased register pressure.
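To make the interplay of the three heuristics concrete, here is a minimal sketch of how they could combine. It is not the code in selectUnrollFactor: the struct names, fields, and thresholds (LoopSketch, TargetSketch, smallLoopCost, and so on) are all hypothetical and only illustrate the shape of the decision.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-loop summary; these are not LLVM types.
struct LoopSketch {
  bool hasReductions;     // loop carries a reduction, e.g. a running sum
  unsigned cost;          // estimated cost of one iteration
  unsigned liveRegisters; // estimated live values in one iteration
};

// Hypothetical target description.
struct TargetSketch {
  unsigned numRegisters;  // registers available to the loop
  unsigned maxUnroll;     // upper bound on the unroll factor
  unsigned smallLoopCost; // loops cheaper than this count as "really small"
};

unsigned selectUnrollFactorSketch(const LoopSketch &L, const TargetSketch &T) {
  assert(T.maxUnroll >= 1 && "unroll factor 1 must always be allowed");

  // Heuristic 3: cap the factor so the unrolled copies are not expected
  // to spill. Each copy is assumed to need liveRegisters registers.
  unsigned live = std::max(1u, L.liveRegisters);
  unsigned byPressure = std::max(1u, T.numRegisters / live);
  unsigned cap = std::min(byPressure, T.maxUnroll);

  // Heuristic 1: reductions are unrolled as far as the cap allows, to
  // break the cross-iteration dependency.
  if (L.hasReductions)
    return cap;

  // Heuristic 2: really small loops are unrolled to reduce loop overhead,
  // up to the point where the body reaches the assumed size threshold.
  if (L.cost < T.smallLoopCost) {
    unsigned bySize = std::max(1u, T.smallLoopCost / std::max(1u, L.cost));
    return std::min(cap, bySize);
  }

  // Otherwise leave the loop alone.
  return 1;
}
```

Note that nothing in this sketch consults the trip count: a small loop is unrolled under heuristic 2 whether it runs four iterations or four million.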
On Sep 27, 2013, at 2:21 PM, Murali, Sriram <sriram.murali at intel.com> wrote:
> Hey Arnold,
> I have run into this situation many times while benchmarking.
> I think it is best if this is addressed using a simple heuristic. For that, we need to identify the loop cost and decide if it makes sense to completely unroll the loop, or partially unroll. I am unsure of the optimal way to implement this though.
> I want to run it by the list to get any ideas floating around :)
> -----Original Message-----
> From: Arnold Schwaighofer [mailto:aschwaighofer at apple.com]
> Sent: Friday, September 27, 2013 1:54 PM
> To: Murali, Sriram
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Trip count and Loop Vectorizer
> On Sep 27, 2013, at 12:47 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
>> so you could infer that n must be smaller than 8 (because you know the range of the other dimension). The question is how often does such an example occur, where this is possible, to make such an effort justifiable?
> smaller equal, of course ;)