[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
tobias at grosser.es
Fri Dec 30 01:09:39 PST 2011
On 12/29/2011 06:32 PM, Hal Finkel wrote:
> On Thu, 2011-12-29 at 15:00 +0100, Tobias Grosser wrote:
>> On 12/14/2011 01:25 AM, Hal Finkel wrote:
>> One thing that I would still like to have is a test case where
>> bb-vectorize-search-limit is needed to avoid exponential compile time
>> growth and another test case that is not optimized, if
>> bb-vectorize-search-limit is set to a value less than 4000. I think
>> those cases are very valuable to understand the performance behavior of
>> this code.
> Good idea, I'll add these test cases.
>> Especially, as I am not yet sure why we need a value as high
>> as 4000.
> I am not exactly sure why that turned out to be the best number, but
> I'll try this again in combination with my load/store reordering patch
> and see if such a large value still seems best.
The reason I am surprised by this value is that I believe partial loop
unrolling would not yield basic blocks of this size; code size limits
should prevent that. However, loop unrolling seems to be the major
reason why two accesses to adjacent memory may be placed far apart.
Without loop unrolling, at a distance of 4000 the fact that two
instructions access adjacent memory locations seems to be completely
random and the probability that the following instructions perform the
same calculations seems low. Also, I believe at 4000 the compile time
should already be significantly higher.
As it seems my intuition is wrong, I am very eager to see and understand
an example where a search limit of 4000 is really needed.
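
To make the trade-off concrete, here is a minimal sketch of a
windowed pair search. It is a toy model under stated assumptions, not
the pass's actual implementation: instructions are modelled as
hypothetical (opcode, address) tuples, and `candidate_pairs` and
`search_limit` are illustrative names. It shows that pairs of adjacent
accesses are only found when the limit is at least the distance that
unrolling put between them, while the number of comparisons grows
roughly as n * limit for a block of n instructions.

```python
def candidate_pairs(instrs, search_limit):
    """Return (pairs, comparisons) for a toy basic block.

    instrs is a list of (opcode, address) tuples, a hypothetical
    stand-in for real IR instructions. Two instructions are a
    candidate pair if they share an opcode and the second one
    accesses the address directly after the first.
    """
    pairs = []
    comparisons = 0
    for i, (op_i, addr_i) in enumerate(instrs):
        # Only inspect the next `search_limit` instructions after i.
        end = min(len(instrs), i + 1 + search_limit)
        for j in range(i + 1, end):
            comparisons += 1
            op_j, addr_j = instrs[j]
            if op_i == op_j and addr_j == addr_i + 1:
                pairs.append((i, j))
    return pairs, comparisons


# A block as it might look after unrolling a loop four times:
# each iteration loads a[k], then does unrelated work.
block = []
for k in range(4):
    block += [("load", k), ("mul", 100 + 10 * k), ("store", 200 + 10 * k)]

for limit in (2, 3, 100):
    found, cost = candidate_pairs(block, limit)
    print(limit, found, cost)
```

In this toy block the adjacent loads sit three instructions apart, so
a limit of 2 finds nothing and a limit of 3 finds every pair. Raising
the limit further only adds comparisons, which is the effect I would
expect to dominate long before a distance of 4000.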