[PATCH] Don't unroll loops in loop vectorization pass when VF is one.

Wei Mi wmi at google.com
Wed Apr 15 09:59:39 PDT 2015


On Tue, Apr 14, 2015 at 3:16 PM, Wei Mi <wmi at google.com> wrote:
>>> For x86, since the extra overflow check block and memcheck block have
>>> extra cost, I am inclined to remove the unrolling in vectorization on
>>> x86, and let regular unroller do all the jobs. For other
>>> architectures, it may be better to adjust the unrolling cost model in
>>> loop vectorization and let it finish the unroll job all at once, to
>>> remove the extra prologue loop costs. Does it make sense?
>>
>> It makes sense; how does this compare, performance-wise, to other options? For example, what happens if you force the vectorizer to unroll by 4x? The main difference is the cost of the memory-overlap checking, right? I agree that avoiding the memory checks makes sense when the expected benefit from them is low.
>
> I will try that and get back.
>
> Thanks,
> Wei.

I tried a new option that combines two changes:
1. Raise MaxInterleaveSize in X86TTIImpl::getMaxInterleaveFac from 2
to 4. The limit of 2 is why the vectorizer unrolled some testcases only
by a factor of two, whereas the regular unroller previously unrolled
them by four.
2. Mark both the loop unrolled by the loop vectorizer and the generated
remainder loop with "llvm.loop.unroll.disable" metadata, so that the
regular loop unroller will not unroll them again.
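
To make the interaction of the two changes concrete, here is a toy
model in plain C++. It does not use any LLVM API: ToyLoop,
toyVectorizerInterleave and toyRegularUnroll are hypothetical names
made up for illustration, and the factors are assumptions, not what
either pass would actually pick.

```cpp
#include <algorithm>
#include <cassert>

// Toy stand-in for a loop; NOT an LLVM type.
struct ToyLoop {
  unsigned UnrollFactor = 1;   // number of copies of the loop body
  bool UnrollDisabled = false; // models "llvm.loop.unroll.disable"
};

// Models change 1 and change 2: the vectorizer interleaves by the
// wanted factor, clamped to the target's maximum, and optionally marks
// the loop so that later unrolling is suppressed.
void toyVectorizerInterleave(ToyLoop &L, unsigned Wanted,
                             unsigned TargetMax, bool MarkDisabled) {
  assert(Wanted >= 1 && TargetMax >= 1 && "factors must be at least 1");
  L.UnrollFactor *= std::min(Wanted, TargetMax);
  if (MarkDisabled)
    L.UnrollDisabled = true;
}

// Models the regular unroller: it skips loops that carry the mark.
void toyRegularUnroll(ToyLoop &L, unsigned Factor) {
  assert(Factor >= 1 && "factor must be at least 1");
  if (L.UnrollDisabled)
    return;
  L.UnrollFactor *= Factor;
}
```

In this model, interleaving by 4 with the mark set and then running the
toy regular unroller with factor 4 leaves the loop at 4 copies; without
the mark the two factors would compound to 16.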

I tested internal benchmarks with the new option. For an image
recognition benchmark that the original patch improved by 5% on
Sandy Bridge, the new option gave no improvement. I had analyzed this
benchmark before, so I understand the perf difference: the unrolled
inner loop has few iterations while the outer loop is very hot, so the
cost of the overflow check and the memory bound check plays a
significant role there.
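
A rough way to see why a short inner loop is sensitive to these checks
is the back-of-the-envelope model below. It is only a sketch:
checkOverheadFraction is a hypothetical helper, and the costs fed to it
are made-up units, not measurements from the benchmark.

```cpp
#include <cassert>

// Fraction of one inner-loop execution that is spent in the up-front
// runtime checks (overflow check + memory bound check), assuming the
// checks cost CheckCost once per entry and each iteration costs
// PerIterCost. Both costs are in arbitrary, hypothetical units.
double checkOverheadFraction(double CheckCost, double PerIterCost,
                             unsigned TripCount) {
  assert(CheckCost >= 0 && "check cost must not be negative");
  assert(PerIterCost > 0 && "per-iteration cost must be positive");
  assert(TripCount > 0 && "loop must run at least once");
  double BodyCost = PerIterCost * TripCount;
  return CheckCost / (CheckCost + BodyCost);
}
```

With an assumed CheckCost of 20 and PerIterCost of 5, a trip count of 8
puts one third of each inner-loop execution into the checks, while a
trip count of 1000 puts less than 1% there. When the outer loop is hot,
the first case is paid on every outer iteration.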

I think it may still be good to disable unrolling for the remainder
loop. I will do that in a follow-up patch after some more analysis of
the benchmarks.

Thanks,
Wei.



