<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Apr 14, 2015 at 12:21 PM, Wei Mi <span dir="ltr"><<a href="mailto:wmi@google.com" target="_blank">wmi@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">>> Another point is that if vectorization is turned off, the runtime<br>

>> check will be gone. It doesn't make sense to depend on vectorization<br>

>> always being turned on.<br>

><br>

> This was a bug some time ago; I think it has been fixed now. The vectorizer will always potentially unroll regardless of whether it is allowed to do any actual vectorization.<br>

><br>

<br>

</span>If VF==1, unroll will still be tried. But if -fno-vectorize is used,<br>

no vectorization and no unroll will be done in loop vectorizer. I<br>

verified it using the testcase in<br>

<a href="https://llvm.org/bugs/show_bug.cgi?id=23217" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=23217</a>.</blockquote><div><br></div><div>A side note: in the longer term, I think the alias-based loop versioning should be done as a separate enabler pass. The interleaving unroller, the vectorizer, and the instruction scheduler are all passes that it enables or enhances.</div><div><br></div><div> David</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
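To make the alias-based loop versioning concrete: the loop is duplicated, and a runtime overlap test (the memcheck) decides which copy runs. Because that check is emitted by the loop vectorizer itself, a loop the vectorizer never touches (for example under -fno-vectorize, as reported above) gets no checked fast path at all. The following is only an illustrative Python model, not how LLVM implements it. `ranges_overlap`, `scaled_copy_scalar`, `scaled_copy_blocked`, and `versioned_scaled_copy` are hypothetical names, and the "blocked" version merely stands in for a body whose loads are reordered ahead of its stores.<br>

```python
from typing import List


def ranges_overlap(a_start: int, b_start: int, n: int) -> bool:
    """True if [a_start, a_start + n) and [b_start, b_start + n) intersect."""
    return a_start < b_start + n and b_start < a_start + n


def scaled_copy_scalar(buf: List[int], dst: int, src: int, n: int, k: int) -> None:
    # Original loop: buf[dst + i] = buf[src + i] * k, one element at a time.
    for i in range(n):
        buf[dst + i] = buf[src + i] * k


def scaled_copy_blocked(buf: List[int], dst: int, src: int, n: int, k: int,
                        width: int = 4) -> None:
    # Stand-in for the optimized loop: each block of `width` loads happens
    # before any of its stores, as in a vector or reordered body. This is only
    # equivalent to the scalar loop when the two ranges do not overlap.
    i = 0
    while i + width <= n:
        block = [buf[src + j] * k for j in range(i, i + width)]
        buf[dst + i:dst + i + width] = block
        i += width
    for j in range(i, n):  # scalar remainder
        buf[dst + j] = buf[src + j] * k


def versioned_scaled_copy(buf: List[int], dst: int, src: int, n: int, k: int) -> str:
    # The runtime memcheck picks which version runs; returns its name.
    if ranges_overlap(dst, src, n):
        scaled_copy_scalar(buf, dst, src, n, k)
        return "scalar"
    scaled_copy_blocked(buf, dst, src, n, k)
    return "blocked"


if __name__ == "__main__":
    a = list(range(20))
    print(versioned_scaled_copy(a, 10, 0, 10, 3))  # -> blocked (disjoint ranges)
    b = list(range(20))
    print(versioned_scaled_copy(b, 1, 0, 10, 3))   # -> scalar (overlapping ranges)
```

With overlapping ranges the blocked version would read values before the scalar loop has overwritten them, so the check is what keeps the fast path safe.<br>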

<span class=""><br>

>><br>

>> I didn't see performance regressions in spec2000 and our internal<br>

>> benchmarks after applying this patch on x86, but it is possible that<br>

>> this is because app performance is not sensitive to compiler scheduling<br>

>> since x86 is out-of-order. So maybe the patch at least makes sense<br>

>> for x86 for now?<br>

><br>

> Agreed; you need to be careful here: the vectorizer's unrolling (interleaving) transformation gives much greater speedups on simpler cores with longer pipelines. x86 is much less sensitive to this, at least on the server-level cores (Atom, Silvermont, etc. might be different).<br>

><br>

> Doing this during scheduling sounds nice in theory, but making the decision in the scheduler might be even harder than it is here. The scheduler does not really know anything about loops, and does not make speculative scheduling decisions. For the scheduler to make a decision about inserting runtime checks, it would need both capabilities, and making speculative schedules to evaluate the need for runtime checks could get very (compile-time) expensive. In addition, you really want other optimizations to fire after the checks are inserted, which is not possible if you insert them very late in the pipeline.<br>

><br>

> All of this having been said, the interleaved unrolling should, generally speaking, put less pressure on the reorder buffer(s), and should be preferable to the concatenation unrolling done by the regular unroller. Furthermore, they should both fire if the interleaved unrolling still did not make the loop large enough. Why is this not happening?<br>
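As an illustration of the two unrolling styles named in the paragraph above, here is a Python sketch of an integer sum unrolled by 4 both ways. It is a hand-written model of the loop shapes only (`sum_concatenated` and `sum_interleaved` are made-up names), not output from either LLVM pass: concatenation keeps one serial dependency chain through `acc`, while interleaving keeps four independent partial sums that are combined after the loop.<br>

```python
from typing import List


def sum_concatenated(xs: List[int]) -> int:
    # Concatenation unrolling: the body is copied 4 times, but every copy
    # adds into the same accumulator, so each add depends on the previous one.
    acc = 0
    n = len(xs)
    i = 0
    while i + 4 <= n:
        acc += xs[i]
        acc += xs[i + 1]
        acc += xs[i + 2]
        acc += xs[i + 3]
        i += 4
    while i < n:  # leftover iterations
        acc += xs[i]
        i += 1
    return acc


def sum_interleaved(xs: List[int]) -> int:
    # Interleaved unrolling: 4 independent partial sums, so the 4 adds in one
    # iteration do not depend on each other; they are combined after the loop.
    acc0 = acc1 = acc2 = acc3 = 0
    n = len(xs)
    i = 0
    while i + 4 <= n:
        acc0 += xs[i]
        acc1 += xs[i + 1]
        acc2 += xs[i + 2]
        acc3 += xs[i + 3]
        i += 4
    acc = acc0 + acc1 + acc2 + acc3
    while i < n:  # leftover iterations
        acc += xs[i]
        i += 1
    return acc
```

Integers are used so that both versions return exactly the same result.<br>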

<br>

</span>It is happening (the interleaved unrolling and the regular unroller both<br>

fired), but the result is not efficient for performance.<br>

<br>

After interleaved unrolling, the original loop becomes:<br>

    overflow check block + memcheck block + kernel loop unrolled by 2 + remainder loop.<br>

Then the regular loop unroller further converts it to:<br>

    overflow check block + memcheck block + prologue loop for kernel loop + kernel loop unrolled by 4 + prologue loop for remainder loop + remainder loop unrolled by 4.<br>
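The iteration arithmetic behind this structure can be sketched as follows. This is a simplified model, not LLVM code: it assumes an interleave factor of 2 in the vectorizer, a runtime unroll count of 4 in the regular unroller with the leftover iterations peeled into a prologue loop, and that a failed overflow check or memcheck sends all `n` iterations to the remainder loop. `iteration_breakdown` and `elements_covered` are hypothetical helpers.<br>

```python
from typing import Dict


def iteration_breakdown(n: int, checks_pass: bool,
                        interleave: int = 2, unroll: int = 4) -> Dict[str, int]:
    """Count how many times each part of the final loop structure iterates."""
    if checks_pass:
        kernel_trips = n // interleave   # each kernel trip handles `interleave` elements
        remainder_trips = n % interleave
    else:
        kernel_trips = 0                 # checks failed: everything runs in the scalar loop
        remainder_trips = n
    return {
        "kernel_prologue": kernel_trips % unroll,
        "kernel_unrolled": kernel_trips // unroll,
        "remainder_prologue": remainder_trips % unroll,
        "remainder_unrolled": remainder_trips // unroll,
    }


def elements_covered(parts: Dict[str, int], interleave: int = 2, unroll: int = 4) -> int:
    # Sanity check: all parts together must process exactly n elements.
    kernel = parts["kernel_prologue"] + parts["kernel_unrolled"] * unroll
    remainder = parts["remainder_prologue"] + parts["remainder_unrolled"] * unroll
    return kernel * interleave + remainder


if __name__ == "__main__":
    for passed in (True, False):
        parts = iteration_breakdown(103, passed)
        print(passed, parts, elements_covered(parts))
```

In this model, once the checks pass at most one iteration is left for the remainder loop, so its body unrolled by 4 can never execute; the check blocks and the two prologue loops are the extra cost being discussed.<br>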

<br>

For x86, since the extra overflow check block and memcheck block have<br>

extra cost, I am inclined to disable unrolling in the vectorizer on<br>

x86 and let the regular unroller do all the work. For other<br>

architectures, it may be better to adjust the unrolling cost model in<br>

the loop vectorizer and let it finish the unrolling all at once, to<br>

remove the extra prologue loop costs. Does that make sense?<br>
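Written as a decision function, the proposal might look like the following. This is a hypothetical sketch of the policy described in this message only: `UnrollPlan`, `plan_unrolling`, and the default factor of 8 are invented for illustration and do not correspond to any LLVM option or cost-model hook. The x86 branch assumes the loop is not being vectorized (VF==1), so a vectorizer interleave count of 1 means the vectorizer leaves the loop unchanged and emits no runtime checks.<br>

```python
from dataclasses import dataclass


@dataclass
class UnrollPlan:
    vectorizer_interleave: int  # interleave count chosen by the loop vectorizer
    regular_unroll: int         # unroll count left for the regular loop unroller

    @property
    def total_factor(self) -> int:
        return self.vectorizer_interleave * self.regular_unroll


def plan_unrolling(target: str, desired_factor: int = 8) -> UnrollPlan:
    if target == "x86":
        # No interleaving in the vectorizer, hence no overflow check or memcheck
        # blocks just for unrolling; the regular unroller does all the work.
        return UnrollPlan(vectorizer_interleave=1, regular_unroll=desired_factor)
    # Other targets: the vectorizer's cost model picks the full factor at once,
    # so the regular unroller adds no prologue loops on top of it.
    return UnrollPlan(vectorizer_interleave=desired_factor, regular_unroll=1)
```

Both branches reach the same overall unroll factor; they differ only in which pass produces it, which is the trade-off the question above is about.<br>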

<br>

Thanks,<br>

Wei.<br>

<div class="HOEnZb"><div class="h5"><br>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

</div></div></blockquote></div><br></div></div>