<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Apr 4, 2016, at 1:35 PM, Evgeny Stupachenko &lt;<a href="mailto:evstupac@gmail.com" class="">evstupac@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Before the patch the loop<br class="">for (i = 0; i &lt; 15; i++)<br class=""> loop_body;<br class="">was not unrolled,<br class=""><br class="">the loop<br class="">for (i = 0; i &lt; 16; i++)<br class=""> loop_body;<br class="">was unrolled<br class=""><br class="">the loop<br class="">for (i = 0; i &lt; n; i++)<br class=""> loop_body;<br class="">was unrolled<br class=""><br class="">Why we should avoid unrolling if threshold let us unroll a loop?<br class="">The sense of unrolling (right now) is to reduce induction variable and<br class="">compare/branch costs.<br class=""><br class="">One of possible solutions is to add " &amp;&amp; Unrolling == Runtime":<br class=""><blockquote type="cite" class="">  if (Count &lt;= 1 &amp;&amp; Unrolling == Runtime) {<br class=""></blockquote><br class=""></div></div></blockquote><div><br class=""></div><div>What do you mean? 
That code is already under this branch:</div><div><br class=""></div><div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">  <span style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2" class="">if</span> (Unrolling == Partial) {</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">So it would never trigger, if I’m reading this right.</div></div><br class=""><blockquote type="cite" class=""><div class=""><div class="">However I still do not understand why we should avoid unrolling if<br class="">threshold let us unroll a loop?<br class="">For the cases where unroll is unprofitable there should be<br class="">corresponding heuristics.<br class="">What is your case?</div></div></blockquote><br class=""></div><div>You’ve changed the definition of “partial” unrolling from what it did before, which makes me somewhat nervous in general. Our specific use-case for partial unrolling is that GPUs want to reduce latency, so a big loop with high-latency memory operations in it (too big to fully unroll) should be partially unrolled to trade some number of registers for some amount of latency reduction. However, suppose the following case occurs:</div><div><br class=""></div><div>Trip count: 15</div><div>Max unroll count: 8</div><div><br class=""></div><div>This means we unroll by a factor of 8, so the unrolled body runs once, and then create a fixup loop that runs the remaining 7 iterations afterwards. Now we have the absolute worst of both worlds: our register count has gone up a lot because of the unroll, but we still have a lot of latency because of the fixup loop, so we’ll probably end up losing performance overall.</div><div><br class=""></div><div>—escha</div></body></html>