<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hi,</div><div class=""><br class=""></div><div class="">We'd like to propose new Loop Distribution pass. The main motivation is to</div><div class="">allow partial vectorization of loops. One such example is the main loop of</div><div class="">456.hmmer in SpecINT_2006. The current version of the patch improves hmmer by</div><div class="">24% on ARM64 and 18% on X86.</div><div class=""><br class=""></div><div class="">The goal of the pass is to distribute a loop that can't be vectorized because of</div><div class="">memory dependence cycles. The pass splits the part with cycles into a new loop</div><div class="">making the remainder of the loop a candidate for vectorization. E.g.:</div><div class=""><br class=""></div><div class=""> for (k = 0; k < M; k++) {</div><div class=""> S1: MC[k+1] = …</div><div class=""><div class=""> // Cycle in S2 due to DC[k+1] -> DC[k] loop-carried dependence</div></div><div class=""> S2: DC[k+1] = DC[k], MC[k] </div><div class=""> }</div><div class=""> </div><div class=""> => (Loop Distribute)</div><div class=""> </div><div class=""> for (k = 0; k < M; k++) {</div><div class=""> S1: MC[k+1] = ...</div><div class=""> }</div><div class=""> for (k = 0; k < M; k++) {</div><div class=""> S2: DC[k+1] = DC[k], MC[k]</div><div class=""> }</div><div class=""> </div><div class=""> => (Loop Vectorize S1)</div><div class=""> </div><div class=""> for (k = 0; k < M; k += 4) {</div><div class=""> S1: MC[k+1:k+5] = ...</div><div class=""> }</div><div class=""> for (k = 0; k < M; k++) {</div><div class=""> S2: DC[k+1] = DC[k], MC[k]</div><div class=""> }</div><div class=""><br class=""></div><div class="">I'd like to collect feedback on the design decisions made so far. These are</div><div class="">implemented in a proof-of-concept patch (<a href="http://reviews.llvm.org/D6930" class="">http://reviews.llvm.org/D6930</a>).</div><div class="">Here is the list of design choices:</div><div class=""><br class=""></div><div class="">- Loop Distribution is implemented as a separate pass to be run before the Loop</div><div class=""> Vectorizer.</div><div class=""><br class=""></div><div class="">- The pass reuses the Memory Dependence Checker framework from the Loop</div><div class=""> Vectorizer. This along with the AccessAnalysis class is split out into a new</div><div class=""> LoopAccessAnalysis class. We may want to turn this into an analysis pass on its own.</div><div class=""><br class=""></div><div class="">- It also reuses the Run-time Memory Check code from the Loop Vectorizer. The</div><div class=""> hmmer loop requires memchecks. This is again captured by the same</div><div class=""> LoopAccessAnalysis class.</div><div class=""><br class=""></div><div class="">- The actual loop distribution is implemented as follows:</div><div class=""><br class=""></div><div class=""> - The list of unsafe memory dependencies is computed for the loop. Unsafe</div><div class=""> means that the dependence may be part of a cycle (this is what the current</div><div class=""> framework provides).</div><div class=""> - Partitions are created for each set of unsafe dependences.</div><div class=""> - Partitions are created for each of the remaining stores not yet encountered.</div><div class=""> The order of the partitions preserve the original order of the dependent</div><div class=""> memory accesses.</div><div class=""> - Simple partition merging is performed to minimize the number of new loops.</div><div class=""> - Partitions are populated with the other dependent instructions by following</div><div class=""> the SSA use-def chains and control dependence edges.</div><div class=""> - Finally, the actual distribution is performed by creating a loop for each</div><div class=""> partition. For each partition we clone the loop and remove all the</div><div class=""> instructions that don't belong to the partition.</div><div class=""> - Also, if run-time memory checks are necessary, these are emitted. We keep</div><div class=""> an original version of the loop around to branch too if the checks fail.</div><div class=""><br class=""></div><div class="">My plan is to proceed with the following steps:</div><div class=""><br class=""></div><div class="">- Bring the current functionality to trunk by splitting off smaller patches from</div><div class=""> the current patch and completing them. The final commit will enable loop</div><div class=""> distribution with a command-line flag or a loop hint.</div><div class=""><br class=""></div><div class="">- Explore and fine-tune the proper cost model for loop distribution to allow</div><div class=""> partial vectorization. This is essentially whether to partition and what</div><div class=""> these partitions should be. Currently instructions are mapped to partitions</div><div class=""> using a simple heuristics to create a vectorizable partitions. We may need to</div><div class=""> interact with the vectorizer to make sure the vectorization will actually</div><div class=""> happen and it will be overall profitable.</div><div class=""><br class=""></div><div class="">- Explore other potentials for loop distribution, e.g.:</div><div class=""> - Partial vectorization of loops that can't be if-converted</div><div class=""> - Classic loop distribution to improve spatial locality</div><div class=""> - Compute the Program Dependence Graph rather than the list of unsafe memory</div><div class=""> accesses and allow reordering of memory operations</div><div class=""> - Distribute a loop in order to recognize parts as loop idioms</div><div class=""><br class=""></div><div class=""> Long term, loop distribution could also become a transformation utility</div><div class=""> (Transform/Util). That way, the loop transformation passes could use it to</div><div class=""> strip the loop from parts that inhibits the given optimization.</div><div class=""><br class=""></div><div class="">Please let me know if you have feedback either on the design or on the next</div><div class="">steps.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Adam</div><div class=""><br class=""></div></body></html>