<div dir="ltr">Dear community,<div><br></div><div>Our team at IITH has been experimenting with the loop-distribution pass in LLVM. We see the following results on a few benchmarks.</div><div><br></div><div><span id="gmail-docs-internal-guid-e7e6be9a-aa54-3006-2059-effa9157d485"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;text-indent:36pt"><span style="font-size:13.3333px;font-family:"courier new";color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">clang -O3 -mllvm -enable-loop-distribute -Rpass=loop-distribute file.c</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;text-indent:36pt"><span style="font-size:13.3333px;font-family:"courier new";color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><span style="font-size:13.3333px;line-height:18.4px">clang -O3 -mllvm -enable-loop-distribute -Rpass-analysis=loop-distribute file.c</span><br></span></p><br><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;text-indent:36pt"><font face="monospace, monospace"><a href="http://crd.lbl.gov/departments/computer-science/PAR/research/previous-projects/torch-testbed/" style="text-decoration:none"><span style="font-size:16px;font-weight:700;text-decoration:underline;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">TORCH</span></a><span style="font-size:16px;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">:</span></font></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt"><span style="font-size:16px;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><font face="monospace, monospace">There are nearly 488 loops in this benchmark. 
LLVM was not able to distribute any loop.</font></span></p><font face="monospace, monospace"><br></font><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;text-indent:36pt"><font face="monospace, monospace"><a href="https://github.com/shantanuatiith/TSVC_" style="text-decoration:none"><span style="font-size:16px;font-weight:700;text-decoration:underline;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">TSVC</span></a><span style="font-size:16px;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">:</span></font></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt"><font face="monospace, monospace"><span style="font-size:16px;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Of the 151 loops coded in plain C, </span><span style="font-size:16px;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">none</span><span style="font-size:16px;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"> were distributed. TSVC in particular has valid candidates for distribution, like the one below. 
The inner loop in this example can be distributed.</span></font></p><font face="monospace, monospace"><br></font><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:36pt;text-indent:36pt"><span style="font-size:13.3333px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><font face="monospace, monospace" color="#444444">for (int nl = 0; nl < ntimes/2; nl++) {</font></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt;text-indent:36pt"><span style="font-size:13.3333px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><font face="monospace, monospace" color="#444444">for (int i = 1; i < LEN; i++) {</font></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:108pt;text-indent:36pt"><span style="font-size:13.3333px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><font face="monospace, monospace" color="#444444">a[i] += c[i] * d[i];</font></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:108pt;text-indent:36pt"><span style="font-size:13.3333px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><font face="monospace, monospace" color="#444444">b[i] = b[i - 1] + a[i] + d[i];</font></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt;text-indent:36pt"><span style="font-size:13.3333px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><font face="monospace, monospace" color="#444444">}</font></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt;text-indent:36pt"><span style="font-size:13.3333px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><font face="monospace, monospace" color="#444444">dummy(a, b, c, d, e, aa, bb, cc, 0.);</font></span></p><p dir="ltr" 
style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:36pt;text-indent:36pt"><span style="font-size:13.3333px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><font face="monospace, monospace" color="#444444"> }</font></span></p><font face="monospace, monospace"><br></font><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><font face="monospace, monospace"><span style="font-size:16px;vertical-align:baseline;background-color:transparent">     <a href="http://vhosts.eecs.umich.edu/mibench//" style="font-weight:700;white-space:pre-wrap;text-decoration:none">MiBench</a></span><span style="font-size:16px;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">:</span></font></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt"><span style="font-size:16px;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><font face="monospace, monospace">There are 6539 loops in MiBench. 
<b>None</b> of the loops were distributed by the loop-distribution pass of LLVM.</font></span></p><font face="monospace, monospace"><br></font><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><font face="monospace, monospace"><span style="font-size:16px;vertical-align:baseline;background-color:transparent">     <a href="https://github.com/exmatex/CoMD" style="font-weight:700;white-space:pre-wrap;text-decoration:none">CoMD</a></span><span style="font-size:16px;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">:</span></font></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt"><span style="font-size:16px;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap"><font face="monospace, monospace">CoMD is a reference implementation of typical classical molecular dynamics algorithms and workloads.</font></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt"><font face="monospace, monospace"><span style="font-size:16px;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap">Out of 112 loops, </span><span style="font-size:16px;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap">none</span><span style="font-size:16px;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap"> were distributed.</span></font></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt"><span style="font-size:16px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap"><br></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;margin-left:72pt"><span style="font-size:16px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap"><br></span></p>For the specific TSVC loop shown above, which is a straightforward distribution candidate, the remark is "<b>loop not distributed: 
memory operations are safe for vectorization [-Rpass-analysis=loop-distribute]</b>". </span></div><div><span><br></span></div><div><span>Can someone explain these results and this remark?</span></div><div><span><br><br>I humbly request the community to correct me if I am missing something in my analysis.</span><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr">Thank you<div>D Tharun kumar</div><div>CS15MTECH11002</div><div>9948373970</div><div>CSE-IITH</div></div></div></div></div>
</div></div>
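<div dir="ltr"><br>P.S. For reference, here is a minimal hand-written sketch of the distribution we expected for the inner loop of the TSVC example above. It is not output from LLVM. The function names (fused, distributed, init, count_mismatches), the value of LEN, and the input values are illustrative assumptions chosen only so that the two forms can be compared exactly.</div>

```c
#include <assert.h>

#define LEN 1000 /* illustrative size (assumption), not the TSVC value */

/* Inner loop as written in the TSVC example. */
static void fused(float *a, float *b, const float *c, const float *d) {
    for (int i = 1; i < LEN; i++) {
        a[i] += c[i] * d[i];
        b[i] = b[i - 1] + a[i] + d[i];
    }
}

/* Hand-distributed form: the update of a[] has no loop-carried
 * dependence, so it gets its own loop; the recurrence on b[] stays
 * in a second loop that reads the finished a[]. */
static void distributed(float *a, float *b, const float *c, const float *d) {
    for (int i = 1; i < LEN; i++)
        a[i] += c[i] * d[i];
    for (int i = 1; i < LEN; i++)
        b[i] = b[i - 1] + a[i] + d[i];
}

/* Small integer inputs keep every float result exactly representable,
 * so the two versions can be compared with ==. */
static void init(float *a, float *b, float *c, float *d) {
    for (int i = 0; i < LEN; i++) {
        a[i] = (float)(i % 3);
        b[i] = 0.0f;
        c[i] = (float)(i % 7);
        d[i] = (float)(i % 5);
    }
}

/* Returns the number of elements on which the two forms disagree. */
int count_mismatches(void) {
    static float a1[LEN], b1[LEN], a2[LEN], b2[LEN], c[LEN], d[LEN];
    int bad = 0;

    init(a1, b1, c, d);
    init(a2, b2, c, d);
    fused(a1, b1, c, d);
    distributed(a2, b2, c, d);

    for (int i = 0; i < LEN; i++)
        if (a1[i] != a2[i] || b1[i] != b2[i])
            bad++;
    return bad;
}

/* Convenience check for callers: aborts if the two forms differ. */
void check_distribution(void) {
    assert(count_mismatches() == 0);
}
```

<div dir="ltr">Calling check_distribution() from a small main should pass if the split is legal as we believe: the first loop then carries no loop-carried dependence, while the recurrence on b stays in the second loop.</div>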