<div dir="ltr"><div>Matt,</div><div><br></div><div>Attached are the two assembly listings for the kernel "search2": one from the AMDGPU-PRO online compiler and one from llvm-roc-1.6.x. It would be great if you could take a look and see how the LLVM output can be improved.</div><div>In case you missed it:</div><div><div dir="ltr">The target algorithm is lyra2 and the target kernel is "search2" in <a href="https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/phi2.cl" target="_blank">https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/phi2.cl</a>.</div><div>The details are implemented in <a href="https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/lyra2mdz.cl" target="_blank">https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/lyra2mdz.cl</a></div></div><div><br></div><div>Thanks,</div><div>   Changdao<br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Sep 5, 2018 at 12:26 PM Changdao Dong <<a href="mailto:dongchangdao@gmail.com">dongchangdao@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">The target algorithm is lyra2 and the target kernel is "search2" in <a href="https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/phi2.cl" target="_blank">https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/phi2.cl</a>.</div><div>The details are implemented in <a href="https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/lyra2mdz.cl" target="_blank">https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/lyra2mdz.cl</a></div><div>If you have time to take a look at the assembly, I can upload the files later today.</div><div><br></div><div>Thanks,</div><div>    Changdao<br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Sep 5, 2018 at 11:32 AM Matt Arsenault <<a href="mailto:arsenm2@gmail.com" target="_blank">arsenm2@gmail.com</a>> 
wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><br><div><br><blockquote type="cite"><div>On Sep 5, 2018, at 23:17, Changdao Dong via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_2176984804790788988m_-6880706484919582771Apple-interchange-newline"><div><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br class="m_2176984804790788988m_-6880706484919582771Apple-interchange-newline">I finally managed to modify LLVM to generate assembly that runs on the AMDGPU-PRO drivers. One problem is that the code generated by LLVM is about 10% slower than the code from AMDGPU-PRO's online compiler. Is there anything I can tune to improve the performance of the LLVM output?</div><br class="m_2176984804790788988m_-6880706484919582771Apple-interchange-newline"></div></blockquote></div>This is very dependent on the case you are looking at, so without a specific example or ISA comparison between the compilers there’s no guessing.<div><br></div><div>-Matt</div></div></blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="m_2176984804790788988gmail_signature" data-smartmail="gmail_signature">DONG, Changdao<br><br>MP: 1-412-551-2330<br><a href="mailto:dongchangdao@gmail.com" target="_blank">dongchangdao@gmail.com</a><a href="mailto:cddong@cmu.edu" target="_blank"></a><br></div>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">DONG, Changdao<br><br>MP: 1-412-551-2330<br><a href="mailto:dongchangdao@gmail.com" target="_blank">dongchangdao@gmail.com</a><a href="mailto:cddong@cmu.edu" target="_blank"></a><br></div>