<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>Hi <span style="font-family: arial, helvetica, sans-serif; font-size: small;">Jiangning,</span></div><br><div><div>On Apr 14, 2014, at 10:31 PM, Jiangning Liu <<a href="mailto:liujiangning1@gmail.com">liujiangning1@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Hi Jim,</div><div class="gmail_extra"><br><div class="gmail_quote">2014-04-15 4:28 GMT+08:00 Jim Grosbach <span dir="ltr"><<a href="mailto:grosbach@apple.com" target="_blank">grosbach@apple.com</a>></span>:<br>

<blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;">This sounds reasonable. Thanks, all.<br>

<div class=""><br>

> - CSE of ADRP optimization (Jiangning)<br>

<br>

</div>Quentin may have some input here. He’s done quite a lot of optimizations for ADRP sequences.<br>

<span class=""><font color="#888888"><br>

-Jim<br>

</font></span><div class=""><div class="h5"></div></div></blockquote></div><br>

</div><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks for letting me know that Quentin may have thought deeply about this.</div></div><div class="gmail_extra">

<br></div><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">ARM64 generates the pseudo instruction ARM64::MOVaddr and friends in the ISel stage. These are meant to keep the address computation (page address + in-page address) together as a single sequence, and the ADRP is only exposed at the end, by the ExpandPseudoInsts pass. The ARM64 approach assumes that we cannot know at compile time whether the in-page offset can be folded into a load/store. With the global merge solution I proposed in my patch, that assumption no longer holds.</div></div></div></blockquote>I think this is orthogonal. If you happen to merge globals, they will have the same base address (i.e., the same pseudo instruction) but different offsets.</div><div>CSE and the like will work like a charm on the pseudos.</div><div><br></div><div>Assuming you emit the right instructions at ISel time, you will create ADRP, LOADGot, or ADD with symbols. Since you do not know anything about the symbols, CSE will match only the ones that are identical.</div><div>CSE will operate at a finer granularity, but I am not sure that will help much.</div><div>On the other hand, you lose the rematerialization capability, because that feature can only handle one instruction at a time. So you will still be able to rematerialize the ADRP, but not the LOADGot or the ADD with symbols.</div><div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra">

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">If we simply apply the global merge solution to ARM64, we should probably avoid generating the MOVaddr pseudo instruction and friends in the ISel stage. However, I'm not sure whether the LOH solution would still work, because:</div>

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">1) ARM64 link-time optimization depends on LOH.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">

2) There is no linker plug-in in LLVM trunk, so it would be hard for me to verify any ideas.</div></div></div></blockquote><div>The LOH solution is also orthogonal. You can see it as a last-chance way to optimize those accesses.</div><div>That said, if you CSE the ADRP but not the LOADGot, you will indeed create far fewer candidates for the LOHs, because you will have ADRPs with several uses, which LOHs do not support.</div><div><br></div><div>FYI, the LOH optimization is not a link-time optimization in the LLVM (LTO) sense; it really is a link-time optimization, performed on the binary.</div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">

Any concrete suggestions on how to combine these different ADRP CSE solutions, and on how to test them, would be appreciated! </div></div></div></blockquote><div>The bottom line is that whatever you do with global merging is orthogonal to LOHs.</div><div>That said, I think it is best to keep the pseudo instructions.</div><div><br></div><div>Of course, I may be wrong, and the best way to check would be to measure what happens when you get rid of the pseudo instructions. Do not be too concerned about the impact on the LOHs.</div><div><br></div><div>Thanks,</div><div>-Quentin</div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">

Thanks,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">-Jiangning</div><br></div></div>

_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br></blockquote></div><br></body></html>