<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>Hi Jiangning,</div><div><br></div>On Apr 17, 2014, at 7:56 PM, Jiangning Liu <<a href="mailto:liujiangning1@gmail.com">liujiangning1@gmail.com</a>> wrote:<br><div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">Hi Quentin,<div><br></div><div>Thanks for your kind help!</div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;">
<div style="word-wrap:break-word"><div class=""><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>The problem is that the MOVaddr generated for ARM64 implies introducing ADRP again in the ExpandPseudoInsts pass, although at this point we don't yet see a redundant ADRP. AArch64 uses ADDxxi_lsl0_s instead, which is eventually folded into LS32_STR.</div>
</div></div></div></blockquote></div><div><div>Interesting.</div>Looks like we are too clever here.</div><div>I would have expected ISel to generate one base address and one displacement.</div><div><br></div><div>I believe that if we fix that, the LOHs and the global merge become orthogonal. My guess is that we should be less aggressive at folding the offset when there are several uses.</div>
<div class=""><br></div></div></blockquote><div><br></div><div>Sounds great! I thought I had misunderstood. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div class=""><div></div></div></div><div><div class=""><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;">
<div style="word-wrap:break-word"><div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div style="font-family:arial,helvetica,sans-serif;font-size:small">If we simply apply the global merge solution to ARM64, we should probably avoid generating the pseudo instruction MOVaddr and friends at the ISel stage, but I'm not sure whether the LOH solution would still work, because:</div>
<div style="font-family:arial,helvetica,sans-serif;font-size:small">1) ARM64 link-time optimization depends on LOH.</div><div style="font-family:arial,helvetica,sans-serif;font-size:small">
2) We don't see the linker plug-in in LLVM trunk, so it would be hard for me to verify any ideas.</div></div></div></blockquote></div><div>The LOH solution is also orthogonal. You can see it as a last-chance way to optimize those accesses.</div>
<div>That said, if you CSE the ADRP and not the LOADGot, you will indeed create far fewer candidates for the LOHs, because you will have ADRPs with several uses, which LOHs do not support.</div></div></blockquote>
<div><br></div><div>Yes. This is exactly what I'm worried about. So essentially those two optimizations conflict.</div></div></div></div></blockquote></div><div>Let us try to fix the codegen problem while keeping the pseudos.</div>
</div></div></blockquote><div><br></div><div>Currently, we have the following code for ARM64,</div><div><br></div><div><div>// The MOVaddr instruction should match only when the add is not folded</div><div>// into a load or store address.</div>
<div>def MOVaddr</div><div> : Pseudo<(outs GPR64:$dst), (ins i64imm:$hi, i64imm:$low),</div><div> [(set GPR64:$dst, (ARM64addlow (ARM64adrp tglobaladdr:$hi),</div><div> tglobaladdr:$low))]>,</div>
<div> Sched<[WriteAdrAdr]>;</div></div><div><br></div><div>Does this mean ISel will generate the pseudo MOVaddr whenever the pattern "(ARM64addlow (ARM64adrp $hi), $low)" exists? </div></div></div></div></blockquote>Yes.</div><div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>So I think we should remove this pseudo, and I don't understand what you mean by "keeping the pseudos". Are there any other purposes for introducing the pseudo MOVaddr?</div></div></div></div></blockquote><div>This is for rematerialization.</div><div>What I mean by "keeping the pseudos" is that we should try to preserve them as much as possible if it does not hurt performance. In your example, I think we could have generated only one pseudo and one displacement, but the current lowering assumed it is better to fold one of them into the store.</div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;"><div style="word-wrap:break-word"><div><div class=""><blockquote type="cite">
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">Since compile-time ADRP CSE is not as powerful as link-time ADRP removal, I don't want to hurt the link-time solution.</div></div></div></blockquote>
</div>Well, this is something that should be measured. Your patch does not kill the LOHs; it may just reduce the number of potential candidates. Each candidate that your patch removes means we save at least one ADRP instruction. The trade-off does not seem bad.</div>
<div><br></div><div>I suggest we:</div><div>1. Fix the ISel of the pseudo (making the folding less aggressive).</div><div>2. Measure the performance with your patch.</div><div><br></div><div>I can definitely help with the measurements with the LOHs enabled in parallel with your patch.</div>
<div>If you want, I can help with #1 too.</div></div></blockquote><div><br></div><div>You are so nice, and I'm glad that you can help with both, because</div><div>1) I don't have 64-bit hardware yet</div><div>2) I don't have the linker plug-in either</div>
<div>3) I will be busy with another high-priority bug fix in a week</div></div></div></div></blockquote>I’ll try to run the measurements with and without your patch next week.</div><div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word">Side question, did you happen to measure any performance improvement/regression with your patch?</div></blockquote><div><br></div><div>I don't have hardware yet, so I didn't measure it myself. Ana previously helped me measure it on A53, but the data was not consistent across two separate measurements, so I am not yet convinced by it. The data does show some sporadic improvements for some tests in EEMBC, though.</div>
</div></div></div></blockquote><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>I’d like to know which tests would be good candidates to measure the impact of your patch + LOHs enabled.</div>
</div></blockquote><div><br></div><div>With my patch only, I expect 256.bzip2 and 252.eon to show some performance change, because their ADRP counts drop by 42% and 52% respectively.</div></div></div></div></blockquote>I don’t have access to EEMBC myself so I’ll focus on SPEC.<br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"></div></div></div></blockquote><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>For LOH, I think the linker can cover many more cases, such as when the global variable is not defined in the file being compiled. I need to collect more data on LOHs. Do you have any idea how to measure the LOH effect statically? Is counting the number of LOHs enough?</div></div></div></div></blockquote>The linker is quite good at optimizing the LOHs. I’d say for most 3-instruction long LOHs, it turns at least one instruction into a nop.</div><div>That said, no, you cannot statically measure the effect of LOHs.</div><div><br></div><div>Thanks,</div><div>-Quentin</div><div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">
<div><br></div><div>Thanks,</div><div>-Jiangning</div><div> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;">
<div style="word-wrap:break-word"><div><br></div><div>Thanks,</div><div>-Quentin</div><div class=""><div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;">
<div style="word-wrap:break-word"><div><div><br></div><div>Thanks,</div><div>-Quentin</div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div style="font-family:arial,helvetica,sans-serif;font-size:small">
<br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">
Thanks,</div><div style="font-family:arial,helvetica,sans-serif;font-size:small">-Jiangning</div><br></div></div><div>
_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br></div></blockquote></div><br></div></blockquote></div><br></div></div>
</blockquote></div><br></div></div></blockquote></div><br></div></div>
</blockquote></div><br></body></html>