<div dir="ltr">I generally have no idea of the tradeoffs in terms of non-PRE/non-GVN/etc. optimizations.<div>To wit: </div><div>GVN will not do anything with selects on its own.</div><div>It does not, on its own, simplify them.</div><div>The same goes for PRE.</div><div>GVN tends to end up simplifying them just by virtue of equality propagation/etc. replacing the use in the select, but that's not guaranteed.<br></div><div><br></div><div>I can state definitively that these kinds of optimizations would generally benefit from a non-select representation.</div><div><br></div><div><br></div><div>One could teach opts like GVN/PRE to view selects as fake little CFGs: produce a fake CFG that expands the select, and have these opts use that instead.</div><div>No idea what kind of overhead this would incur, and they'd have to know it was fake, because they have to know how to turn it back into a select :)</div><div><br></div><div><br></div><div><div>On the other hand, I expect vectorization et al. would do better with selects.</div></div><div><br></div><div>You could teach these to basically treat simple SESE regions as instructions. The downside is they also have to transform them back into selects, which means they have to have the logic to kick all the non-select instructions back above/below the region :)</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 25, 2015 at 3:45 PM, Geoff Berry <span dir="ltr"><<a href="mailto:gberry@codeaurora.org" target="_blank">gberry@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Not to hijack this discussion, but this is related to something I’ve been looking at (and have discussed with Hal and James previously), which is select vs. 
phi canonicalization.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Currently SimplifyCFG does most of the converting of phis to selects in IR.  There is also early and late if-conversion in some backends, but let’s ignore that for now.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">CodeGenPrepare does the reverse (i.e. select -> phi/branch) for some targets and some selects.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">SimplifyCFG uses a target instruction cost model of the speculated instructions to decide whether to do phi -> select conversion.  It currently uses a cutoff of 2*Basic if I recall correctly.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">I’d like to start a discussion on what the “right” canonical representation of these operations is over the phases of optimization.  It seems to me that keeping branches around longer would be better since it would allow global optimizations to only have to worry about phis.  It would also create blocks where instructions could be sunk.  
For example:<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p><p> int selects(int a, int b, int c, int d) {<u></u><u></u></p><p>     int x1, x2, x3;<u></u><u></u></p><p> <u></u><u></u></p><p>     x1 = a / b;<u></u><u></u></p><p>     x2 = b / c;<u></u><u></u></p><p>     x3 = c / d;<u></u><u></u></p><p>     if (a < b) {<u></u><u></u></p><p>         x1 = 0;<u></u><u></u></p><p>         x2 = 0;<u></u><u></u></p><p>         x3 = 0;<u></u><u></u></p><p>     }<u></u><u></u></p><p> <u></u><u></u></p><p>     return x1 + x2 + x3;<u></u><u></u></p><p> }<u></u><u></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">If we don’t convert the above to selects, then the CodeSinking pass is able to sink the divides into an else block that it creates.  (As an aside, this brings up the question of the canonical location of instructions w.r.t. sinking/forwarding, which I haven’t seen discussed either.)<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">I believe the downside to keeping phis/branches in the IR (someone please correct me if I’m wrong here) is that we break up the scope of later non-global passes.  I don’t have a feel for how big of a problem this is or how hard it would be to fix by e.g. 
extending these passes to work on single-entry single-exit regions or even superblocks.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Any thoughts on this?<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p><div style="border:dashed #2f6fab 1.0pt;padding:12.0pt 12.0pt 12.0pt 12.0pt;background:#f9f9f9"><p class="MsoNormal" style="line-height:15.6pt;background:#f9f9f9;border:none;padding:0in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">--<u></u><u></u></span></p><p class="MsoNormal" style="line-height:15.6pt;background:#f9f9f9;border:none;padding:0in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Geoff Berry<u></u><u></u></span></p><p class="MsoNormal" style="line-height:15.6pt;background:#f9f9f9;border:none;padding:0in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Employee of Qualcomm Innovation Center, Inc.<u></u><u></u></span></p><p class="MsoNormal" style="line-height:15.6pt;background:#f9f9f9;border:none;padding:0in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project<u></u><u></u></span></p></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Daniel Berlin [mailto:<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>] <br><b>Sent:</b> Wednesday, March 25, 2015 10:25 AM<br><b>To:</b> <a href="mailto:reviews%2BD8120%2Bpublic%2Bd2c9388bc8837c4c@reviews.llvm.org" target="_blank">reviews+D8120+public+d2c9388bc8837c4c@reviews.llvm.org</a><br><b>Cc:</b> <a href="mailto:tilmann.scheller@googlemail.com" target="_blank">tilmann.scheller@googlemail.com</a>; Nick Lewycky; David Majnemer; Hal Finkel; Philip Reames; James Molloy; <a href="mailto:mssimpso@codeaurora.org" target="_blank">mssimpso@codeaurora.org</a>; <a href="mailto:gberry@codeaurora.org" target="_blank">gberry@codeaurora.org</a>; Commit Messages and Patches for LLVM<br><b>Subject:</b> Re: [PATCH] [GVN] Eliminate redundant loads whose addresses are dependent on the result of a select instruction.<u></u><u></u></span></p><div><div class="h5"><p class="MsoNormal"><u></u> <u></u></p><div><p class="MsoNormal">Out of curiosity, at what phase do you see these redundancies?<u></u><u></u></p><div><p class="MsoNormal">GCC used to canonicalize on the CFG version and do select formation late specifically to address some of these issues.<u></u><u></u></p></div></div><div><p class="MsoNormal"><u></u> <u></u></p><div><p class="MsoNormal">On Wed, Mar 25, 2015 at 7:20 AM, Chad Rosier <<a href="mailto:mcrosier@codeaurora.org" target="_blank">mcrosier@codeaurora.org</a>> wrote:<u></u><u></u></p><blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt"><p 
class="MsoNormal">I've abandoned this patch (originally abandoned on March 16th).  Overall, I saw no significant performance improvements across SPEC2K/SPEC2K6, and after further testing there was one significant regression (i.e., 6% on spec2000/vpr), which was the workload I was targeting.  While the patch does remove the redundant load, that redundancy was replaced by another set of redundant instructions, fcmp/csel.  IIRC, csel instructions are bad for A57 devices in general.<br><br>I agree that a more general solution should be considered, and if the new PRE pass can enable that solution, I'm in no rush to push this patch forward.<br><br>Regardless, I do appreciate everyone's feedback!<u></u><u></u></p><div><div><p class="MsoNormal" style="margin-bottom:12.0pt"><br><br><a href="http://reviews.llvm.org/D8120" target="_blank">http://reviews.llvm.org/D8120</a><br><br><u></u><u></u></p></div></div></blockquote></div><p class="MsoNormal"><u></u> <u></u></p></div></div></div></div></div></blockquote></div><br></div>