<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 24, 2017, at 3:08 PM, Amaury SECHET via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><br class=""><div class="gmail_extra"><br class=""><div class="gmail_quote">2017-01-24 13:47 GMT+01:00 Nemanja Ivanovic <span dir="ltr" class=""><<a href="mailto:nemanja.i.ibm@gmail.com" target="_blank" class="">nemanja.i.ibm@gmail.com</a>></span>:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class=""><div class="">I may be wrong here, but legalizing early seems like something that is more likely to prevent optimizations than it is to encourage them.<br class=""><br class=""></div></div></div></blockquote><div class=""><br class=""></div><div class="">I guess it really depends. My gut feeling is that you are right for simple things like using proper integer sizes, but that it is probably not true for anything involving select/control flow.<br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class=""><div class=""><br class=""></div>But I guess I don't follow why things like TTI, TII and TLI queries don't suffice for this. CodeGenPrepare will break this sequence up. 
I would imagine that if the target returns false for isCheapToSpeculateCtlz() and false for canInsertSelect(), the code would look the way you'd like it to.<br class=""><br class=""></div>But as I said, I'm mostly speculating here and I might be very wrong.<br class=""></div><div class="gmail_extra"><br class=""></div></blockquote><div class=""><br class=""></div><div class="">I got a fair amount of bad codegen here because a branch is added at the last minute, and by that point all the passes that do anything interesting with control flow have already run. For instance:<br class=""><br class=""></div><div class="">auto x = ctlz(n);<br class=""></div><div class="">auto y = ((x * 36) + 35) >> 8;<br class=""></div><div class=""><br class=""></div><div class="">It's fairly obvious that you'd like to constant-fold y when a branch is required. That is something that nothing after CodeGenPrepare knows how to do.<br class=""></div></div></div></div></div></blockquote><div><br class=""></div><div>It seems to me that this is something that could be done at the MI level (I’d need to see the MI dump though).</div><div><br class=""></div><div>GlobalISel may also help here by being able to look beyond single blocks (CC Tim/Quentin, see below for the example with the branch).</div><div><br class=""></div><div>-- </div><div>Mehdi</div><div><br class=""></div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_extra"><div class="gmail_quote"><div class=""><div class="gmail-h5">On Mon, Jan 23, 2017 at 5:02 PM, via llvm-dev <span dir="ltr" class=""><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>></span> wrote:<br class=""></div></div><blockquote class="gmail_quote" 
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="gmail-h5"><div class="gmail-m_-318660157652936051HOEnZb"><div class="gmail-m_-318660157652936051h5"><br class="">
> On Jan 23, 2017, at 4:06 AM, Amaury SECHET via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:<br class="">
><br class="">
> Hi all,<br class="">
><br class="">
> Some non-trivial legalizations of operations that the backend does not support would benefit from having the optimizer run over them. I noticed a few examples while trying to optimize various pieces of code over the past weeks.<br class="">
><br class="">
> One offender is the cttz/ctlz intrinsic when defined on 0. On X86, BSR and BSF are undefined on 0, and only recent CPUs have the LZCNT and TZCNT instructions, which are properly defined for 0. The backend inserts code with a branch that checks for 0 and either uses bsf/bsr or just uses a constant.<br class="">
><br class="">
> But if we are going to branch anyway, and one path of the branch sets the value to a constant, there are some obvious optimizations that can be done, starting with constant folding. None of these happen in the backend, and it doesn't seem to be the right place for them anyway. See for instance this sample code from a serialization/deserialization routine (the code has been tuned to illustrate the problem briefly):<br class="">
><br class="">
> auto a = ctlz(n, false);<br class="">
> auto b = ((a * 36) + 35) >> 8;<br class="">
><br class="">
> Which will be synthesized as follows:<br class="">
><br class="">
> auto a = (n == 0) ? 64 : ctlz(n, true);<br class="">
> auto b = ((a * 36) + 35) >> 8;<br class="">
><br class="">
> But obviously, recomputing b when n is 0 is completely pointless work. Better codegen would be something like:<br class="">
><br class="">
> if (n == 0) {<br class="">
> a = 64;<br class="">
> b = 9; // ((64 * 36) + 35) >> 8<br class="">
> } else {<br class="">
> a = ctlz(n, true);<br class="">
> b = ((a * 36) + 35) >> 8;<br class="">
> }<br class="">
><br class="">
> The optimizer knows how to do this kind of transformation, but the backend does not. I encountered the same issue some time ago in a memory allocator and worked around it, but as I'm encountering it again in the serialization library, I assume there is an untapped source of optimizations here.<br class="">
><br class="">
> I was unsure about where these optimizations should take place. Clearly, we want to do them early in the pipeline so that other passes can pick up on them. I looked around, but there didn't seem to be a good place to add this transformation.<br class="">
><br class="">
> Other examples of legalization that may benefit from the optimizer include splitting operations on large integers that the backend does not support into multiple operations on smaller integers.<br class="">
><br class="">
> Would an EarlyLegalization pass be worth it? It could use information from the backend and do various transformations that the backend would have to do anyway, which would expose optimization opportunities. Or is there an appropriate place to insert these?<br class="">
<br class="">
</div></div>At least in theory, SelectionDAG is supposed to be able to do many of these kinds of optimizations. The goal is to do legalization, then clean up its results in combine2.<br class="">
<br class="">
—escha<br class=""></div></div>
______________________________<wbr class="">_________________<br class="">
LLVM Developers mailing list<br class="">
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a><br class="">
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank" class="">http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-dev</a><br class="">
</blockquote></div><br class=""></div>
</blockquote></div><br class=""></div></div>
_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<br class=""></div></blockquote></div><br class=""></body></html>