<div class="gmail_extra"><div class="gmail_quote">On Wed, Apr 25, 2012 at 10:44 PM, Evan Cheng <span dir="ltr"><<a href="mailto:evan.cheng@apple.com" target="_blank">evan.cheng@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF"><div class="im"><div><br><br>On Apr 25, 2012, at 7:37 PM, Chandler Carruth <<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>> wrote:<br><br></div><div></div><blockquote type="cite">

<div><div class="gmail_extra"><div class="gmail_quote">On Wed, Apr 25, 2012 at 7:07 PM, Lang Hames <span dir="ltr"><<a href="mailto:lhames@gmail.com" target="_blank">lhames@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="gmail_extra"><div class="gmail_quote"><div>Hi Bob, Evan, </div><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">

<div><div>

<div><blockquote type="cite"><div style="word-wrap:break-word"><div>I vaguely remember this. Do you remember why the multiply isn't moved together with the sext / zext?</div></div></blockquote><br></div></div></div><div>


The change was svn 128502.  The commit message doesn't have many details.  It references <a>rdar://8832507</a> and <a>rdar://9203134</a>.  I took a quick look at 8832507, where I commented that the zext/sext was getting moved by LICM.  Presumably only one of the operands was loop-invariant, so the multiply would remain in the loop.</div>


</div></blockquote><div><br></div></div><div class="gmail_extra">That makes sense. Thanks for the pointers to the commits/radars too.</div><div><br></div><div>I talked this over with Dan Gohman this morning and we came to the conclusion that the best way of handling this is probably to add a specialized simplify for widening mul intrinsics. It should only simplify when both operands are constant (the result is a constant), or when either operand is a constant zero or one (the result is a sext/zext). This was one of Chandler's suggested solutions too. I'll work up a patch soonish.</div>
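<div><br></div><div>A minimal sketch of the simplify rule described above, written in Python rather than LLVM's C++ so that it stays self-contained. The function name, the (kind, value) return convention, and the use of None for a non-constant operand are assumptions made for illustration; none of this is LLVM API. A zero operand is folded to the constant 0 here, which is what a sext/zext of zero evaluates to anyway.</div>
<pre>
```python
from typing import Optional, Tuple


def simplify_widening_mul(lhs: Optional[int],
                          rhs: Optional[int]) -> Tuple[str, Optional[int]]:
    """Model the proposed simplify for a widening multiply.

    Each operand is an int when it is a known constant, or None when it
    is not a constant. Python ints do not overflow, so the product below
    already behaves like the widened result.
    """
    # Both operands constant: the whole multiply folds to a constant.
    if lhs is not None and rhs is not None:
        return ("constant", lhs * rhs)
    # One operand is the constant zero: the result is zero.
    if lhs == 0 or rhs == 0:
        return ("constant", 0)
    # One operand is the constant one: the result is just the
    # sign/zero extension of the other operand.
    if lhs == 1:
        return ("extend_rhs", None)
    if rhs == 1:
        return ("extend_lhs", None)
    # Anything else: leave the intrinsic alone.
    return ("unchanged", None)
```
</pre>
<div>For example, simplify_widening_mul(1, None) reports that the multiply can be replaced by an extension of the right-hand operand, while simplify_widening_mul(None, 7) leaves the intrinsic untouched so it can still be matched to a single instruction.</div>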


</div></div></blockquote><div><br></div><div>While I like this solution, and particularly for widening multiplies I can see reasons to specifically prefer it, I'd like to point out one alternative to the generic problem this thread has touched on: an intrinsic that expands to multiple IR constructs which we would like to match back to exactly one instruction.</div>


<div><br></div><div>I understand the difficulty of looking across BB edges and other CFG elements, seeing through CSE and other foldings which can impact this. However, we have great tools to do all of these things, especially in the IR. The whole point of lowering to generic IR constructs is to get access to these tools.</div>


<div><br></div><div>My idea for how to handle these patterns is to have target-specific ISD nodes to represent their special semantics, and to use target-specific combines to reach across CFG and other constructs to form these ISD nodes even after optimizations. At that layer we can also avoid forming the nodes when the optimizations that perturbed the target-independent code have improved it enough that it beats, in code quality, the instruction the intrinsic would naively have lowered to.</div>


<div><br></div></div></div></div></blockquote><div><br></div></div>I am not sure how this would work. Currently DAG combine operates on one BB at a time. Or are you thinking about the selection-DAG-builder-time optimization that we added to deal specifically with sext / zext? That's not a general solution, though. One day we will implement whole-function isel, and this approach should work well then.</div>

</blockquote><div><br></div><div>Yea, Owen clarified the BB-at-a-time nature of this to me on IRC.</div><div><br></div><div>What I think would still work is to add target-specific hooks to the selection DAG builder, so that it can form target-specific nodes directly from IR. It's a bit gross, but it offers a lot of selective flexibility.</div>

<div><br></div><div>I completely agree that whole-function isel is a more general solution to this problem. If we can delay committing to very much in either direction until that's available, excellent. If there is a pressing need (and I don't think the ARM intrinsics that started this are such; I completely agree with the plan to do specific combines there), I would prefer adding target hooks to the DAG builder to keep the IR representation and middle-end optimizations clean and ignorant of these details...</div>

<div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF"><div>It also depends on what kind of programs you optimize for. Some expert programmers would tell you that they expect a strict one-to-one translation from intrinsics to instructions, since the compiler will never match hand-crafted assembly.</div>

</div></blockquote><div><br></div><div>Yea... My fundamental philosophy is that they should use hand-crafted assembly. =] Inline asm (or even better, an assembly file) seems like the right tool for the job here.</div><div>

<br></div><div>But indeed, this is an age-old debate, which I suspect no one will ever win....</div></div></div>