<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Thanks River for the further explanations!<div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jul 24, 2017, at 8:31 PM, River Riddle <<a href="mailto:riddleriver@gmail.com" class="">riddleriver@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hey Quentin,<div class=""> To clarify some of your concerns/questions:</div><div class="">- Currently parameter/output passing are based around the simple cost=1, though this is something that will need to change a bit, as you have pointed out. As I said, the cost model isn't perfect.</div><div class="">- The output that is turned into a return is either: one that will remove an output parameter from the function call, or the last output in the instruction sequence.</div><div class="">- As for being conservative with the cost, we prefer to overestimate the cost of outlining(New function cost) and under estimate the cost of the instruction sequence when possible. This means that we tend to under estimate the benefit, given that we are working with approximate costs.</div><div class="">- When I speak of regressions I am referring to the size of the text section.</div><div class="">- The gist of identifying semantic equivalence is the very first sentence of Candidate Selection in the initial RFC. We identify equivalence generally via Type,Opcode, operand types, any special state. This is just the current definition, we can relax this even further if we want to.</div></div></div></blockquote><div><br class=""></div><div>The thing is with that definition, I fail to see how we can differentiate say add x,x from add x,y. How does that work?</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">- Working at the IR level has more target independent advantages than just reducing the amount of new hooks. This can be seen in the fact that: we don't require mno-red-zone or any other abi flag/check, we can easily parameterize sequences, generate outputs from sequences, etc. </div><div class="">- By combining work I mean that the ability for adding advancements are much easier. if someone wants to add functionality for outlining regions they don't have to worry about any target specific patchups. It just works, for every target. </div><div class="">Both outliners require hooks to get better performance but with IR outliner these hooks are limited to the cost model and not the outlining process itself.</div><div class="">Even though we disagree, I do appreciate the discussion and feedback!</div><div class=""> Thanks,</div><div class="">River Riddle</div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Jul 24, 2017 at 6:24 PM, Quentin Colombet <span dir="ltr" class=""><<a href="mailto:qcolombet@apple.com" target="_blank" class="">qcolombet@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="">Hi River,<div class=""><br class=""></div><div class="">Thanks for the detailed explanation.</div><div class=""><br class=""></div><div class="">If people are okay for you to move forward, like I said to Andrey, I won’t oppose. I feel sad we have to split our effort on outlining technology, but I certainly don’t pretend to know what is best!</div><div class="">The bottom line is if people are happy with that going in, the conversation on the details can continue in parallel.</div><div class=""><br class=""></div><div class=""><div class=""><span class=""><blockquote type="cite" class=""><div class="">On Jul 24, 2017, at 4:56 PM, River Riddle <<a href="mailto:riddleriver@gmail.com" target="_blank" class="">riddleriver@gmail.com</a>> wrote:</div><br class="m_-4066969377426572885Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hey Quentin,<div class=""> Sorry I missed the last question. Currently it doesn't do continual outlining, but it's definitely a possibility.<br class=""></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Ok, that means we probably won’t have the very bad runtime problems I had in mind with adding a lot of indirections. </div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">Thanks,</div><div class="">River Riddle</div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Jul 24, 2017 at 4:36 PM, River Riddle <span dir="ltr" class=""><<a href="mailto:riddleriver@gmail.com" target="_blank" class="">riddleriver@gmail.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">Hi Quentin,<div class=""> I understand your points and I believe that some meaning is being lost via email. For performance it's true that that cost isn't necessarily modeled, there is currently only support for using profile data to avoid mitigate that. I was working under the assumption, possibly incorrectly, that at Oz we favor small code over anything else including runtime performance. This is why we only run at Os if we have profile data, and even then we are very conservative about what we outline from. </div><div class=""> I also understand that some target hooks may be required for certain things, it happens for other IR level passes as well. I just want to minimize that behavior as much as I can, though thats a personal choice.</div><div class="">As for a motivating reason to push for this, it actually solves some of the problems that you brought up in the post for the original Machine Outliner RFC. If I can quote you:</div><div class=""><br class=""></div><div class="">"<span style="font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">So basically, better cost model. That's one part of the story.</span></div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""><br class=""></div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">The other part is at the LLVM IR level or before register allocation identifying similar code sequence is much harder, at least with a suffix tree like algorithm. Basically the problem is how do we name our instructions such that we can match them.</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">Let me take an example.</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">foo() {</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">/* bunch of code */</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">a = add b, c;</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">d = add e, f; </div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">}</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""><br class=""></div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">bar() {</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">d = add e, g;</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">f = add c, w;</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">}</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""><br class=""></div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">With proper renaming, we can outline both adds in one function. The difficulty is to recognize that they are semantically equivalent to give them the same identifier in the suffix tree. I won’t get into the details but it gets tricky quickly. We were thinking of reusing GVN to have such identifier if we wanted to do the outlining at IR level but solving this problem is hard."</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""><br class=""></div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">This outliner will catch your case, it is actually one of the easiest cases for it to catch. The outliner can recognize semantically equivalent instructions and can be expanded to catch even more.</div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Interesting, could you explain how you do that?</div><div class="">I didn’t see it in the original post.</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""><br class=""></div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">As for the cost model it is quite simple:</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""> - We identify all of the external inputs into the sequence. For estimating the benefit we constant fold and condense the inputs so that we can get the set of *unique* inputs into the sequence. </div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Ok, those are your parameters. How do you account for the cost of setting up those parameters?</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""> - We also identify outputs from the sequence, instructions that have external references. We add the cost of storing/loading/passing output parameter to the outlined function.</div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Ok, those are your return values (or your output parameter). I see the cost computation you do on those, but it still miss the general parameter setup cost.</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""> - We identify one output to promote to a return value for the function. This can end up reducing an output parameter that is passed to our outlined function.</div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">How do you choose that one?</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""> - We estimate the cost of the sequence of instructions by mostly using TTI.</div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Sounds sensible. (Although I am not a fan of the whole TTI thing :)).</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""> - With the above we can estimate the cost per occurrence as well as the cost for the new function. Of course these are mostly estimates, we lean towards the conservative side to be safe. <br class=""></div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Conservative in what sense? Put differently how do you know your cost is conservative?</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""> - Finally we can compute an estimated benefit for the sequence taking into account benefit per occurrence and the estimated cost of the new function.</div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span>Make sense.</div><div class=""><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""><br class=""></div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">There is definitely room for improvement in the cost model. I do not believe its the best but its shown to be somewhat reliable for most cases. It has benefits and it has regressions, as does the machine outliner.</div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Regressions in what sense?</div><div class="">Do you actually have functions that are bigger?</div><div class=""><br class=""></div><div class="">To clarify, AFAIR, the machine outliner does not have such regressions per say. The outliner does perfectly what the cost model predicted: functions are individually smaller. Code size grow may happen because of padding in the object file (and getting in the way of some linker optimization).</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""> The reasoning for adding the new framework, is because I believe that this transformation should exist at the IR level. Not only because it is the simplest place to put it but also because it provides the greatest opportunities for improvement with the largest coverage. </div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Again I don’t get those arguments. Machine passes can be target independent. Sure they may require the targets to adopt new hooks but that’s about it and I don’t see how this is different from having to add adopt hooks from TTI. I am guessing you can mostly achieve your goals with the existing TTI hooks, but the arguments passing ones, which is the part I missed in your explanation :).</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">We can expand the framework to catch Danny's cases from the machine outliner RFC, we can add region outlining, etc. We can do all of this and make it available to every target automatically. The reason why I am for this being at the IR level is not because I want to split up the technical effort, but combine it.</div></div></blockquote></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">I fail to see the combining part here :).</div><div class=""><br class=""></div><div class="">To give you a bit context, I am afraid that, indeed, the implementation is going to be simpler (partly because a lot of people are more familiar with LLVM IR than MachineInstr) and for the most part TTI is going to be good enough. However, when we will want to go the extra mile and do crazy stuff, we will, like with the vectorizer IMHO, hit that wall of TTI abstractions that we are going to work around in various ways. I’d like, if at all possible, to avoid repeating the history here.</div><div class=""><div class="h5"><div class=""><br class=""></div><div class="">Cheers,</div><div class="">-Quentin</div><br class=""><blockquote type="cite" class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""><br class=""></div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class="">Thanks,</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""> River Riddle</div><div style="margin:0px;padding:0px;border:0px;font-family:Arial,Helvetica,sans-serif;font-size:13px" class=""><br class=""></div></div><div class="m_-4066969377426572885HOEnZb"><div class="m_-4066969377426572885h5"><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Jul 24, 2017 at 4:14 PM, Quentin Colombet <span dir="ltr" class=""><<a href="mailto:qcolombet@apple.com" target="_blank" class="">qcolombet@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class="">Hi River,</div><div dir="auto" style="word-wrap:break-word" class=""><br class=""><div class=""><span class=""><blockquote type="cite" class=""><div class="">On Jul 24, 2017, at 2:36 PM, River Riddle <<a href="mailto:riddleriver@gmail.com" target="_blank" class="">riddleriver@gmail.com</a>> wrote:</div><br class="m_-4066969377426572885m_5215599914454273868m_7111013166053538566Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi Quentin,<div class=""> I appreciate the feedback. When I reference the cost of Target Hooks it's mainly for maintainability and cost on a target author. We want to keep the intrusion into target information minimized. The heuristics used for the outliner are the same used by any other IR level pass seeking target information, i.e TTI for the most part. I can see where you are coming from with "<span style="font-size:12.8px" class="">having heuristics solely focused on code size do not seem realistic", but I don't agree with that statement.</span></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">If you only want code size I agree it makes sense, but I believe, even in Oz, we probably don’t want to slow the code by a big factor for a couple bytes. That’s what I wanted to say and what I wanted to point out is that you need to have some kind of model for the performance to avoid those worst cases. Unless we don’t care :).</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><span style="font-size:12.8px" class=""> I think there is a disconnect on heuristics. The only user tunable parameters are the lower bound parameters(to the cost model), the actual analysis(heuristic calculation) is based upon TTI information. </span></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">I don’t see how you can get around adding more hooks to know how a specific function prototype is going to be lowered (e.g., i64 needs to be split into two registers, fourth and onward parameters need to be pushed on the stack and so on). Those change the code size benefit.</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><span style="font-size:12.8px" class=""> When you say "Would still be interesting to see how well this could perform on some exact model (i.e., at the Machine level), IMO." I am slightly confused as to what you mean. I do not intend to try and implement this algorithm at the MIR level given that it exists in Machine Outliner.</span></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Of course, I don’t expect you to do that :). What I meant is that the claim that IR offers the better trade off is not based on hard evidences. I actually don’t buy it.</div><div class=""><br class=""></div><div class="">My point was to make sure I understand what you are trying to solve and given you have mentioned the MachineOutliner, why you are not working on improving it instead of suggesting a new framework.</div><div class="">Don’t take me wrong, maybe creating a new framework at the IR level is the right thing to do, but I still didn’t get that from your comments.</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><span style="font-size:12.8px" class=""> There are several comparison benchmarks given in the "More detailed performance data" of the original RFC. It includes comparisons to the Machine Outliner when possible(I can't build clang on Linux with Machine Outliner). I welcome any and all discussion on the placement of the outliner in LLVM.<br class=""></span></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">My fear with a new framework is that we are going to split the effort for pushing the outliner technology forward and I’d like to avoid that if at all possible.</div><div class=""><br class=""></div><div class="">Now, to be more concrete on your proposal, could you describe the cost model for deciding what to outline? (Really the cost model, not the suffix algo.)</div><div class="">Are outlined functions pushed into the list candidates for further outlining?</div><div class=""><div class="m_-4066969377426572885m_5215599914454273868h5"><div class=""><br class=""></div><div class="">Cheers,</div><div class="">-Quentin</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><span style="font-size:12.8px" class=""> Thanks,</span></div><div class=""><span style="font-size:12.8px" class="">River Riddle</span></div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Jul 24, 2017 at 1:42 PM, Quentin Colombet <span dir="ltr" class=""><<a href="mailto:qcolombet@apple.com" target="_blank" class="">qcolombet@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="">Hi River,<div class=""><br class=""><div class=""><span class=""><blockquote type="cite" class=""><div class="">On Jul 24, 2017, at 11:55 AM, River Riddle via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_-4066969377426572885m_5215599914454273868m_7111013166053538566m_-3491381026714088952Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">Hi Jessica,</div><div class=""> The comparison to the inliner is an interesting one but we think it's important to note the difference in the use of heuristics. The inliner is juggling many different tasks at the same time, execution speed, code size, etc. which can cause the parameters to be very sensitive depending on the benchmark/platform/etc. The outliners heuristics are focused solely on the potential code size savings from outlining, and is thus only sensitive to the current platform. This only creates a problem when we are over estimating the potential cost of a set of instructions for a particular target. The cost model parameters are only minimums: instruction sequence length, estimated benefit, occurrence amount. The heuristics themselves are conservative and based upon all of the target information available at the IR level, the parameters are just setting a lower bound to weed out any outliers. You are correct in that being at the machine level, before or after RA, will give the most accurate heuristics but we feel there's an advantage to being at the IR level. At the IR level we can do so many more things that are either too difficult/complex for the machine level(e.g parameterization/outputs/etc). Not only can we do these things but they are available on all targets immediately, without the need for target hooks. The caution on the use of heuristics is understandable, but there comes a point when trade offs need to be made. We made the trade off for a loss in exact cost modeling to gain flexibility, coverage, and potential for further features. This trade off is the same made for quite a few IR level optimizations, including inlining. As for the worry about code size regressions, so far the results seem to support our hypothesis.</div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Would still be interesting to see how well this could perform on some exact model (i.e., at the Machine level), IMO. Target hooks are cheap and choosing an implementation because it is simpler might not be the right long term solution.</div><div class="">At the very least, to know what trade-off we are making, having prototypes with the different approaches sounds sensible.</div><div class="">In particular, all the heuristics about cost for parameter passing (haven’t checked how you did it) sounds already complex enough and would require target hooks. Therefore, I am not seeing a clear win with an IR approach here.</div><div class=""><br class=""></div><div class="">Finally, having heuristics solely focused on code size do not seem realistic. Indeed, I am guessing you have some thresholds to avoid outlining some piece of code too small that would end up adding a whole lot of indirections and I don’t like magic numbers in general :).</div><div class=""><br class=""></div><div class="">To summarize, I wanted to point out that an IR approach is not as a clear win as you describe and would thus deserve more discussion.</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">-Quentin</div><br class=""><blockquote type="cite" class=""><div class=""><span class=""><div dir="ltr" class=""><div class=""> Thanks,</div><div class="">River Riddle</div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Jul 24, 2017 at 11:12 AM, Jessica Paquette <span dir="ltr" class=""><<a href="mailto:jpaquette@apple.com" target="_blank" class="">jpaquette@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space" class=""><br class=""><div class="">Hi River,</div><div class=""><br class=""></div><div class="">I’m working on the MachineOutliner pass at the MIR level. Working at the IR level sounds interesting! It also seems like our algorithms are similar. I was thinking of taking the suffix array route with the MachineOutliner in the future. </div><div class=""><br class=""></div><div class="">Anyway, I’d like to ask about this:</div><span class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jul 20, 2017, at 3:47 PM, River Riddle via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_-4066969377426572885m_5215599914454273868m_7111013166053538566m_-3491381026714088952m_8379714576610871608Apple-interchange-newline"><div class=""><span style="font-family:Arial;font-size:14.666666984558105px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:pre-wrap;word-spacing:0px;float:none;display:inline!important" class="">The downside to having this type of transformation be at the IR level is it means there will be less accuracy in the cost model - we can somewhat accurately model the cost per instruction but we can’t get information on how a window of instructions may lower. This can cause regressions depending on the platform/codebase, therefore to help alleviate this there are several tunable parameters for the cost model.</span><br class="m_-4066969377426572885m_5215599914454273868m_7111013166053538566m_-3491381026714088952m_8379714576610871608Apple-interchange-newline"></div></blockquote><br class=""></div></span><div class="">The inliner is threshold-based and it can be rather unpredictable how it will impact the code size of a program. Do you have any idea as to how heuristics/thresholds/paramete<wbr class="">rs could be tuned to prevent this? In my experience, making good code size decisions with these sorts of passes requires a lot of knowledge about what instructions you’re dealing with exactly. I’ve seen the inliner cause some pretty serious code size regressions in projects due to small changes to the cost model/parameters which cause improvements in other projects. I’m a little worried that an IR-level outliner for code size would be doomed to a similar fate.</div><div class=""><br class=""></div><div class="">Perhaps it would be interesting to run this sort of pass pre-register allocation? This would help pull you away from having to use heuristics, but give you some more opportunities for finding repeated instruction sequences. I’ve thought of doing something like this in the future with the MachineOutliner and seeing how it goes.</div><span class="m_-4066969377426572885m_5215599914454273868m_7111013166053538566m_-3491381026714088952HOEnZb"><font color="#888888" class=""><div class=""><br class=""></div><div class="">- Jessica</div><br class=""></font></span></div></blockquote></div><br class=""></div></span>
______________________________<wbr class="">_________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a><br class=""><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank" class="">http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-dev</a><br class=""></div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></div>
</div></blockquote></div></div></div><br class=""></div></div></blockquote></div><br class=""></div>
</div></div></blockquote></div><br class=""></div>
</blockquote></div></div></div><br class=""></div></div></blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></body></html>