<div dir="ltr">Hi Jessica,<div> When we model the cost of an instruction it is largely based upon an estimation by the TTI. It is still an estimation but the results prove to be somewhat close to the exact costs. If you are interested in the results for some known platforms check the "More detailed results" section of the original RFC. It provides comparisons between the two configurations of outlining at the IR level as well as the Machine Outliner. It includes results for X86_64(Linux and Mac) as well as AArch64(The only two platforms that the Machine Outliner supports). The results tend to be quite favorable in regards to the cost model.<br> As for the tunable heuristic(constants) numbers, these are very similar to those that the Machine Outliner might have if it was tunable: Minimum Benefit, Min Occurrences, Min Instruction Length. Those are the only constants that really exist within the cost model.<br> I appreciate all of the feedback and discussion!<br>Thanks,</div><div>River Riddle</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 24, 2017 at 1:59 PM, Jessica Paquette <span dir="ltr"><<a href="mailto:jpaquette@apple.com" target="_blank">jpaquette@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space">Hi River,<div><br></div><div>I think outlining at the IR level certainly has a lot of potential. I also think that working at a more generic representation such as IR opens lots of doors for interesting code size improvements; however, I’m still iffy on heuristics in general.</div><div><br></div><div><span class=""><blockquote type="cite"><div dir="ltr"><div>The inliner is juggling many different tasks at the same time, execution speed, code size, etc. which can cause the parameters to be very sensitive depending on the benchmark/platform/etc. </div></div></blockquote></span>Strongly agree here; it’s unclear *what* a heuristic should look like for an inliner.</div><div><br></div><div><span class=""><blockquote type="cite"><div dir="ltr"><div>The outliners heuristics are focused solely on the potential code size savings from outlining, and is thus only sensitive to the current platform. </div></div></blockquote></span>This is where I get a little concerned.</div><div><br></div><div>Firstly, as previously mentioned, we don’t know how an instruction will be lowered. So, if we say that each IR instruction has a cost of “1 somethings”, we haven’t really solved a *code size problem*, but more of a *code reduction problem*. That is, we’ve performed a reduction to the overall structure of the program, but we don’t know that that overall structural reduction will produce an equivalent code size reduction. I personally would be inclined to figure out how far off such a cost model could be, given a known architecture.</div><div><br></div><div>Secondly, suppose we have a cost model like in the inliner, where we have some heuristic constants which are employed by some cost model. If we change the model, then how do we know our heuristic constants are still good? Even if we tune these constants for some specific target to try and make good decisions, if we change the outliner *at all*, then we have to retune the heuristics. Changes in later passes could also introduce the need for retuning.</div><div><br></div><div>Finally, the big issue here is that with code size problems, or at least with outlining, we’re trying to solve something which is inherently tied to the architecture we’re working with. A good outlining decision for x86 will look different from a good outlining decision for ARM64. The issue isn’t purely structural; the actual instructions we’re dealing with matter.</div><div><br></div><div>Once again, I think this stuff is really interesting, and I think your results are looking great so far. It’s really awesome to see this method being successful at a high level. I’m also making a lot of assumptions about what your cost model might look like. I’ll probably have to look at the code to make a sound judgement. :)</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>- Jessica</div></font></span><div><div class="h5"><div><div><div><div><br><blockquote type="cite"><div>On Jul 24, 2017, at 11:55 AM, River Riddle <<a href="mailto:riddleriver@gmail.com" target="_blank">riddleriver@gmail.com</a>> wrote:</div><br class="m_5452462766631507231Apple-interchange-newline"><div><div dir="ltr"><div>Hi Jessica,</div><div> The comparison to the inliner is an interesting one but we think it's important to note the difference in the use of heuristics. The inliner is juggling many different tasks at the same time, execution speed, code size, etc. which can cause the parameters to be very sensitive depending on the benchmark/platform/etc. The outliners heuristics are focused solely on the potential code size savings from outlining, and is thus only sensitive to the current platform. This only creates a problem when we are over estimating the potential cost of a set of instructions for a particular target. The cost model parameters are only minimums: instruction sequence length, estimated benefit, occurrence amount. The heuristics themselves are conservative and based upon all of the target information available at the IR level, the parameters are just setting a lower bound to weed out any outliers. You are correct in that being at the machine level, before or after RA, will give the most accurate heuristics but we feel there's an advantage to being at the IR level. At the IR level we can do so many more things that are either too difficult/complex for the machine level(e.g parameterization/outputs/etc). Not only can we do these things but they are available on all targets immediately, without the need for target hooks. The caution on the use of heuristics is understandable, but there comes a point when trade offs need to be made. We made the trade off for a loss in exact cost modeling to gain flexibility, coverage, and potential for further features. This trade off is the same made for quite a few IR level optimizations, including inlining. As for the worry about code size regressions, so far the results seem to support our hypothesis.</div><div> Thanks,</div><div>River Riddle</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 24, 2017 at 11:12 AM, Jessica Paquette <span dir="ltr"><<a href="mailto:jpaquette@apple.com" target="_blank">jpaquette@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><br><div>Hi River,</div><div><br></div><div>I’m working on the MachineOutliner pass at the MIR level. Working at the IR level sounds interesting! It also seems like our algorithms are similar. I was thinking of taking the suffix array route with the MachineOutliner in the future. </div><div><br></div><div>Anyway, I’d like to ask about this:</div><span><div><br><blockquote type="cite"><div>On Jul 20, 2017, at 3:47 PM, River Riddle via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_5452462766631507231m_8379714576610871608Apple-interchange-newline"><div><span style="font-family:Arial;font-size:14.666666984558105px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:pre-wrap;word-spacing:0px;float:none;display:inline!important">The downside to having this type of transformation be at the IR level is it means there will be less accuracy in the cost model - we can somewhat accurately model the cost per instruction but we can’t get information on how a window of instructions may lower. This can cause regressions depending on the platform/codebase, therefore to help alleviate this there are several tunable parameters for the cost model.</span><br class="m_5452462766631507231m_8379714576610871608Apple-interchange-newline"></div></blockquote><br></div></span><div>The inliner is threshold-based and it can be rather unpredictable how it will impact the code size of a program. Do you have any idea as to how heuristics/thresholds/paramete<wbr>rs could be tuned to prevent this? In my experience, making good code size decisions with these sorts of passes requires a lot of knowledge about what instructions you’re dealing with exactly. I’ve seen the inliner cause some pretty serious code size regressions in projects due to small changes to the cost model/parameters which cause improvements in other projects. I’m a little worried that an IR-level outliner for code size would be doomed to a similar fate.</div><div><br></div><div>Perhaps it would be interesting to run this sort of pass pre-register allocation? This would help pull you away from having to use heuristics, but give you some more opportunities for finding repeated instruction sequences. I’ve thought of doing something like this in the future with the MachineOutliner and seeing how it goes.</div><span class="m_5452462766631507231HOEnZb"><font color="#888888"><div><br></div><div>- Jessica</div><br></font></span></div></blockquote></div><br></div>
</div></blockquote></div><br></div></div></div></div></div></div></blockquote></div><br></div>