<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">It might also be cool to see how various levels of outlining and the inliner would interact.<div class=""><br class=""></div><div class="">For example, say we did something like this:</div><div class=""><br class=""></div><div class="">1. IR outline</div><div class="">2. Inline</div><div class="">3. MIR outline</div><div class=""><br class=""></div><div class="">The inliner should be capable of undoing bad outlining decisions (like outlining from, say, hot loops). The MIR outliner should be able to undo bad inlining decisions. Since each cost model is different (and they’re all working on different code), no pass should ever undo all of the decisions made by another unless they all turned out to be *bad* decisions. It might be possible to have the passes play together so that you end up with a nice balance between performance and size.</div><div class=""><br class=""></div><div class="">- Jessica</div><div class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Jul 25, 2017, at 9:31 AM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Jul 25, 2017, at 9:24 AM, Jessica Paquette <<a href="mailto:jpaquette@apple.com" class="">jpaquette@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><blockquote type="cite"
class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">The two passes are pretty different in their approaches to congruency finding, so I don't think it helps to group them as though they were interchangeable "outliner technology". The two passes might be totally orthogonal.</div></div></div></div></blockquote><div class=""><br class=""></div><div class="">I think that, based on how River described his algorithm, the underlying method should be pretty similar. If it is, then we could probably compare the performance of the suffix tree vs. suffix array method and then create a general outliner that can be run at any level of representation. </div></div></div></blockquote><div class=""><br class=""></div><div class="">+1, I believe that would be the ideal outcome.</div><div class="">I can see a split like this:</div><div class="">- IR-agnostic outliner algorithm (possibly with different modes: suffix tree, suffix array)</div><div class="">- Pluggable cost model</div><div class="">- Rewriter</div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">By that I mean that the suffix tree/array candidate search algorithm would be separate from the actual implementation of *outlining things*. The implementation at each level of representation would define:</div><div class=""><br class=""></div><div class="">- How to outline a sequence of instructions</div><div class="">- An equivalence/congruence scheme for two instructions</div><div class="">- A cost model</div><div class=""><br class=""></div><div class="">I don’t think it’d be too difficult to split it up in that sort of way. 
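<div class=""><br class=""></div><div class="">To make that split concrete, here is a minimal sketch in Python. It is not the MachineOutliner's or River's code, and every name in it (find_repeats, toy_benefit, outline_candidates) is hypothetical. A brute-force substring count stands in for the suffix tree/array search, and the cost model is a toy that charges one unit per instruction, call, and return. The search only ever sees the keys produced by a pluggable equivalence scheme and asks a pluggable cost model whether a candidate pays off; the rewriter is omitted.</div><pre class="">
```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical split: the search below only ever sees "keys", so each level of
# representation would plug in its own equivalence scheme (key) and cost model
# (benefit). The rewriter that creates the function and the calls is omitted.


def find_repeats(keys: Sequence[Hashable], min_len: int = 2) -> Dict[tuple, List[int]]:
    """Brute-force stand-in for a suffix tree/array search.

    Maps each key sequence of length >= min_len to the start offsets of its
    non-overlapping occurrences (taken greedily, left to right) and keeps
    only the sequences that occur at least twice.
    """
    occurrences: Dict[tuple, List[int]] = defaultdict(list)
    n = len(keys)
    for length in range(min_len, n + 1):
        for start in range(n - length + 1):
            starts = occurrences[tuple(keys[start:start + length])]
            # Skip an occurrence that overlaps the previously recorded one.
            if not starts or start >= starts[-1] + length:
                starts.append(start)
    return {seq: s for seq, s in occurrences.items() if len(s) >= 2}


def toy_benefit(length: int, count: int) -> int:
    """Toy cost model (assumption): each instruction, call and return costs 1.

    Outlining removes count * length instructions and adds count calls plus
    one body of length instructions and a return. Parameter passing is ignored.
    """
    return count * length - (count + length + 1)


def outline_candidates(
    instrs: Sequence,
    key: Callable[[object], Hashable],
    benefit: Callable[[int, int], int],
) -> List[Tuple[int, tuple, List[int]]]:
    """Return (benefit, key sequence, starts) for profitable candidates, best first."""
    repeats = find_repeats([key(i) for i in instrs])
    scored = [(benefit(len(seq), len(s)), seq, s) for seq, s in repeats.items()]
    return sorted((c for c in scored if c[0] > 0), key=lambda c: -c[0])


# Three copies of a load/add/store idiom that differ only in their registers.
code = [
    ("load", "r1"), ("add", "r1", "r2"), ("store", "r1"),
    ("load", "r3"), ("add", "r3", "r4"), ("store", "r3"),
    ("load", "r5"), ("add", "r5", "r6"), ("store", "r5"),
]
# Strict scheme: opcodes and operands must be identical.
strict = outline_candidates(code, key=lambda ins: ins, benefit=toy_benefit)
# Relaxed scheme: opcode only, a stand-in for an IR-level scheme that would
# pass the differing operands as parameters of the outlined function.
relaxed = outline_candidates(code, key=lambda ins: ins[0], benefit=toy_benefit)
print(strict)   # []
print(relaxed)  # [(2, ('load', 'add', 'store'), [0, 3, 6])]
```
</pre><div class="">With the strict scheme (opcodes and operands identical, as in the MachineOutliner) nothing is found; with the opcode-only scheme the three load/add/store copies become a single candidate, although the toy model ignores what passing the differing registers as parameters would cost. Keeping the search generic like this is what would let the same candidate finder be tried at different levels of representation.</div><div class=""><br class=""></div>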
I’d also like to experiment with pre-regalloc outlining in general, so it’d make it easier to explore that route as well.</div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">1. if you run the LLVM IR level code size outliner, then the MachineOutliner fails to find any significant redundancies.</div><div class=""><br class=""></div><div class="">2. if you run the LLVM IR level code size outliner, then the MachineOutliner finds just as many redundancies.</div></div></div></div></blockquote><br class=""></div><div class="">This would be interesting. The MachineOutliner deems two instructions equivalent iff their opcodes and operands are identical. This isn’t super flexible (pre-register allocation outlining would be much better). However, an IR-level outliner has a lot more wiggle room. It would be interesting to see how different equivalence schemes perform. Once again, I’m making assumptions about River’s algorithm, but the equivalence/congruence schemes are probably where the two techniques differ most in how they handle outlining.</div><div class=""><br class=""></div><div class="">- Jessica</div><div class=""><div class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jul 24, 2017, at 10:25 PM, Sean Silva via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_extra"><br class="Apple-interchange-newline"><br class=""><div class="gmail_quote">On Mon, Jul 24, 2017 at 4:14 PM, Quentin Colombet
via llvm-dev<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div dir="auto" style="word-wrap: break-word;" class="">Hi River,</div><div dir="auto" style="word-wrap: break-word;" class=""><br class=""><div class=""><span class="gmail-"><blockquote type="cite" class=""><div class="">On Jul 24, 2017, at 2:36 PM, River Riddle <<a href="mailto:riddleriver@gmail.com" target="_blank" class="">riddleriver@gmail.com</a>> wrote:</div><br class="gmail-m_8670241827534659112Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi Quentin,<div class=""> I appreciate the feedback. When I reference the cost of Target Hooks, it's mainly about maintainability and the cost to a target author. We want to keep the intrusion into target information to a minimum. The heuristics used for the outliner are the same ones used by any other IR-level pass seeking target information, i.e., TTI for the most part. I can see where you are coming from with "<span style="font-size: 12.8px;" class="">having heuristics solely focused on code size do not seem realistic", but I don't agree with that statement.</span></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">If you only want code size, I agree it makes sense, but I believe that, even at Oz, we probably don’t want to slow the code down by a big factor for a couple of bytes. That’s what I wanted to say, and what I wanted to point out is that you need some kind of performance model to avoid those worst cases. 
Unless we don’t care :).</div><span class="gmail-"><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><span style="font-size: 12.8px;" class="">I think there is a disconnect on heuristics. The only user-tunable parameters are the lower-bound parameters (to the cost model); the actual analysis (heuristic calculation) is based on TTI information. </span></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">I don’t see how you can get around adding more hooks to know how a specific function prototype is going to be lowered (e.g., i64 needs to be split into two registers, fourth and onward parameters need to be pushed on the stack, and so on). Those affect the code size benefit.</div><span class="gmail-"><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><span style="font-size: 12.8px;" class=""> When you say "Would still be interesting to see how well this could perform on some exact model (i.e., at the Machine level), IMO.", I am slightly confused about what you mean. I do not intend to implement this algorithm at the MIR level, given that it already exists in the Machine Outliner.</span></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Of course, I don’t expect you to do that :). What I meant is that the claim that IR offers the better trade-off is not based on hard evidence. 
I actually don’t buy it.</div><div class=""><br class=""></div><div class="">My point was to make sure I understand what you are trying to solve and, given that you have mentioned the MachineOutliner, why you are not working on improving it instead of suggesting a new framework.</div><div class="">Don’t get me wrong; maybe creating a new framework at the IR level is the right thing to do, but I still didn’t get that from your comments.</div><span class="gmail-"><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><span style="font-size: 12.8px;" class="">There are several comparison benchmarks given in the "More detailed performance data" section of the original RFC. It includes comparisons to the Machine Outliner when possible (I can't build clang on Linux with the Machine Outliner). I welcome any and all discussion on the placement of the outliner in LLVM.<br class=""></span></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">My fear with a new framework is that we are going to split the effort to push the outliner technology forward, and I’d like to avoid that if at all possible.</div></div></div></div></blockquote><div class=""><br class=""></div><div class="">The two passes are pretty different in their approaches to congruency finding, so I don't think it helps to group them as though they were interchangeable "outliner technology". The two passes might be totally orthogonal.</div><div class=""><br class=""></div><div class="">I can imagine two extreme outcomes (reality is probably somewhere in between):</div><div class=""><br class=""></div><div class="">1. if you run the LLVM IR level code size outliner, then the MachineOutliner fails to find any significant redundancies.</div><div class=""><br class=""></div><div class="">2. 
if you run the LLVM IR level code size outliner, then the MachineOutliner finds just as many redundancies.</div><div class=""><br class=""></div><div class="">It would be good if River could get some data about this.<br class=""></div><div class=""><br class=""></div><div class="">-- Sean Silva</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div dir="auto" style="word-wrap: break-word;" class=""><div class=""><div class=""><br class=""></div><div class="">Now, to be more concrete about your proposal, could you describe the cost model for deciding what to outline? (Really the cost model, not the suffix algo.)</div><div class="">Are outlined functions pushed into the list of candidates for further outlining?</div><div class=""><div class="gmail-h5"><div class=""><br class=""></div><div class="">Cheers,</div><div class="">-Quentin</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><span style="font-size: 12.8px;" class=""> Thanks,</span></div><div class=""><span style="font-size: 12.8px;" class="">River Riddle</span></div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Jul 24, 2017 at 1:42 PM, Quentin Colombet<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:qcolombet@apple.com" target="_blank" class="">qcolombet@apple.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word;" class="">Hi River,<div class=""><br class=""><div class=""><span class=""><blockquote type="cite" class=""><div class="">On Jul 24, 2017, at 11:55 AM, River Riddle
via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="gmail-m_8670241827534659112m_-3491381026714088952Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">Hi Jessica,</div><div class=""> The comparison to the inliner is an interesting one, but we think it's important to note the difference in the use of heuristics. The inliner is juggling many different goals at the same time (execution speed, code size, etc.), which can make its parameters very sensitive to the benchmark/platform/etc. The outliner's heuristics are focused solely on the potential code size savings from outlining, and are thus only sensitive to the current platform. This only creates a problem when we overestimate the potential cost of a set of instructions for a particular target. The cost model parameters are only minimums: instruction sequence length, estimated benefit, and occurrence count. The heuristics themselves are conservative and based on all of the target information available at the IR level; the parameters just set a lower bound to weed out any outliers. You are correct that being at the machine level, before or after RA, will give the most accurate heuristics, but we feel there's an advantage to being at the IR level. At the IR level we can do many things that are either too difficult or too complex at the machine level (e.g., parameterization, outputs, etc.). Not only can we do these things, but they are available on all targets immediately, without the need for target hooks. The caution about heuristics is understandable, but there comes a point when trade-offs need to be made. We traded exact cost modeling for flexibility, coverage, and potential for further features. This is the same trade-off made by quite a few IR-level optimizations, including inlining. 
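<div class=""><br class=""></div><div class="">The description of the tunable parameters as pure lower bounds can be sketched as follows, in Python. The class and field names and the default values are hypothetical, not the pass's actual options, and the estimated benefit is assumed to come from a separate TTI-based analysis that is not shown. In this sketch the thresholds only reject candidates and never alter the estimate.</div><pre class="">
```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A repeated sequence reported by the candidate search (hypothetical shape)."""
    length: int             # instructions in the sequence
    occurrences: int        # non-overlapping copies found
    estimated_benefit: int  # size saved per the cost estimate (assumed unit)


@dataclass(frozen=True)
class OutlineThresholds:
    """Hypothetical user-tunable lower bounds; names and defaults are made up."""
    min_length: int = 2
    min_occurrences: int = 2
    min_benefit: int = 1


def passes_lower_bounds(c: Candidate, t: OutlineThresholds) -> bool:
    # Each threshold can only reject a candidate; none of them feeds back
    # into how estimated_benefit was computed.
    return (
        c.length >= t.min_length
        and c.occurrences >= t.min_occurrences
        and c.estimated_benefit >= t.min_benefit
    )


def filter_candidates(cands: List[Candidate], t: OutlineThresholds) -> List[Candidate]:
    """Keep only the candidates that clear every lower bound."""
    return [c for c in cands if passes_lower_bounds(c, t)]


cands = [
    Candidate(length=5, occurrences=4, estimated_benefit=12),
    Candidate(length=1, occurrences=50, estimated_benefit=20),  # too short
    Candidate(length=8, occurrences=2, estimated_benefit=0),    # no estimated gain
]
kept = filter_candidates(cands, OutlineThresholds())
print([c.length for c in kept])  # [5]
```
</pre><div class="">Under this reading, tightening a threshold can only make the pass more conservative; any inaccuracy still lives in the benefit estimate itself, which is the part the rest of this thread is debating.</div><div class=""><br class=""></div>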
As for the worry about code size regressions, so far the results seem to support our hypothesis.</div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Would still be interesting to see how well this could perform on some exact model (i.e., at the Machine level), IMO. Target hooks are cheap, and choosing an implementation because it is simpler might not be the right long-term solution.</div><div class="">At the very least, to know what trade-off we are making, having prototypes of the different approaches sounds sensible.</div><div class="">In particular, the heuristics for the cost of parameter passing (I haven’t checked how you did it) already sound complex and would require target hooks. Therefore, I am not seeing a clear win with an IR approach here.</div><div class=""><br class=""></div><div class="">Finally, having heuristics solely focused on code size do not seem realistic. Indeed, I am guessing you have some thresholds to avoid outlining pieces of code so small that outlining them would end up adding a whole lot of indirections, and I don’t like magic numbers in general :).</div><div class=""><br class=""></div><div class="">To summarize, I wanted to point out that an IR approach is not as clear a win as you describe and thus deserves more discussion.</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">-Quentin</div><br class=""><blockquote type="cite" class=""><div class=""><span class=""><div dir="ltr" class=""><div class=""> Thanks,</div><div class="">River Riddle</div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Jul 24, 2017 at 11:12 AM, Jessica Paquette<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:jpaquette@apple.com" target="_blank" class="">jpaquette@apple.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width:
1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><br class=""><div class="">Hi River,</div><div class=""><br class=""></div><div class="">I’m working on the MachineOutliner pass at the MIR level. Working at the IR level sounds interesting! It also seems like our algorithms are similar. I was thinking of taking the suffix array route with the MachineOutliner in the future. </div><div class=""><br class=""></div><div class="">Anyway, I’d like to ask about this:</div><span class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jul 20, 2017, at 3:47 PM, River Riddle via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="gmail-m_8670241827534659112m_-3491381026714088952m_8379714576610871608Apple-interchange-newline"><div class=""><span style="font-family: Arial; font-size: 14.6667px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; word-spacing: 0px; float: none; display: inline;" class="">The downside to having this type of transformation be at the IR level is that there will be less accuracy in the cost model: we can somewhat accurately model the cost per instruction, but we can’t get information on how a window of instructions will be lowered. This can cause regressions depending on the platform/codebase; therefore, to help alleviate this, there are several tunable parameters for the cost model.</span><br class="gmail-m_8670241827534659112m_-3491381026714088952m_8379714576610871608Apple-interchange-newline"></div></blockquote><br class=""></div></span><div class="">The inliner is threshold-based, and its impact on the code size of a program can be rather unpredictable. 
Do you have any idea as to how heuristics/thresholds/parameters could be tuned to prevent this? In my experience, making good code size decisions with these sorts of passes requires a lot of knowledge about what instructions you’re dealing with exactly. I’ve seen the inliner cause some pretty serious code size regressions in projects due to small changes to the cost model/parameters which cause improvements in other projects. I’m a little worried that an IR-level outliner for code size would be doomed to a similar fate.</div><div class=""><br class=""></div><div class="">Perhaps it would be interesting to run this sort of pass pre-register allocation? This would help pull you away from having to use heuristics, but give you some more opportunities for finding repeated instruction sequences. I’ve thought of doing something like this in the future with the MachineOutliner and seeing how it goes.</div><span class="gmail-m_8670241827534659112m_-3491381026714088952HOEnZb"><font color="#888888" class=""><div class=""><br class=""></div><div class="">- Jessica</div><br class=""></font></span></div></blockquote></div><br class=""></div></span>______________________________<wbr class="">_________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a><br class=""><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank" class="">http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-dev</a><br class=""></div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></div></div></blockquote></div></div></div><br class=""></div></div><br class=""></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></div></div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></div></body></html>