<div dir="ltr"><div dir="ltr"><div dir="ltr">On Sun, Aug 25, 2019 at 9:48 AM John McFarlane <<a href="mailto:john@mcfarlane.name">john@mcfarlane.name</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Sat, 24 Aug 2019 at 14:56, Arthur O'Dwyer <<a href="mailto:arthur.j.odwyer@gmail.com" target="_blank">arthur.j.odwyer@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>On Wed, Aug 21, 2019 at 1:40 AM John McFarlane <<a href="mailto:john@mcfarlane.name" target="_blank">john@mcfarlane.name</a>> wrote:</div><div>> On Tue, 20 Aug 2019 at 19:34, Ben Craig <<a href="mailto:ben.craig@ni.com" target="_blank">ben.craig@ni.com</a>> wrote:<br></div><div>>><br>>> I think a question was glossed over. Exactly which directions should be inlined…<br>>><br>>> User callee into user caller (definitely not)<br>>> System callee into system caller (yes)<br>>> User callee into system caller (maybe?)<br>>> System callee into user caller (maybe?)<br>>><br>>> Perhaps number 3 should be prohibited because then a breakpoint on “my” function would either not get hit, or turn into multiple breakpoints.<br>>> Perhaps number 4 should be prohibited because it makes stepping across loop iterations in things like std::transform more difficult.<br></div><div>></div>> I deliberately don't bring up call direction because I don't think it's important and I want to keep things simple:<br>> - If the code is user code, it should not be inlined.<br>> - If the code is system code (including non-standard 3rd-party dependencies), it should be inlined.<br><br>I think Ben Craig's point about call direction is important. Or at least, his intuition about what you meant matches my intuition, whereas the words you just used in the paragraph above do <i><b>not</b></i> match my intuition about what you had meant in P1832.<br></div></div></div></blockquote><div><br class="gmail-m_3974124135071824659gmail-Apple-interchange-newline">Thanks for providing a perspective of which I was unaware.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br>> Take a hypothetical call stack from this example program:<br>><br>> ?: std::strncmp(....) <- C standard library function, possibly an intrinsic, inline<br>> ?: std::string::operator==(char const*) <- standard library function, inline<br>> 8: [](std::string const&) {....} <- lambda, don't inline<br>> ?: std::find_if(....) <- standard library taking lambda, inline<br>> 4: contains_sg15 <- user function containing lambda, don't inline<br>> 13: main <- user's main function, don't inline<br>><br>> I think whether to inline the lambda is mildly contentious but I think we both agree<div>> it should not be inlined. The real question is whether to inline `std::find_if`. I guess</div><div>> you're suggesting we don't inline it. I think -- on balance -- we probably should.<div><br></div><div>The verb "inline" doesn't operate on a <i>node</i> of this graph; it operates on an <i>edge</i> between two nodes. My intuition is that -Og should enable inlining on every edge where <i>both</i> endpoint nodes are in "system code," and on no other edges. With your above callstack:</div><div><br></div><div>Yes inline strncmp into operator== (system into system)</div><div>No don't inline operator== into the user's lambda (system into user)</div><div>No don't inline the user's lambda into std::find_if (user into system)</div><div>No don't inline std::find_if into contains_sg15 (system into user)</div><div>No don't inline contains_sg15 into main (user into user)</div><div><br></div><div>My intuition is that we don't want to inline system code into user code because that would mix "not-my-code" into the codegen for "my code." Imagine the user trying to step through `contains_sg15`, and suddenly they're seeing backward branches and loops. Where did those loops come from? Ah, the compiler inlined a bunch of crap from <algorithm>. That's not the debugging experience we want (says my intuition). So we should <i><b>not</b></i> inline `std::find_if` into `contains_sg15`.</div></div></div></div></div></blockquote><div><br></div><div>Either yours or my particular formula for inlining choice improve the situation. That makes me happy. But I'm far from sold on your formula. (I'll concentrate on `find_if`.)</div><div><br></div><div>Little of what I imagine as the UX for this feature comes from intuition. It comes from my experience as a C developer, a 'trad' C++ developer and as a games developer. Imagine that we 'degenericize' the STL and imagine that `find_if` and all the other library calls are plain ol' functions that are defined in a separate TU. The TU is compiled into a .lib file at `-O2` and linked to the users' debug build. Q: What happens now when I step into `contains_sg15`? A: Exactly the same behaviour. So there's a consistency argument to be made. It's the 'User Expectation' from the title of P1832.</div><div><br></div><div>Secondly, I wonder if you overestimate just how little a typical game dev cares about being able to step through the loop inside a standard algorithm. For a number of reasons, that code is scary and intimidating for most humble devs. When they step into standard library code by accident, it interrupts their concentration and presents them with an _Uglified mess.</div></div></div></blockquote><div><br></div><div>I think maybe you got my suggestion backwards. I am suggesting that STL code should <i><b>never</b></i> be mixed in with user code. For example, if the user writes straight-line code like this...</div><div><br></div><div> bool contains_sg15(const std::vector<std::string>& vec) {</div><div> auto it = std::find_if(vec.begin(), vec.end(), [](const auto& elt) { return elt == "sg15"; });</div><div> return it != vec.end();</div><div> }</div><div><br></div><div>...then I would consider it a bad idea for the compiler to inline the loop from `std::find_if` into the user's code. That would lead to the user's being forced to "step through the loop inside a standard algorithm," as you say. I would intuitively expect that at -Og, the user's `contains_sg15` function would codegen to exactly what it does today:</div><div><br></div><div> - CALL to begin()</div><div> - CALL to end()</div><div> - CALL to std::find_if</div><div> - CALL to end()</div><div> - CALL to operator!=</div><div><br></div><div>If the compiler inlined any of those calls, then the assembly code for `contains_sg15` would suddenly contain assembly code corresponding to C++ code that the user didn't write. And the user doesn't want that. So it's good that `-Og` does not inline system code into user code.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Thirdly, I explicitly simplified my description of `find_if` for the sake of argument. But now my simplification gets in the way. If you look at the real `find_if`, you'll see a great many levels of function calls. (I assume they are your nodes.)</div></div></div></blockquote><div><br></div><div>That's right. That's why it's important that P1832 says "let's permit the compiler to inline system code into system code."</div><div>By optimizing the body of `std::find_if`, we can reduce its code size by about half (<a href="https://godbolt.org/z/uU_YWg">from 182 lines to 93 lines in this test case</a>). Notice that I am not inlining find_if into contains_sg15, and I am not inlining the body of the lambda into find_if. We <i><b>can</b></i> get P1832's huge performance boost without ever mixing user and system code together.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> The first one when you step into `find_if` has no `for` loop. It is a wrapper to the implementation's algorithm. The last one before you step into the lambda also contains no `for` loop but is a function dedicated to invoking lambdas. So your formula for when to inline library functions will only degrade the experience. Instead, you'd need to say that *everything* between `contains_sg15` and the lambda must not be inlined. At this point, you're losing a lot of your performance gains which brings me on to...<br></div></div></div></blockquote><div><br></div><div>I don't understand the sentence "Instead, you'd need to say that *everything* between `contains_sg15` and the lambda must not be inlined." That's 100% exactly the edges that we <i><b>do</b></i> want to inline!</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div></div><div>Finally, because some of these abstractions boil down to very few (sometimes zero!) assembler instructions, any additional assembler you get from relaxing the criteria for inlining can heavily affect run-time performance and binary size. I suspect that on balance, the target audience here won't want to pay that price.<br></div></div></div></blockquote><div><br></div><div>Can you be specific about what code you're talking about? I am sure that you are misunderstanding my suggestion, but I can't figure out what you think the suggestion is.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div>> Let's step through this program and let's assume that I only use the "step into"</div><div>> functionality of the debugger, because that's an easy way to 'visit' an entire</div><div>> program. Let's assume `find_if` is a single function.</div><div><br></div><div>I think you have to consider the workflow for people who use the "disassemble" functionality of the debugger, as well.</div></div></div></div></div></blockquote><div><br></div><div>I'd be interested to learn more.</div></div></div></blockquote><div><br></div><div>This wasn't intended as a teaser; I meant it to be a summation of the suggestions in my email. Inlining loopy system code into the middle of straight-line user code is a bad idea <i><b>because</b></i> it messes with people's ability to understand the disassembly.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div>I intuit that it should be easy to "step <i><b>over</b></i>" calls to library functions; which means that you must ensure that the call to `std::find_if` shows up as a call instruction in the binary. If we inline `std::find_if` into `contains_sg15`, then you can't "step over" it anymore.</div></div></div></div></div></blockquote><div><br></div><div>So, are you saying that if we inline `find_if` AND we generate debugging symbols, then the debugger's just going to jump us to `find_if` when we step over, right?</div></div></div></blockquote><div><br></div><div>No, not at all. You're thinking of "step into." "Step into" steps down into function calls. "Step over" steps <i><b>over</b></i> function calls. The metaphor here is that a function call is like a pothole or trapdoor down into the next lower level of the code. You can either step "into" the hole, or step "over" it.</div><div><br></div><div>HTH,</div><div>Arthur</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
</blockquote></div></div>
</blockquote></div></div></div>