<div dir="ltr"><div dir="ltr">On Sat, 24 Aug 2019 at 14:56, Arthur O'Dwyer <<a href="mailto:arthur.j.odwyer@gmail.com">arthur.j.odwyer@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>On Wed, Aug 21, 2019 at 1:40 AM John McFarlane <<a href="mailto:john@mcfarlane.name" target="_blank">john@mcfarlane.name</a>> wrote:</div><div>> On Tue, 20 Aug 2019 at 19:34, Ben Craig <<a href="mailto:ben.craig@ni.com" target="_blank">ben.craig@ni.com</a>> wrote:<br></div><div>>><br>>> I think a question was glossed over.  Exactly which directions should be inlined…<br>>><br>>> User callee into user caller (definitely not)<br>>> System callee into system caller (yes)<br>>> User callee into system caller (maybe?)<br>>> System callee into user caller (maybe?)<br>>><br>>> Perhaps number 3 should be prohibited because then a breakpoint on “my” function would either not get hit, or turn into multiple breakpoints.<br>>> Perhaps number 4 should be prohibited because it makes stepping across loop iterations in things like std::transform more difficult.<br></div><div>></div>> I deliberately don't bring up call direction because I don't think it's important and I want to keep things simple:<br>> - If the code is user code, it should not be inlined.<br>> - If the code is system code (including non-standard 3rd-party dependencies), it should be inlined.<br><br>I think Ben Craig's point about call direction is important. Or at least, his intuition about what you meant matches my intuition, whereas the words you just used in the paragraph above do <i><b>not</b></i> match my intuition about what you had meant in P1832.<br></div></div></div></blockquote><div><br class="gmail-Apple-interchange-newline">Thanks for providing a perspective of which I was unaware.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br>> Take a hypothetical call stack from this example program:<br>><br>> ?: std::strncmp(....) <- C standard library function, possibly an intrinsic, inline<br>> ?: std::string::operator==(char const*)  <- standard library function, inline<br>> 8: [](std::string const&) {....} <- lambda, don't inline<br>> ?: std::find_if(....) <- standard library taking lambda, inline<br>> 4: contains_sg15 <- user function containing lambda, don't inline<br>> 13: main <- user's main function, don't inline<br>><br>> I think whether to inline the lambda is mildly contentious but I think we both agree<div>> it should not be inlined. The real question is whether to inline `std::find_if`. I guess</div><div>> you're suggesting we don't inline it. I think -- on balance -- we probably should.<div><br></div><div>The verb "inline" doesn't operate on a <i>node</i> of this graph; it operates on an <i>edge</i> between two nodes. My intuition is that -Og should enable inlining on every edge where <i>both</i> endpoint nodes are in "system code," and on no other edges. With your above callstack:</div><div><br></div><div>Yes inline strncmp into operator==  (system into system)</div><div>No don't inline operator== into the user's lambda  (system into user)</div><div>No don't inline the user's lambda into std::find_if  (user into system)</div><div>No don't inline std::find_if into contains_sg15  (system into user)</div><div>No don't inline contains_sg15 into main  (user into user)</div><div><br></div><div>My intuition is that we don't want to inline system code into user code because that would mix "not-my-code" into the codegen for "my code." Imagine the user trying to step through `contains_sg15`, and suddenly they're seeing backward branches and loops. Where did those loops come from? Ah, the compiler inlined a bunch of crap from <algorithm>. That's not the debugging experience we want (says my intuition).  So we should <i><b>not</b></i> inline `std::find_if` into `contains_sg15`.</div></div></div></div></div></blockquote><div><br></div><div>Either yours or my particular formula for inlining choice improve the situation. That makes me happy. But I'm far from sold on your formula. (I'll concentrate on `find_if`.)</div><div><br></div><div>Little of what I imagine as the UX for this feature comes from intuition. It comes from my experience as a C developer, a 'trad' C++ developer and as a games developer. Imagine that we 'degenericize' the STL and imagine that `find_if` and all the other library calls are plain ol' functions that are defined in a separate TU. The TU is compiled into a .lib file at `-O2` and linked to the users' debug build. Q: What happens now when I step into `contains_sg15`? A: Exactly the same behaviour. So there's a consistency argument to be made. It's the 'User Expectation' from the title of P1832.</div><div><br></div><div>Secondly, I wonder if you overestimate just how little a typical game dev cares about being able to step through the loop inside a standard algorithm. For a number of reasons, that code is scary and intimidating for most humble devs. When they step into standard library code by accident, it interrupts their concentration and presents them with an _Uglified mess.</div><div><br></div><div>Thirdly, I explicitly simplified my description of `find_if` for the sake of argument. But now my simplification gets in the way. If you look at the real `find_if`, you'll see a great many levels of function calls. (I assume they are your nodes.) The first one when you step into `find_if` has no `for` loop. It is a wrapper to the implementation's algorithm. The last one before you step into the lambda also contains no `for` loop but is a function dedicated to invoking lambdas. So your formula for when to inline library functions will only degrade the experience. Instead, you'd need to say that *everything* between `contains_sg15` and the lambda must not be inlined. At this point, you're losing a lot of your performance gains which brings me on to...</div><div><br></div><div>Finally, because some of these abstractions boil down to very few (sometimes zero!) assembler instructions, any additional assembler you get from relaxing the criteria for inlining can heavily affect run-time performance and binary size. I suspect that on balance, the target audience here won't want to pay that price.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div><br></div><div>What about inlining the user's lambda into std::find_if?  I think we have always agreed on this one.  The user must be able to set a single breakpoint in the lambda, and run `contains_sg15`, and see the breakpoint get hit. If there's a second copy of the lambda's code inlined into `std::find_if`, then this scenario falls apart. So, we should <i><b>not</b></i> inline the user's lambda into `std::find_if`. In fact, the same argument implies that we should never inline user code into <i><b>anywhere</b></i>. One line of user source code should continue to map to one point in the compiled binary.</div></div></div></div></div></blockquote><div><br></div><div>Yes, we always want to facilitate breakpoints etc. in user code. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div><br></div><div><br>> Let's step through this program and let's assume that I only use the "step into"</div><div>> functionality of the debugger, because that's an easy way to 'visit' an entire</div><div>> program. Let's assume `find_if` is a single function.</div><div><br></div><div>I think you have to consider the workflow for people who use the "disassemble" functionality of the debugger, as well.</div></div></div></div></div></blockquote><div><br></div><div>I'd be interested to learn more.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div><br></div><div>Also, I'm not sure how the "step into/ single-step" functionality works, but I would have thought that it just means "continue forward until the current instruction is no longer associated with the same source line of code." Which means that it would still be possible to "step into" system code, because system code does still have line numbers associated with it.</div></div></div></div></div></blockquote><div><br></div><div>That's roughly my understanding, yes. You can test this with `forceinline`. There are debugging facilities (including Jeff Trull's) which I mention in the paper and which help avoid stepping into the 'wrong' functions.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div><br></div><div>I intuit that it should be easy to "step <i><b>over</b></i>" calls to library functions; which means that you must ensure that the call to `std::find_if` shows up as a call instruction in the binary. If we inline `std::find_if` into `contains_sg15`, then you can't "step over" it anymore.</div></div></div></div></div></blockquote><div><br></div><div>So, are you saying that if we inline `find_if` AND we generate debugging symbols, then the debugger's just going to jump us to `find_if` when we step over, right? I'm not sure if that's what happens. Again, you can test this with `forceinline`. I think the debugger's smart enough to skip over those lines. But I still expect my proposed solution to have some major flaw I didn't think about such as this!</div><div><br></div><div>Thanks,</div><div>John</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div><br></div><div>–Arthur</div></div></div></div></div>

</blockquote></div></div>