[cfe-dev] Varying per function optimisation based on include path?

Sun Aug 25 06:48:09 PDT 2019

On Sat, 24 Aug 2019 at 14:56, Arthur O'Dwyer <arthur.j.odwyer at gmail.com>
wrote:

> On Wed, Aug 21, 2019 at 1:40 AM John McFarlane <john at mcfarlane.name>
> wrote:
> > On Tue, 20 Aug 2019 at 19:34, Ben Craig <ben.craig at ni.com> wrote:
> >>
> >> I think a question was glossed over.  Exactly which directions should
> be inlined…
> >>
> >> User callee into user caller (definitely not)
> >> System callee into system caller (yes)
> >> User callee into system caller (maybe?)
> >> System callee into user caller (maybe?)
> >>
> >> Perhaps number 3 should be prohibited because then a breakpoint on “my”
> function would either not get hit, or turn into multiple breakpoints.
> >> Perhaps number 4 should be prohibited because it makes stepping across
> loop iterations in things like std::transform more difficult.
> >
> > I deliberately don't bring up call direction because I don't think it's
> important and I want to keep things simple:
> > - If the code is user code, it should not be inlined.
> > - If the code is system code (including non-standard 3rd-party
> dependencies), it should be inlined.
>
> I think Ben Craig's point about call direction is important. Or at least,
> his intuition about what you meant matches my intuition, whereas the words
> you just used in the paragraph above do *not* match my intuition about
> what you had meant in P1832.
>

Thanks for providing a perspective of which I was unaware.

>
> > Take a hypothetical call stack from this example program:
> >
> > ?: std::strncmp(....) <- C standard library function, possibly an
> intrinsic, inline
> > ?: std::string::operator==(char const*)  <- standard library function,
> inline
> > 8: [](std::string const&) {....} <- lambda, don't inline
> > ?: std::find_if(....) <- standard library taking lambda, inline
> > 4: contains_sg15 <- user function containing lambda, don't inline
> > 13: main <- user's main function, don't inline
> >
> > I think whether to inline the lambda is mildly contentious but I think
> we both agree
> > it should not be inlined. The real question is whether to inline
> `std::find_if`. I guess
> > you're suggesting we don't inline it. I think -- on balance -- we
> probably should.
>
> The verb "inline" doesn't operate on a *node* of this graph; it operates
> on an *edge* between two nodes. My intuition is that -Og should enable
> inlining on every edge where *both* endpoint nodes are in "system code,"
> and on no other edges. With your above callstack:
>
> Yes inline strncmp into operator==  (system into system)
> No don't inline operator== into the user's lambda  (system into user)
> No don't inline the user's lambda into std::find_if  (user into system)
> No don't inline std::find_if into contains_sg15  (system into user)
> No don't inline contains_sg15 into main  (user into user)
>
> My intuition is that we don't want to inline system code into user code
> because that would mix "not-my-code" into the codegen for "my code."
> Imagine the user trying to step through `contains_sg15`, and suddenly
> they're seeing backward branches and loops. Where did those loops come
> from? Ah, the compiler inlined a bunch of crap from <algorithm>. That's not
> the debugging experience we want (says my intuition).  So we should *not*
> inline `std::find_if` into `contains_sg15`.
>

Either yours or my particular formula for inlining choice improve the
situation. That makes me happy. But I'm far from sold on your formula.
(I'll concentrate on `find_if`.)

Little of what I imagine as the UX for this feature comes from intuition.
It comes from my experience as a C developer, a 'trad' C++ developer and as
a games developer. Imagine that we 'degenericize' the STL and imagine that
`find_if` and all the other library calls are plain ol' functions that are
defined in a separate TU. The TU is compiled into a .lib file at `-O2` and
linked to the users' debug build. Q: What happens now when I step into
`contains_sg15`? A: Exactly the same behaviour. So there's a consistency
argument to be made. It's the 'User Expectation' from the title of P1832.

Secondly, I wonder if you overestimate just how little a typical game dev
cares about being able to step through the loop inside a standard
algorithm. For a number of reasons, that code is scary and intimidating for
most humble devs. When they step into standard library code by accident, it
interrupts their concentration and presents them with an _Uglified mess.

Thirdly, I explicitly simplified my description of `find_if` for the sake
of argument. But now my simplification gets in the way. If you look at the
real `find_if`, you'll see a great many levels of function calls. (I assume
they are your nodes.) The first one when you step into `find_if` has no
`for` loop. It is a wrapper to the implementation's algorithm. The last one
before you step into the lambda also contains no `for` loop but is a
function dedicated to invoking lambdas. So your formula for when to inline
library functions will only degrade the experience. Instead, you'd need to
say that *everything* between `contains_sg15` and the lambda must not be
inlined. At this point, you're losing a lot of your performance gains which
brings me on to...

Finally, because some of these abstractions boil down to very few
(sometimes zero!) assembler instructions, any additional assembler you get
from relaxing the criteria for inlining can heavily affect run-time
performance and binary size. I suspect that on balance, the target audience
here won't want to pay that price.

>
> What about inlining the user's lambda into std::find_if?  I think we have
> always agreed on this one.  The user must be able to set a single
> breakpoint in the lambda, and run `contains_sg15`, and see the breakpoint
> get hit. If there's a second copy of the lambda's code inlined into
> `std::find_if`, then this scenario falls apart. So, we should *not*
> inline the user's lambda into `std::find_if`. In fact, the same argument
> implies that we should never inline user code into *anywhere*. One line
> of user source code should continue to map to one point in the compiled
> binary.
>

Yes, we always want to facilitate breakpoints etc. in user code.

>
>
> > Let's step through this program and let's assume that I only use the
> "step into"
> > functionality of the debugger, because that's an easy way to 'visit' an
> entire
> > program. Let's assume `find_if` is a single function.
>
> I think you have to consider the workflow for people who use the
> "disassemble" functionality of the debugger, as well.
>

I'd be interested to learn more.

>
> Also, I'm not sure how the "step into/ single-step" functionality works,
> but I would have thought that it just means "continue forward until the
> current instruction is no longer associated with the same source line of
> code." Which means that it would still be possible to "step into" system
> code, because system code does still have line numbers associated with it.
>

That's roughly my understanding, yes. You can test this with `forceinline`.
There are debugging facilities (including Jeff Trull's) which I mention in
the paper and which help avoid stepping into the 'wrong' functions.

>
> I intuit that it should be easy to "step *over*" calls to library
> functions; which means that you must ensure that the call to `std::find_if`
> shows up as a call instruction in the binary. If we inline `std::find_if`
> into `contains_sg15`, then you can't "step over" it anymore.
>

So, are you saying that if we inline `find_if` AND we generate debugging
symbols, then the debugger's just going to jump us to `find_if` when we
step over, right? I'm not sure if that's what happens. Again, you can test
this with `forceinline`. I think the debugger's smart enough to skip over
those lines. But I still expect my proposed solution to have some major
flaw I didn't think about such as this!

Thanks,
John

>
> –Arthur
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190825/47ed5e17/attachment.html>