[cfe-dev] Varying per function optimisation based on include path?

Arthur O'Dwyer via cfe-dev cfe-dev at lists.llvm.org
Sun Aug 25 07:26:22 PDT 2019


On Sun, Aug 25, 2019 at 9:48 AM John McFarlane <john at mcfarlane.name> wrote:

> On Sat, 24 Aug 2019 at 14:56, Arthur O'Dwyer <arthur.j.odwyer at gmail.com>
> wrote:
>
>> On Wed, Aug 21, 2019 at 1:40 AM John McFarlane <john at mcfarlane.name>
>> wrote:
>> > On Tue, 20 Aug 2019 at 19:34, Ben Craig <ben.craig at ni.com> wrote:
>> >>
>> >> I think a question was glossed over.  Exactly which directions should
>> be inlined…
>> >>
>> >> User callee into user caller (definitely not)
>> >> System callee into system caller (yes)
>> >> User callee into system caller (maybe?)
>> >> System callee into user caller (maybe?)
>> >>
>> >> Perhaps number 3 should be prohibited because then a breakpoint on
>> “my” function would either not get hit, or turn into multiple breakpoints.
>> >> Perhaps number 4 should be prohibited because it makes stepping across
>> loop iterations in things like std::transform more difficult.
>> >
>> > I deliberately don't bring up call direction because I don't think it's
>> important and I want to keep things simple:
>> > - If the code is user code, it should not be inlined.
>> > - If the code is system code (including non-standard 3rd-party
>> dependencies), it should be inlined.
>>
>> I think Ben Craig's point about call direction is important. Or at least,
>> his intuition about what you meant matches my intuition, whereas the words
>> you just used in the paragraph above do *not* match my intuition about
>> what you had meant in P1832.
>>
>
> Thanks for providing a perspective of which I was unaware.
>
>>
>> > Take a hypothetical call stack from this example program:
>> >
>> > ?: std::strncmp(....) <- C standard library function, possibly an
>> intrinsic, inline
>> > ?: std::string::operator==(char const*)  <- standard library function,
>> inline
>> > 8: [](std::string const&) {....} <- lambda, don't inline
>> > ?: std::find_if(....) <- standard library taking lambda, inline
>> > 4: contains_sg15 <- user function containing lambda, don't inline
>> > 13: main <- user's main function, don't inline
>> >
>> > I think whether to inline the lambda is mildly contentious but I think
>> we both agree
>> > it should not be inlined. The real question is whether to inline
>> `std::find_if`. I guess
>> > you're suggesting we don't inline it. I think -- on balance -- we
>> probably should.
>>
>> The verb "inline" doesn't operate on a *node* of this graph; it operates
>> on an *edge* between two nodes. My intuition is that -Og should enable
>> inlining on every edge where *both* endpoint nodes are in "system code,"
>> and on no other edges. With your above callstack:
>>
>> Yes inline strncmp into operator==  (system into system)
>> No don't inline operator== into the user's lambda  (system into user)
>> No don't inline the user's lambda into std::find_if  (user into system)
>> No don't inline std::find_if into contains_sg15  (system into user)
>> No don't inline contains_sg15 into main  (user into user)
>>
>> My intuition is that we don't want to inline system code into user code
>> because that would mix "not-my-code" into the codegen for "my code."
>> Imagine the user trying to step through `contains_sg15`, and suddenly
>> they're seeing backward branches and loops. Where did those loops come
>> from? Ah, the compiler inlined a bunch of crap from <algorithm>. That's not
>> the debugging experience we want (says my intuition).  So we should *not*
>> inline `std::find_if` into `contains_sg15`.
>>
>
> Either yours or my particular formula for inlining choice improve the
> situation. That makes me happy. But I'm far from sold on your formula.
> (I'll concentrate on `find_if`.)
>
> Little of what I imagine as the UX for this feature comes from intuition.
> It comes from my experience as a C developer, a 'trad' C++ developer and as
> a games developer. Imagine that we 'degenericize' the STL and imagine that
> `find_if` and all the other library calls are plain ol' functions that are
> defined in a separate TU. The TU is compiled into a .lib file at `-O2` and
> linked to the users' debug build. Q: What happens now when I step into
> `contains_sg15`? A: Exactly the same behaviour. So there's a consistency
> argument to be made. It's the 'User Expectation' from the title of P1832.
>
> Secondly, I wonder if you overestimate just how little a typical game dev
> cares about being able to step through the loop inside a standard
> algorithm. For a number of reasons, that code is scary and intimidating for
> most humble devs. When they step into standard library code by accident, it
> interrupts their concentration and presents them with an _Uglified mess.
>

I think maybe you got my suggestion backwards.  I am suggesting that STL
code should *never* be mixed in with user code. For example, if the user
writes straight-line code like this...

    bool contains_sg15(const std::vector<std::string>& vec) {
        auto it = std::find_if(vec.begin(), vec.end(), [](const auto& elt)
{ return elt == "sg15"; });
        return it != vec.end();
    }

...then I would consider it a bad idea for the compiler to inline the loop
from `std::find_if` into the user's code. That would lead to the user's
being forced to "step through the loop inside a standard algorithm," as you
say. I would intuitively expect that at -Og, the user's `contains_sg15`
function would codegen to exactly what it does today:

    - CALL to begin()
    - CALL to end()
    - CALL to std::find_if
    - CALL to end()
    - CALL to operator!=

If the compiler inlined any of those calls, then the assembly code for
`contains_sg15` would suddenly contain assembly code corresponding to C++
code that the user didn't write. And the user doesn't want that. So it's
good that `-Og` does not inline system code into user code.

Thirdly, I explicitly simplified my description of `find_if` for the sake
> of argument. But now my simplification gets in the way. If you look at the
> real `find_if`, you'll see a great many levels of function calls. (I assume
> they are your nodes.)
>

That's right. That's why it's important that P1832 says "let's permit the
compiler to inline system code into system code."
By optimizing the body of `std::find_if`, we can reduce its code size by
about half (from 182 lines to 93 lines in this test case
<https://godbolt.org/z/uU_YWg>).  Notice that I am not inlining find_if
into contains_sg15, and I am not inlining the body of the lambda into
find_if. We *can* get P1832's huge performance boost without ever mixing
user and system code together.


> The first one when you step into `find_if` has no `for` loop. It is a
> wrapper to the implementation's algorithm. The last one before you step
> into the lambda also contains no `for` loop but is a function dedicated to
> invoking lambdas. So your formula for when to inline library functions will
> only degrade the experience. Instead, you'd need to say that *everything*
> between `contains_sg15` and the lambda must not be inlined. At this point,
> you're losing a lot of your performance gains which brings me on to...
>

I don't understand the sentence "Instead, you'd need to say that
*everything* between `contains_sg15` and the lambda must not be inlined."
That's 100% exactly the edges that we *do* want to inline!


> Finally, because some of these abstractions boil down to very few
> (sometimes zero!) assembler instructions, any additional assembler you get
> from relaxing the criteria for inlining can heavily affect run-time
> performance and binary size. I suspect that on balance, the target audience
> here won't want to pay that price.
>

Can you be specific about what code you're talking about? I am sure that
you are misunderstanding my suggestion, but I can't figure out what you
think the suggestion is.


> Let's step through this program and let's assume that I only use the
>> "step into"
>> > functionality of the debugger, because that's an easy way to 'visit' an
>> entire
>> > program. Let's assume `find_if` is a single function.
>>
>> I think you have to consider the workflow for people who use the
>> "disassemble" functionality of the debugger, as well.
>>
>
> I'd be interested to learn more.
>

This wasn't intended as a teaser; I meant it to be a summation of the
suggestions in my email. Inlining loopy system code into the middle of
straight-line user code is a bad idea *because* it messes with people's
ability to understand the disassembly.


I intuit that it should be easy to "step *over*" calls to library
>> functions; which means that you must ensure that the call to `std::find_if`
>> shows up as a call instruction in the binary. If we inline `std::find_if`
>> into `contains_sg15`, then you can't "step over" it anymore.
>>
>
> So, are you saying that if we inline `find_if` AND we generate debugging
> symbols, then the debugger's just going to jump us to `find_if` when we
> step over, right?
>

No, not at all.  You're thinking of "step into."  "Step into" steps down
into function calls. "Step over" steps *over* function calls. The metaphor
here is that a function call is like a pothole or trapdoor down into the
next lower level of the code. You can either step "into" the hole, or step
"over" it.

HTH,
Arthur

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190825/14e21d32/attachment.html>


More information about the cfe-dev mailing list