[cfe-dev] Varying per function optimisation based on include path?

Tue Aug 20 22:39:52 PDT 2019

I deliberately don't bring up call direction because I don't think it's
important and I want to keep things simple:
- If the code is user code, it should not be inlined.
- If the code is system code (including non-standard 3rd-party
dependencies), it should be inlined.

Take a hypothetical call stack from this example program
<https://wandbox.org/permlink/oI43EGHIERSNpy52>:

?: std::strncmp(....) <- C standard library function, possibly an
intrinsic, inline
?: std::string::operator==(char const*)  <- standard library function,
inline
8: [](std::string const&) {....} <- lambda, don't inline
?: std::find_if(....) <- standard library taking lambda, inline
4: contains_sg15 <- user function containing lambda, don't inline
13: main <- user's main function, don't inline

I think whether to inline the lambda is mildly contentious but I think we
both agree it should not be inlined. The real question is whether to inline
`std::find_if`. I guess you're suggesting we don't inline it. I think -- on
balance -- we probably should.

Let's step through this program and let's assume that I only use the "step
into" functionality of the debugger, because that's an easy way to 'visit'
an entire program. Let's assume `find_if` is a single function.

When I step into the call to `std::find_if`, I think it should skip through
`std::find_if` and take me directly to line 9 in the lambda function in
`contains_sg15`. Then, when I "step into" in the lambda, it'll step over
the `std::string` call and take me directly to line 10. (That's what it
does in GDB at least.) Then when I "step into" again, it *should* take me
back to line 9 in the second lambda invocation.

Now, if that last action doesn't work, I think that's where you have the
concern. I'm not entirely sure what'll happen once `std::find_if` is
inlined. I know that without inlining, "step into" eventually does take me
back to line 9 of the lambda body. If it doesn't, it's going to kick me
back out into the body of `contains_sg15` which is a problem.

In that case, yes, maybe saying that functions which call non-`-isystem`
functions should not be inlined is a rule to consider. But that instantly
makes things more complicated: we now have a new reason why the same
function might be both inlined and not inlined in the same TU. I don't know
if that's a problem. It also means we now send the user into the guts of a
standard library call. That *is* a problem. That code looks frightening and
obfuscated to the average developer.

A workaround to the problem would be to put a breakpoint in the lambda
function. I personally would be perfectly happy with that compromise. We're
looking at a problem which game developers (among others) experience when
in-header functions are defined in the user's TU. In C code and non-modern
C++, the offending code is defined in a separate .cpp file.

Imagine if `std::find_if` could be compiled in a separate TU. And imagine
that TU was compiled with `-O2`. What would happen then? You might
experience the same debugging difficulty. The same workaround, using a
breakpoint, would work there also. We'd at least have parity with
non-modern C++. That's the main aim here: not to improve the state of the
art for C and C++, merely to help C++ catch up with C.

HTH,
John

On Tue, 20 Aug 2019 at 19:34, Ben Craig <ben.craig at ni.com> wrote:

> I think a question was glossed over.  Exactly which directions should be
> inlined…
>
>    1. User callee into user caller (definitely not)
>    2. System callee into system caller (yes)
>    3. User callee into system caller (maybe?)
>    4. System callee into user caller (maybe?)
>
>
>
> Perhaps number 3 should be prohibited because then a breakpoint on “my”
> function would either not get hit, or turn into multiple breakpoints.
>
>
>
> Perhaps number 4 should be prohibited because it makes stepping across
> loop iterations in things like std::transform more difficult.
>
>
>
>
>
> *From:* cfe-dev <cfe-dev-bounces at lists.llvm.org> *On Behalf Of *via
> cfe-dev
> *Sent:* Tuesday, August 20, 2019 12:59 PM
> *To:* arthur.j.odwyer at gmail.com
> *Cc:* jonathanchesterfield at gmail.com; john at mcfarlane.name;
> cfe-dev at lists.llvm.org
> *Subject:* [EXTERNAL] Re: [cfe-dev] Varying per function optimisation
> based on include path?
>
>
>
> Ah, I'd forgotten that Og prefers not to inline.
>
> Distinguishing optimization levels within one translation unit is tricky
> given the current way we build optimization pipelines. They are *not*
> designed to handle function-level differences in optimization levels.
> Trying to (essentially) mix O1 and O2 in the same translation unit is a
> radical departure from how LLVM thinks about optimization.  ('optnone' is a
> special case where passes effectively disable themselves when presented
> with an 'optnone' function. Generalizing that to more optimization levels
> is a seriously invasive proposition.)
>
>
>
> Re the "symbols" confusion, broadly speaking you can separate debug info
> into that which describes the source (types, variables, etc), and that
> which describes the generated code (to a first approximation, the
> instruction<->source mapping).  So the suggestion in this thread is to
> retain the former but not the latter.
>
> In this exercise, if we genuinely want to *prevent* debugging of
> defined-in-system-header functions (which seems like a highly questionable
> feature) it could be done with judicious application of the 'nodebug'
> attribute.  Not hard, really.
>
> --paulr
>
>
>
> *From:* Arthur O'Dwyer [mailto:arthur.j.odwyer at gmail.com]
> *Sent:* Tuesday, August 20, 2019 12:20 PM
> *To:* Robinson, Paul
> *Cc:* Jon Chesterfield; Clang Dev; John McFarlane
> *Subject:* Re: [cfe-dev] Varying per function optimisation based on
> include path?
>
>
>
> On Tue, Aug 20, 2019 at 9:42 AM via cfe-dev <cfe-dev at lists.llvm.org>
> wrote:
>
> > In -Og mode, it seems that it would equally make sense to take "a very big
> > slice around system headers specifically to avoid" debug symbols for code
> > that users can't debug.
>
>
>
> Our users seem to like to be able to dump their STL containers, which
> definitely requires debug symbols for "code they can't debug."
>
>
>
> Hmm, I may have muddled things up by mentioning "debug symbols" without
> fully understanding what people mean by that phrase precisely. I meant
> "line-by-line debugging information enabling single-step through a bunch of
> templates that the user doesn't care about and would prefer to see inlined
> away." Forget debug symbols and focus on inlining, if that'll help avoid my
> confusion. :)
>
>
>
> OTOH being able to more aggressively optimize system-header code even in
> -Og mode seems reasonable.
>
> OTOOH most of the system-header code is templates or otherwise inlineable
> early, and after inlining the distinction between app and sys code really
> goes away.
>
>
>
> I believe we'd like to get "inlining early," but the problem is that `-Og`
> disables inlining. So there is no "after inlining" at the moment.
>
> Here's a very concrete example: https://godbolt.org/z/5tTgO4
>
>
>
> int foo(std::tuple<int, int> t) {
>     return std::get<0>(t);
> }
>
>
>
> At `-Og` this produces the assembly code
>
>
>
> _Z3fooSt5tupleIJiiEE:
>   pushq %rax
>   callq _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_
>   movl (%rax), %eax
>   popq %rcx
>   retq
> _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_:
>   jmp _ZSt12__get_helperILm0EiJiEERT0_RSt11_Tuple_implIXT_EJS0_DpT1_EE
> _ZSt12__get_helperILm0EiJiEERT0_RSt11_Tuple_implIXT_EJS0_DpT1_EE:
>   jmp _ZNSt11_Tuple_implILm0EJiiEE7_M_headERS0_
> _ZNSt11_Tuple_implILm0EJiiEE7_M_headERS0_:
>   addq $4, %rdi
>   jmp _ZNSt10_Head_baseILm0EiLb0EE7_M_headERS0_
> _ZNSt10_Head_baseILm0EiLb0EE7_M_headERS0_:
>   movq %rdi, %rax
>   retq
>
>
>
> I believe that if John McFarlane's proposal were adopted by Clang, so that
> inlining-into-system-functions were allowed at `-Og`, then the resulting
> assembly code would look like this instead, for a much better experience in
> both debugging and runtime performance:
>
>
>
> _Z3fooSt5tupleIJiiEE:
>   pushq %rax
>   callq _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_
>   movl (%rax), %eax
>   popq %rcx
>   retq
> _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_:
>   leaq 4(%rdi), %rax
>   retq
>
>
>
> Notice that we still aren't inlining `std::get` into `foo`, because `foo`
> (as a user function) gets no inlining optimizations at `-Og`. But we do
> inline and collapse the whole chain of function-template helpers into
> `std::get` (because `std::get` is a function *defined* in a system
> header). This inlining creates new optimization opportunities, such as
> combining the `add` and `mov` into a single `lea`.
>
>
>
> HTH,
>
> –Arthur
>