[cfe-dev] Varying per function optimisation based on include path?

Tue Aug 20 11:34:48 PDT 2019

I think a question was glossed over: exactly which inlining directions should be allowed?

  1.  User callee into user caller (definitely not)
  2.  System callee into system caller (yes)
  3.  User callee into system caller (maybe?)
  4.  System callee into user caller (maybe?)

Perhaps number 3 should be prohibited because then a breakpoint on “my” function would either not get hit, or turn into multiple breakpoints.

Perhaps number 4 should be prohibited because it makes stepping across loop iterations in things like std::transform more difficult.

From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of via cfe-dev
Sent: Tuesday, August 20, 2019 12:59 PM
To: arthur.j.odwyer at gmail.com
Cc: jonathanchesterfield at gmail.com; john at mcfarlane.name; cfe-dev at lists.llvm.org
Subject: [EXTERNAL] Re: [cfe-dev] Varying per function optimisation based on include path?

Ah, I'd forgotten that Og prefers not to inline.
Distinguishing optimization levels within one translation unit is tricky given the current way we build optimization pipelines. They are *not* designed to handle function-level differences in optimization levels.  Trying to (essentially) mix O1 and O2 in the same translation unit is a radical departure from how LLVM thinks about optimization.  ('optnone' is a special case where passes effectively disable themselves when presented with an 'optnone' function. Generalizing that to more optimization levels is a seriously invasive proposition.)
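For reference, the 'optnone' special case can already be requested per function from source. The following is only a minimal sketch, assuming Clang's `optnone` attribute; the `SKETCH_OPTNONE` macro and both function names are made up for illustration, and on a compiler without the attribute the macro expands to nothing.

```cpp
#include <cassert>

// Hypothetical helper macro: expands to Clang's optnone attribute when the
// compiler reports support for it, and to nothing otherwise.
#if defined(__has_attribute)
#  if __has_attribute(optnone)
#    define SKETCH_OPTNONE __attribute__((optnone))
#  endif
#endif
#ifndef SKETCH_OPTNONE
#  define SKETCH_OPTNONE
#endif

// Optimized according to whatever -O level the translation unit is built with.
int sum_to(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}

// Same body, but optimization passes skip this function when the attribute
// is honored. The result is identical; only the generated code differs.
SKETCH_OPTNONE int sum_to_unoptimized(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}
```

This is an on/off switch per function, which is why it does not generalize to "this function at O1, that one at O2" without the invasive changes described above.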

Re the "symbols" confusion, broadly speaking you can separate debug info into that which describes the source (types, variables, etc), and that which describes the generated code (to a first approximation, the instruction<->source mapping).  So the suggestion in this thread is to retain the former but not the latter.
In this exercise, if we genuinely want to *prevent* debugging of defined-in-system-header functions (which seems like a highly questionable feature) it could be done with judicious application of the 'nodebug' attribute.  Not hard, really.
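As a rough sketch of what that application might look like, assuming Clang's `nodebug` attribute: the `SKETCH_NODEBUG` macro and the function names are invented here, and `library_detail_add` merely stands in for something defined in a system header.

```cpp
// Hypothetical helper macro: expands to the nodebug attribute when the
// compiler reports support for it, and to nothing otherwise.
#if defined(__has_attribute)
#  if __has_attribute(nodebug)
#    define SKETCH_NODEBUG __attribute__((nodebug))
#  endif
#endif
#ifndef SKETCH_NODEBUG
#  define SKETCH_NODEBUG
#endif

// Stand-in for a function defined in a system header. When the attribute is
// honored, no debug info is emitted for this function.
SKETCH_NODEBUG inline int library_detail_add(int a, int b) {
    return a + b;
}

// "User" function: keeps its normal debug info.
int user_sum(int a, int b) {
    return library_detail_add(a, b);
}
```

The attribute only affects emitted debug info; the function's behavior is unchanged.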
--paulr

From: Arthur O'Dwyer [mailto:arthur.j.odwyer at gmail.com]
Sent: Tuesday, August 20, 2019 12:20 PM
To: Robinson, Paul
Cc: Jon Chesterfield; Clang Dev; John McFarlane
Subject: Re: [cfe-dev] Varying per function optimisation based on include path?

On Tue, Aug 20, 2019 at 9:42 AM via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> > In -Og mode, it seems that it would equally make sense to take "a very big
> > slice around system headers specifically to avoid" debug symbols for code
> > that users can't debug.
>
> Our users seem to like to be able to dump their STL containers, which definitely requires debug symbols for "code they can't debug."

Hmm, I may have muddled things up by mentioning "debug symbols" without fully understanding what people mean by that phrase precisely. I meant "line-by-line debugging information enabling single-step through a bunch of templates that the user doesn't care about and would prefer to see inlined away." Forget debug symbols and focus on inlining, if that'll help avoid my confusion. :)

> OTOH being able to more aggressively optimize system-header code even in -Og mode seems reasonable.
> OTOOH most of the system-header code is templates or otherwise inlineable early, and after inlining the distinction between app and sys code really goes away.

I believe we'd like to get "inlining early," but the problem is that `-Og` disables inlining. So there is no "after inlining" at the moment.
Here's a very concrete example: https://godbolt.org/z/5tTgO4

#include <tuple>

int foo(std::tuple<int, int> t) {
    return std::get<0>(t);
}

At `-Og` this produces the assembly code

_Z3fooSt5tupleIJiiEE:
  pushq %rax
  callq _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_
  movl (%rax), %eax
  popq %rcx
  retq
_ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_:
  jmp _ZSt12__get_helperILm0EiJiEERT0_RSt11_Tuple_implIXT_EJS0_DpT1_EE
_ZSt12__get_helperILm0EiJiEERT0_RSt11_Tuple_implIXT_EJS0_DpT1_EE:
  jmp _ZNSt11_Tuple_implILm0EJiiEE7_M_headERS0_
_ZNSt11_Tuple_implILm0EJiiEE7_M_headERS0_:
  addq $4, %rdi
  jmp _ZNSt10_Head_baseILm0EiLb0EE7_M_headERS0_
_ZNSt10_Head_baseILm0EiLb0EE7_M_headERS0_:
  movq %rdi, %rax
  retq

I believe that if John McFarlane's proposal were adopted by Clang, so that inlining-into-system-functions were allowed at `-Og`, then the resulting assembly code would look like this instead, for a much better experience in both debugging and runtime performance:

_Z3fooSt5tupleIJiiEE:
  pushq %rax
  callq _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_
  movl (%rax), %eax
  popq %rcx
  retq
_ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_:
  leaq 4(%rdi), %rax
  retq

Notice that we still aren't inlining `std::get` into `foo`, because `foo` (as a user function) gets no inlining optimizations at `-Og`. But we do inline and collapse the whole chain of function-template helpers into `std::get` (because `std::get` is a function defined in a system header). This inlining creates new optimization opportunities, such as combining the `add` and `mov` into a single `lea`.
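To make the shape of that chain concrete without relying on library internals, here is a hand-rolled sketch. It is not the proposal itself: it approximates the effect manually with the GCC/Clang `always_inline` attribute, and every name in it (`TwoInts`, `head_base`, `impl_head`, `get_helper`, `get0`, `SKETCH_ALWAYS_INLINE`) is made up for illustration. The struct layout simply puts the "head" element second, so it sits at a nonzero offset like in the assembly above.

```cpp
// Hypothetical macro: force inlining on GCC/Clang, plain inline elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ALWAYS_INLINE __attribute__((always_inline)) inline
#else
#  define SKETCH_ALWAYS_INLINE inline
#endif

// Simplified stand-in for a two-element tuple.
struct TwoInts {
    int tail;
    int head;
};

// Stand-ins for the library's internal forwarding helpers. Each one only
// forwards to the next, so forcing them inline collapses the chain.
SKETCH_ALWAYS_INLINE int& head_base(int& h) { return h; }
SKETCH_ALWAYS_INLINE int& impl_head(TwoInts& t) { return head_base(t.head); }
SKETCH_ALWAYS_INLINE int& get_helper(TwoInts& t) { return impl_head(t); }

// Plays the role of std::get<0>: an ordinary function, so a caller built
// without inlining still makes one real call to it.
int& get0(TwoInts& t) { return get_helper(t); }

// Plays the role of the user's function.
int foo(TwoInts t) { return get0(t); }
```

Here `foo` still calls `get0`, but the helpers underneath are requested to be inlined into `get0` regardless of optimization level, which is the same division of labor described above.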

HTH,
–Arthur