[cfe-dev] Varying per function optimisation based on include path?
Arthur O'Dwyer via cfe-dev
cfe-dev at lists.llvm.org
Tue Aug 20 09:20:25 PDT 2019
On Tue, Aug 20, 2019 at 9:42 AM via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> > In -Og mode, it seems that it would equally make sense to take "a very
> > big slice around system headers specifically to avoid" debug symbols for
> > code that users can't debug.
>
> Our users seem to like to be able to dump their STL containers, which
> definitely requires debug symbols for "code they can't debug."
>
Hmm, I may have muddled things up by mentioning "debug symbols" without
fully understanding what people mean by that phrase precisely. I meant
"line-by-line debugging information enabling single-step through a bunch of
templates that the user doesn't care about and would prefer to see inlined
away." Forget debug symbols and focus on inlining, if that helps clear up the
confusion I caused. :)
> OTOH being able to more aggressively optimize system-header code even in
> -Og mode seems reasonable.
>
> OTOOH most of the system-header code is templates or otherwise inlineable
> early, and after inlining the distinction between app and sys code really
> goes away.
>
I believe we'd like to get "inlining early," but the problem is that `-Og`
disables inlining. So there is no "after inlining" at the moment.
Here's a very concrete example: https://godbolt.org/z/5tTgO4
    #include <tuple>

    int foo(std::tuple<int, int> t) {
        return std::get<0>(t);
    }
At `-Og`, this produces the following assembly:
    _Z3fooSt5tupleIJiiEE:
        pushq %rax
        callq _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_
        movl (%rax), %eax
        popq %rcx
        retq
    _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_:
        jmp _ZSt12__get_helperILm0EiJiEERT0_RSt11_Tuple_implIXT_EJS0_DpT1_EE
    _ZSt12__get_helperILm0EiJiEERT0_RSt11_Tuple_implIXT_EJS0_DpT1_EE:
        jmp _ZNSt11_Tuple_implILm0EJiiEE7_M_headERS0_
    _ZNSt11_Tuple_implILm0EJiiEE7_M_headERS0_:
        addq $4, %rdi
        jmp _ZNSt10_Head_baseILm0EiLb0EE7_M_headERS0_
    _ZNSt10_Head_baseILm0EiLb0EE7_M_headERS0_:
        movq %rdi, %rax
        retq
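A rough imitation of the layering behind that chain may help show why `-Og` ends up with one call or jump per layer. The helper names in the listing appear to be libstdc++'s; the names below (`HeadBase`, `TupleImpl`, `MiniTuple`, `miniGetHelper`, `miniGet`, `miniTupleDemo`) are hypothetical and invented for illustration, so treat this as a sketch rather than libstdc++'s actual implementation. Each layer does nothing except forward to the next one, so with inlining disabled every layer survives as a separate function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified imitation of the helper layering seen in the
// listing above. These names are invented; this is not libstdc++'s code.

// Innermost layer: holds exactly one element.
template <std::size_t I, typename T>
struct HeadBase {
    T value;
    static T& head(HeadBase& b) { return b.value; }
};

// Recursive storage: element I lives in HeadBase<I, T>, later elements in
// the TupleImpl<I + 1, ...> base.
template <std::size_t I, typename... Ts>
struct TupleImpl;

template <std::size_t I, typename T>
struct TupleImpl<I, T> : HeadBase<I, T> {
    static T& head(TupleImpl& t) { return HeadBase<I, T>::head(t); }
};

template <std::size_t I, typename T, typename... Rest>
struct TupleImpl<I, T, Rest...> : TupleImpl<I + 1, Rest...>, HeadBase<I, T> {
    // Converting t to its HeadBase<I, T> subobject typically needs a pointer
    // adjustment, because that base is listed (and laid out) second.
    static T& head(TupleImpl& t) { return HeadBase<I, T>::head(t); }
};

// Middle layer: deduction picks the TupleImpl base whose index is I.
template <std::size_t I, typename T, typename... Rest>
T& miniGetHelper(TupleImpl<I, T, Rest...>& t) {
    return TupleImpl<I, T, Rest...>::head(t);
}

template <typename... Ts>
struct MiniTuple : TupleImpl<0, Ts...> {};

// Outermost layer, playing the role of std::get.
template <std::size_t I, typename... Ts>
auto& miniGet(MiniTuple<Ts...>& t) {
    return miniGetHelper<I>(t);
}

// Usage: write through miniGet and read the values back.
inline void miniTupleDemo() {
    MiniTuple<int, int> t{};
    miniGet<0>(t) = 1;
    miniGet<1>(t) = 2;
    assert(miniGet<0>(t) == 1 && miniGet<1>(t) == 2);
}
```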
I believe that if John McFarlane's proposal were adopted by Clang, so that
inlining-into-system-functions were allowed at `-Og`, then the resulting
assembly code would look like this instead, for a much better experience in
both debugging and runtime performance:
    _Z3fooSt5tupleIJiiEE:
        pushq %rax
        callq _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_
        movl (%rax), %eax
        popq %rcx
        retq
    _ZSt3getILm0EJiiEERNSt13tuple_elementIXT_ESt5tupleIJDpT0_EEE4typeERS4_:
        leaq 4(%rdi), %rax
        retq
Notice that we still aren't inlining `std::get` into `foo`, because `foo`
(as a user function) gets no inlining optimizations at `-Og`. But we do
inline and collapse the whole chain of function-template helpers into
`std::get` (because `std::get` is a function *defined* in a system header).
This inlining creates new optimization opportunities, such as combining the
`add` and `mov` into a single `lea`.
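The proposed rule can be modeled as a pass over a toy call graph. This is a hypothetical sketch, assuming each function is tagged with whether its definition came from a system header; `Function`, `CallGraph`, `inlineIntoSystemFunctions`, and `makeTupleGetExample` are invented names, and none of this is Clang's or LLVM's actual inliner. Calls are inlined only into functions defined in system headers, and callees are flattened first, so a chain of forwarding helpers collapses into its outermost system-header function.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model of the proposed -Og policy; not Clang/LLVM code.
enum class State { Unvisited, InProgress, Done };

struct Function {
    bool inSystemHeader;               // was the definition in a system header?
    std::vector<std::string> callees;  // calls that remain after inlining
    State state = State::Unvisited;    // bookkeeping for the traversal
};

using CallGraph = std::map<std::string, Function>;

// A function defined in a system header gets every call inlined into it,
// after each callee has been flattened the same way. Functions in user code
// keep all of their calls. Assumes the call graph has no recursion.
void inlineIntoSystemFunctions(CallGraph& graph, const std::string& name) {
    Function& fn = graph.at(name);
    if (fn.state != State::Unvisited) {
        assert(fn.state == State::Done && "call graph must be acyclic");
        return;
    }
    fn.state = State::InProgress;
    if (fn.inSystemHeader) {
        std::vector<std::string> remaining;
        for (const std::string& callee : fn.callees) {
            inlineIntoSystemFunctions(graph, callee);
            // Inlining copies the callee's body, so the callee's remaining
            // calls become calls made directly by this function.
            const std::vector<std::string>& inner = graph.at(callee).callees;
            remaining.insert(remaining.end(), inner.begin(), inner.end());
        }
        fn.callees = remaining;
    }
    fn.state = State::Done;
}

// The call chain from the foo() example, with shortened names.
CallGraph makeTupleGetExample() {
    return {
        {"foo", {false, {"std::get"}}},
        {"std::get", {true, {"std::__get_helper"}}},
        {"std::__get_helper", {true, {"_Tuple_impl::_M_head"}}},
        {"_Tuple_impl::_M_head", {true, {"_Head_base::_M_head"}}},
        {"_Head_base::_M_head", {true, {}}},
    };
}
```

Applying the pass to every function in `makeTupleGetExample()` leaves `foo` with its single call to `std::get`, while `std::get` and each intermediate helper end up making no calls at all, which mirrors the second listing. Flattening callees before copying their calls is what lets a chain of any length collapse in a single pass.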
HTH,
–Arthur