[libcxx-commits] [libcxx] [libc++] Mark __{emplace, push}_back_slow_path as noinline (PR #94379)

Tue Jun 4 11:05:17 PDT 2024

EricWF wrote:

My tendency here is to trust the compiler. The example output on the code presented in the bug may seem silly, but why wouldn't the compiler inline there? It's the only function in a translation unit, and it contains a couple of lines.  I wonder under what circumstances the compiler chooses to not inline the code. 

However, when we prevent the compiler from inlining, we prevent it from optimizing away dead code.

Before your change, this silly example was optimized away entirely.

```c++
void foo() {
    std::vector<int> v;
    v.push_back(42);
}
// foo(): # @foo()
//   ret
```

Now it generates:
```asm
foo(): # @foo()
  push rbx
  sub rsp, 48
  xorps xmm0, xmm0
  movaps xmmword ptr [rsp + 16], xmm0
  mov qword ptr [rsp + 32], 0
  mov dword ptr [rsp + 12], 42
  lea rdi, [rsp + 16]
  lea rsi, [rsp + 12]
  call int* std::__2::vector<int, std::__2::allocator<int> >::__push_back_slow_path<int>(int&&)
  mov rdi, qword ptr [rsp + 16]
  test rdi, rdi
  je .LBB0_3
  mov qword ptr [rsp + 24], rdi
  call operator delete(void*)@PLT
.LBB0_3:
  add rsp, 48
  pop rbx
  ret
  mov rbx, rax
  mov rdi, qword ptr [rsp + 16]
  test rdi, rdi
  je .LBB0_6
  mov qword ptr [rsp + 24], rdi
  call operator delete(void*)@PLT
.LBB0_6:
  mov rdi, rbx
  call _Unwind_Resume@PLT
.L.str:
  .asciz "vector"
```

Use your discretion here. Forbidding the optimizer from inlining the slow path may be a good choice. 
Here are a few observations:

 * libstdc++ produces roughly the same output.
 * We explicitly add `inline` to `__push_back_slow_path`, which here only acts as an inlining hint to the compiler. We should 100% remove that. If we do, then:
 * Compiling with `-Os` gives the user the codegen they want, without affecting `-O2`/`-O3`.
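
To make the trade-off concrete, here is a minimal sketch of the fast-path/slow-path split under discussion. It is not libc++'s implementation: `IntBuffer` and `push_back_slow_path` are hypothetical names, and `__attribute__((noinline))` is the GCC/Clang spelling, assumed here for illustration. With the attribute in place the reallocation code stays out of line at every call site, which is also what stops the optimizer from seeing through the call and deleting a dead container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical growable buffer used only to illustrate the pattern.
// This is NOT how std::vector is implemented in libc++.
class IntBuffer {
public:
    IntBuffer() = default;
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;
    ~IntBuffer() { std::free(data_); }

    // Fast path: a compare and a store, cheap to inline everywhere.
    void push_back(int value) {
        if (size_ < capacity_) {
            data_[size_++] = value;
        } else {
            push_back_slow_path(value);
        }
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    int operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    // Slow path: grow the storage, then append. The attribute (GCC/Clang
    // syntax, an assumption of this sketch) forbids inlining this function.
    __attribute__((noinline)) void push_back_slow_path(int value) {
        std::size_t new_cap = capacity_ == 0 ? 1 : capacity_ * 2;
        int* new_data = static_cast<int*>(std::malloc(new_cap * sizeof(int)));
        if (new_data == nullptr) std::abort();
        if (size_ != 0) std::memcpy(new_data, data_, size_ * sizeof(int));
        std::free(data_);
        data_ = new_data;
        capacity_ = new_cap;
        data_[size_++] = value;
    }

    int* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

Dropping the attribute (and any `inline` keyword) from the slow path would leave the decision to the optimizer's cost model, which is the alternative suggested in the observations above.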

https://github.com/llvm/llvm-project/pull/94379