[all-commits] [llvm/llvm-project] 6fe4e0: [libc++] Optimize vector push_back to avoid contin...

Martijn Vels via All-commits all-commits at lists.llvm.org
Mon Oct 2 06:13:08 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 6fe4e033f07d332980e1997c19fe705cff9d07a4
      https://github.com/llvm/llvm-project/commit/6fe4e033f07d332980e1997c19fe705cff9d07a4
  Author: Martijn Vels <mvels at google.com>
  Date:   2023-10-02 (Mon, 02 Oct 2023)

  Changed paths:
    M libcxx/benchmarks/ContainerBenchmarks.h
    M libcxx/benchmarks/vector_operations.bench.cpp
    M libcxx/include/vector

  Log Message:
  -----------
  [libc++] Optimize vector push_back to avoid continuous load and store of end pointer

Credits: this change is based on analysis and a proof of concept by
gerbens at google.com.

Before this change, the compiler loses track of end because 'this' and
other references may escape beyond the compiler's view, so it stores and
reloads end on every push_back. This can be seen in the generated
assembly:

     16.28 │200c80:   mov     %r15d,(%rax)
     60.87 │200c83:   add     $0x4,%rax
           │200c87:   mov     %rax,-0x38(%rbp)
      0.03 │200c8b: → jmpq    200d4e
      ...
      ...
      1.69 │200d4e:   cmp     %r15d,%r12d
           │200d51: → je      200c40
     16.34 │200d57:   inc     %r15d
      0.05 │200d5a:   mov     -0x38(%rbp),%rax
      3.27 │200d5e:   mov     -0x30(%rbp),%r13
      1.47 │200d62:   cmp     %r13,%rax
           │200d65: → jne     200c80

We fix this by loading end into a local pointer and always explicitly
storing it back at the end of push_back. This generates some slight
source 'noise', but creates nice and compact fast path code, i.e.:

     32.64 │200760:   mov    %r14d,(%r12)
      9.97 │200764:   add    $0x4,%r12
      6.97 │200768:   mov    %r12,-0x38(%rbp)
     32.17 │20076c:   add    $0x1,%r14d
      2.36 │200770:   cmp    %r14d,%ebx
           │200773: → je     200730
      8.98 │200775:   mov    -0x30(%rbp),%r13
      6.75 │200779:   cmp    %r13,%r12
           │20077c: → jne    200760

Now there is a single store for the push_back value (as before), and a
single store for the end without a reload (dependency).
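
The pattern can be sketched outside of libc++ as follows. This is a
minimal illustration only, not the code in libcxx/include/vector: the
TinyVec type, its member names and the grow_and_push helper are all
hypothetical, and the sketch handles only int with no exception safety.
It shows end being read into a local once and written back once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical toy container for illustration; not libc++'s std::vector.
struct TinyVec {
    int* begin_ = nullptr;
    int* end_ = nullptr;
    int* cap_ = nullptr;

    TinyVec() = default;
    TinyVec(const TinyVec&) = delete;
    TinyVec& operator=(const TinyVec&) = delete;
    ~TinyVec() { std::free(begin_); }

    void push_back(int value) {
        int* end = end_;                 // load the end pointer into a local once
        if (end != cap_) {
            *end = value;                // fast path: store the element
            ++end;
        } else {
            end = grow_and_push(value);  // slow path returns the new end
        }
        end_ = end;                      // single explicit store back to the member
    }

    std::size_t size() const {
        return static_cast<std::size_t>(end_ - begin_);
    }

private:
    // Assumed growth policy (doubling, starting at 4); chosen only for the sketch.
    int* grow_and_push(int value) {
        assert(end_ == cap_);            // only called when the buffer is full
        std::size_t old_size = size();
        std::size_t new_cap = old_size ? 2 * old_size : 4;
        int* p = static_cast<int*>(std::realloc(begin_, new_cap * sizeof(int)));
        if (!p) std::abort();
        begin_ = p;
        cap_ = p + new_cap;
        p[old_size] = value;
        return p + old_size + 1;         // caller stores this into end_
    }
};
```

Because both branches produce the new end in a local, the member is
written exactly once per call and never read back, which is what lets
the compiler keep it in a register across the fast path.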

For fully local vectors (i.e., not referenced elsewhere), the capacity
load and store inside the loop could also be removed, but this requires
more substantial refactoring inside vector.

Differential Revision: https://reviews.llvm.org/D80588



