[llvm-bugs] [Bug 43562] New: Speed degradation because of inlining a register clobbering function

Fri Oct 4 04:35:09 PDT 2019

https://bugs.llvm.org/show_bug.cgi?id=43562

            Bug ID: 43562
           Summary: Speed degradation because of inlining a register
                    clobbering function
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: LLVM Codegen
          Assignee: unassignedclangbugs at nondot.org
          Reporter: antoshkka at gmail.com
                CC: llvm-bugs at lists.llvm.org, neeilans at live.com,
                    richard-llvm at metafoo.co.uk

Consider the following example, a simplified version of
boost::container::small_vector:

#define MAKE_INLINING_BAD 1

struct vector {
    int* data_;
    int* capacity_;
    int* size_;

    void push_back(int v) {
        if (capacity_ > size_) {
            *size_ = v;
            ++size_;
        } else {
            reallocate_and_push(v);
        }
    }

    void reallocate_and_push(int v)
#if MAKE_INLINING_BAD
    {
        // Just some code that clobbers many registers.
        // You may skip reading it
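        const auto old_cap = capacity_ - data_;
        // NOTE: a real vector would compute `size_ - data_` here; this
        // reproducer only needs code that clobbers registers.
        const auto old_size = capacity_ - size_;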
        const auto new_cap = old_cap * 2 + 1;

        auto new_data_1 = new int[new_cap];
        auto new_data = new_data_1;
        for (int* old_data = data_; old_data != size_; ++old_data, ++new_data) {
            *new_data = *old_data;
        }

        delete[] data_;
        data_ = new_data_1;
        size_ = new_data_1 + old_size;
        capacity_ = new_data_1 + new_cap;

        *size_ = v;
        ++size_;
    }
#else
    ;
#endif
};

void bad_inlining(vector& v) {
    v.push_back(42);
}

With `#define MAKE_INLINING_BAD 0` the generated code is quite good:

bad_inlining(vector&): # @bad_inlining(vector&)
  mov rax, qword ptr [rdi + 16]
  cmp qword ptr [rdi + 8], rax
  jbe .LBB0_2
  mov dword ptr [rax], 42
  add rax, 4
  mov qword ptr [rdi + 16], rax
  ret
.LBB0_2:
  mov esi, 42
  jmp vector::reallocate_and_push(int) # TAILCALL

However, with `#define MAKE_INLINING_BAD 1` the compiler decides to inline the
`reallocate_and_push` function, which clobbers many registers. As a result, the
compiler saves the callee-saved registers on the stack before doing the compare
and `ja`:

bad_inlining(vector&): # @bad_inlining(vector&)
  push rbp     ; don't need those pushes for the `(capacity_ > size_)` case
  push r15
  push r14
  push r13
  push r12
  push rbx
  push rax
  mov r14, rdi
  mov r15, qword ptr [rdi + 8]
  mov r13, qword ptr [rdi + 16]
  mov rbp, r15
  sub rbp, r13
  ja .LBB0_14  ; hot path that does not clobber registers
  ; vector::reallocate_and_push(int) implementation
  add rsp, 8
  pop rbx      ; don't need those pops for the `(capacity_ > size_)` case
  pop r12
  pop r13
  pop r14
  pop r15
  pop rbp
  ret

This greatly degrades the performance of the first (`capacity_ > size_`)
branch: a slowdown of more than 3x in real code.

A possible fix would be to move all the push/pop operations into the cold
path, next to the inlined `reallocate_and_push`:

bad_inlining(vector&):
  mov rax, qword ptr [rdi + 16]
  cmp qword ptr [rdi + 8], rax
  jbe .LBB0_2
  mov dword ptr [rax], 42
  add rax, 4
  mov qword ptr [rdi + 16], rax
  ret
.LBB0_2:
  push rbp
  push r15
  push r14
  ; ...
  ; vector::reallocate_and_push(int) implementation goes here
  ; ...
  pop r14
  pop r15
  pop rbp
  ret
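
Until the code generator handles this, a source-level workaround may be to keep
the slow path out of line. The following is only a minimal sketch, assuming the
GCC/Clang `noinline` and `cold` function attributes. `small_vec` is a
hypothetical stand-in for the `vector` above; its slow path computes the
element count as `size_ - data_`, which differs from the reproducer. Whether
this actually yields the lean hot path depends on the compiler version and
flags.

```cpp
#include <cstddef>

// Hypothetical stand-in for the report's `vector`; not the original code.
struct small_vec {
    int* data_ = nullptr;
    int* capacity_ = nullptr;
    int* size_ = nullptr;

    small_vec() = default;
    small_vec(const small_vec&) = delete;             // owns raw storage
    small_vec& operator=(const small_vec&) = delete;
    ~small_vec() { delete[] data_; }

    // Hot path: same shape as `push_back` in the report.
    void push_back(int v) {
        if (capacity_ > size_) {
            *size_ = v;
            ++size_;
        } else {
            reallocate_and_push(v);
        }
    }

    // Assumption: `noinline` asks the compiler to keep this register-heavy
    // code out of the caller; `cold` hints that it is rarely executed.
    // Both are GCC/Clang extensions, not standard C++.
    __attribute__((noinline, cold))
    void reallocate_and_push(int v) {
        const std::ptrdiff_t old_cap = capacity_ - data_;
        const std::ptrdiff_t old_size = size_ - data_;
        const std::ptrdiff_t new_cap = old_cap * 2 + 1;

        int* new_data = new int[new_cap];
        for (std::ptrdiff_t i = 0; i < old_size; ++i) {
            new_data[i] = data_[i];
        }

        delete[] data_;
        data_ = new_data;
        size_ = new_data + old_size;
        capacity_ = new_data + new_cap;

        *size_ = v;
        ++size_;
    }
};

// Same wrapper as in the report, applied to the sketch type.
void not_so_bad_inlining(small_vec& v) {
    v.push_back(42);
}
```

With the slow path out of line, the caller only needs a call (or tail call) in
the cold branch, similar to the `MAKE_INLINING_BAD 0` output.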

Godbolt playground: https://godbolt.org/z/zM9bR0
Related GCC issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91981
