[libcxx-commits] [PATCH] D152208: [libc++] Introduce __make_uninitialized_buffer and use it instead of get_temporary_buffer
Mingming Liu via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Mon Jun 26 10:55:52 PDT 2023
mingmingl added inline comments.
================
Comment at: libcxx/include/__memory/uninitialized_buffer.h:71-73
+#else
+ _Tp* __ptr = static_cast<_Tp*>(::operator new(sizeof(_Tp) * __count, align_val_t(_LIBCPP_ALIGNOF(_Tp)), nothrow));
+#endif
----------------
philnik wrote:
> mingmingl wrote:
> > Hi, a regression is observed in a benchmark that uses `std::stable_sort` intensively.
> >
> > Comparing baseline and the regressed case, the baseline (using `get_temporary_buffer`) uses unaligned alloc ( `void* operator new[](size_t size) `) when objects are not over aligned, and the regressed uses aligned alloc for all objects. This may increase the memory usage as well.
> >
> > Given `std::stable_sort` is pretty important, I wonder if this regression warrants a fix to have consistent usage of (aligned or unaligned) new operator?
> I don't understand how using the aligned alloc would increase memory usage. `void* operator new[](size_t size)` is basically `void* operator new[](size_t size, align_val_t{__STDCPP_DEFAULT_NEW_ALIGNMENT__})`. So unless you have a bad implementation it shouldn't make any difference.
thanks for the quick response!
To share more context, the malloc implementation is tcmalloc (https://github.com/google/tcmalloc); and aligned alloc exercises more instructions than unaligned one. Considering the affected library is `stable_sort`, the benefit of keeping unaligned alloc is not small.
- In the baseline, unaligned alloc uses `void* operator new[](size_t size) noexcept(false)` corresponds to `TCMallocInternalNewArray` (an alias of `TCMallocInternalNew`), the implementation is https://github.com/google/tcmalloc/blob/139e709eefd5bc6a301868f752cb2d8931d47d45/tcmalloc/tcmalloc.cc#L1134
- In the regressed case, `operator new[](unsigned long, std::align_val_t, std::nothrow_t const&)` is used -> `TCMallocInternalNewArrayAlignedNothrow` is used. The implementation is https://github.com/google/tcmalloc/blob/139e709eefd5bc6a301868f752cb2d8931d47d45/tcmalloc/tcmalloc.cc#L1195
Comparing the `llvm-objdump` output below, aligned operator new has a longer instruction sequence (0x154 for regressed case, 0xec for baseline) -> this increases the number of instructions executed on a hot path (`std::stable_sort`).
```
000000000b4702c0 g F section-name 00000000000000ec tcmalloc_size_returning_operator_new # new operator used in BASELINE
000000000b4703c0 g F section-name 0000000000000154 tcmalloc_size_returning_operator_new_aligned
000000000b470540 g F section-name 00000000000001ec tcmalloc_size_returning_operator_new_hot_cold
000000000b470740 g F section-name 00000000000002ac tcmalloc_size_returning_operator_new_aligned_hot_cold
000000000b470a00 g F section-name 00000000000000ec tcmalloc_size_returning_operator_new_nothrow
000000000b470b00 g F section-name 0000000000000154 tcmalloc_size_returning_operator_new_aligned_nothrow # new operator in EXPERIMENT
000000000b470c80 g F section-name 00000000000001ec tcmalloc_size_returning_operator_new_hot_cold_nothrow
000000000b470e80 g F section-name 00000000000002ac tcmalloc_size_returning_operator_new_aligned_hot_cold_nothrow
```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D152208/new/
https://reviews.llvm.org/D152208
More information about the libcxx-commits
mailing list