[libcxx-commits] [PATCH] D152208: [libc++] Introduce __make_uninitialized_buffer and use it instead of get_temporary_buffer
Nikolas Klauser via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Mon Jun 26 11:07:36 PDT 2023
philnik added inline comments.
================
Comment at: libcxx/include/__memory/uninitialized_buffer.h:71-73
+#else
+ _Tp* __ptr = static_cast<_Tp*>(::operator new(sizeof(_Tp) * __count, align_val_t(_LIBCPP_ALIGNOF(_Tp)), nothrow));
+#endif
----------------
mingmingl wrote:
> philnik wrote:
> > mingmingl wrote:
> > > Hi, a regression is observed in a benchmark that uses `std::stable_sort` intensively.
> > >
> > > Comparing baseline and the regressed case, the baseline (using `get_temporary_buffer`) uses unaligned alloc ( `void* operator new[](size_t size) `) when objects are not over aligned, and the regressed uses aligned alloc for all objects. This may increase the memory usage as well.
> > >
> > > Given `std::stable_sort` is pretty important, I wonder if this regression warrants a fix to have consistent usage of (aligned or unaligned) new operator?
> > I don't understand how using the aligned alloc would increase memory usage. `void* operator new[](size_t size)` is basically `void* operator new[](size_t size, align_val_t{__STDCPP_DEFAULT_NEW_ALIGNMENT__})`. So unless you have a bad implementation it shouldn't make any difference.
> thanks for the quick response!
>
> To share more context, the malloc implementation is tcmalloc (https://github.com/google/tcmalloc); and aligned alloc exercises more instructions than unaligned one. Considering the affected library is `stable_sort`, the benefit of keeping unaligned alloc is not small.
>
> - In the baseline, unaligned alloc uses `void* operator new[](size_t size) noexcept(false)` corresponds to `TCMallocInternalNewArray` (an alias of `TCMallocInternalNew`), the implementation is https://github.com/google/tcmalloc/blob/139e709eefd5bc6a301868f752cb2d8931d47d45/tcmalloc/tcmalloc.cc#L1134
>
> - In the regressed case, `operator new[](unsigned long, std::align_val_t, std::nothrow_t const&)` is used -> `TCMallocInternalNewArrayAlignedNothrow` is used. The implementation is https://github.com/google/tcmalloc/blob/139e709eefd5bc6a301868f752cb2d8931d47d45/tcmalloc/tcmalloc.cc#L1195
>
> Comparing the `llvm-objdump` output below, aligned operator new has a longer instruction sequence (0x154 for regressed case, 0xec for baseline) -> this increases the number of instructions executed on a hot path (`std::stable_sort`).
>
>
>
> ```
> 000000000b4702c0 g F section-name 00000000000000ec tcmalloc_size_returning_operator_new # new operator used in BASELINE
> 000000000b4703c0 g F section-name 0000000000000154 tcmalloc_size_returning_operator_new_aligned
> 000000000b470540 g F section-name 00000000000001ec tcmalloc_size_returning_operator_new_hot_cold
> 000000000b470740 g F section-name 00000000000002ac tcmalloc_size_returning_operator_new_aligned_hot_cold
> 000000000b470a00 g F section-name 00000000000000ec tcmalloc_size_returning_operator_new_nothrow
> 000000000b470b00 g F section-name 0000000000000154 tcmalloc_size_returning_operator_new_aligned_nothrow # new operator in EXPERIMENT
> 000000000b470c80 g F section-name 00000000000001ec tcmalloc_size_returning_operator_new_hot_cold_nothrow
> 000000000b470e80 g F section-name 00000000000002ac tcmalloc_size_returning_operator_new_aligned_hot_cold_nothrow
> ```
How exactly does that correlate to memory use? I have no idea how TCMalloc is implemented, so your links don't really help me. It could just be the case that the aligned path isn't as well optimized as the non-aligned one, resulting in slightly worse performance. I don't see any theoretical reason the aligned path should be any slower than the non-aligned one. Worst-case should be a single condition to see whether the non-aligned path should be taken if it's actually faster.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D152208/new/
https://reviews.llvm.org/D152208
More information about the libcxx-commits
mailing list