[libcxx-commits] [PATCH] D152208: [libc++] Introduce __make_uninitialized_buffer and use it instead of get_temporary_buffer

Nikolas Klauser via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Mon Jun 26 11:07:36 PDT 2023


philnik added inline comments.


================
Comment at: libcxx/include/__memory/uninitialized_buffer.h:71-73
+#else
+  _Tp* __ptr = static_cast<_Tp*>(::operator new(sizeof(_Tp) * __count, align_val_t(_LIBCPP_ALIGNOF(_Tp)), nothrow));
+#endif
----------------
mingmingl wrote:
> philnik wrote:
> > mingmingl wrote:
> > > Hi, a regression is observed in a benchmark that uses `std::stable_sort` intensively. 
> > > 
> > > Comparing baseline and the regressed case, the baseline (using `get_temporary_buffer`) uses unaligned alloc ( `void* operator new[](size_t size) `) when objects are not over aligned, and the regressed uses aligned alloc for all objects. This may increase the memory usage as well.
> > > 
> > > Given `std::stable_sort` is pretty important, I wonder if this regression warrants a fix to have consistent usage of (aligned or unaligned) new operator?
> > I don't understand how using the aligned alloc would increase memory usage. `void* operator new[](size_t size)` is basically `void* operator new[](size_t size, align_val_t{__STDCPP_DEFAULT_NEW_ALIGNMENT__})`. So unless you have a bad implementation it shouldn't make any difference.
> thanks for the quick response! 
> 
> To share more context, the malloc implementation is tcmalloc (https://github.com/google/tcmalloc); and aligned alloc exercises more instructions than unaligned one. Considering the affected library is `stable_sort`, the benefit of keeping unaligned alloc is not small.
> 
> - In the baseline, unaligned alloc uses `void* operator new[](size_t size) noexcept(false)` corresponds to `TCMallocInternalNewArray` (an alias of `TCMallocInternalNew`), the implementation is https://github.com/google/tcmalloc/blob/139e709eefd5bc6a301868f752cb2d8931d47d45/tcmalloc/tcmalloc.cc#L1134
> 
> - In the regressed case, `operator new[](unsigned long, std::align_val_t, std::nothrow_t const&)` is used -> `TCMallocInternalNewArrayAlignedNothrow` is used. The implementation is https://github.com/google/tcmalloc/blob/139e709eefd5bc6a301868f752cb2d8931d47d45/tcmalloc/tcmalloc.cc#L1195
> 
> Comparing the `llvm-objdump` output below, aligned operator new has a longer instruction sequence (0x154 for regressed case, 0xec for baseline) -> this increases the number of instructions executed on a hot path (`std::stable_sort`).
> 
> 
> 
> ```
> 000000000b4702c0 g     F section-name	00000000000000ec tcmalloc_size_returning_operator_new   # new operator used in BASELINE
> 000000000b4703c0 g     F section-name	0000000000000154 tcmalloc_size_returning_operator_new_aligned
> 000000000b470540 g     F section-name	00000000000001ec tcmalloc_size_returning_operator_new_hot_cold
> 000000000b470740 g     F section-name	00000000000002ac tcmalloc_size_returning_operator_new_aligned_hot_cold
> 000000000b470a00 g     F section-name 00000000000000ec tcmalloc_size_returning_operator_new_nothrow
> 000000000b470b00 g     F section-name	0000000000000154 tcmalloc_size_returning_operator_new_aligned_nothrow # new operator in EXPERIMENT
> 000000000b470c80 g     F section-name	00000000000001ec tcmalloc_size_returning_operator_new_hot_cold_nothrow
> 000000000b470e80 g     F section-name	00000000000002ac tcmalloc_size_returning_operator_new_aligned_hot_cold_nothrow
> ```
How exactly does that correlate to memory use? I have no idea how TCMalloc is implemented, so your links don't really help me. It could just be the case that the aligned path isn't as well optimized as the non-aligned one, resulting in slightly worse performance. I don't see any theoretical reason the aligned path should be any slower than the non-aligned one. Worst-case should be a single condition to see whether the non-aligned path should be taken if it's actually faster.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152208/new/

https://reviews.llvm.org/D152208



More information about the libcxx-commits mailing list