[libcxx-commits] [PATCH] D152208: [libc++] Introduce __make_uninitialized_buffer and use it instead of get_temporary_buffer

Tue Jun 27 14:10:39 PDT 2023

davidxl added a comment.

In D152208#4453925 <https://reviews.llvm.org/D152208#4453925>, @philnik wrote:

> In D152208#4452378 <https://reviews.llvm.org/D152208#4452378>, @hans wrote:
>
>> In D152208#4448532 <https://reviews.llvm.org/D152208#4448532>, @hans wrote:
>>
>>> We're hitting CFI errors in Chromium after this, see https://github.com/llvm/llvm-project/issues/63523
>>
>> Friendly ping to make sure this doesn't get lost in the other discussion.
>
> I'm working on adding a CFI run to our CI right now. It looks like I just have to add `_LIBCPP_NO_CFI` to `__make_uninitialized_buffer`, given that `get_temporary_buffer` also has that attribute. I hope to get the patch completed by tomorrow.
>
> In D152208#4453582 <https://reviews.llvm.org/D152208#4453582>, @davidxl wrote:
>
>> Should we partially revert this change? Instead of making use of the new interface for stable_sort, the old  get_temporary_buffer can be used (as an internal API), until the performance parity is reached?
>
> I don't see why. If the allocation is actually so costly that it dominates `stable_sort`, you should either look into whether you actually need `stable_sort` for your workload or look into getting a better allocator (TCMalloc is probably not bad, so I suspect that's not the problem). Sorting doesn't seem like a task where a single allocation should ever dominate. I'd also like to see the actual use case where a single branch is so problematic that it warrants a revert before doing so. I don't even know how much of an impact this patch actually is in your specific use-case.

My understanding is that problem is caused by the more aggressive use of aligned allocation with this patch (not due to increased memory usage). The old interface has special code (additional branch) to check the overalignment and specialize based on that. Note those check are compile time constant and will be optimized away.  In that regard, the new implementation misses out the optimization (specialization) so I believe it is a regression.

The problem happens to be caught by using tcmalloc because it has a very well tuned code path for unaligned case.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152208/new/

https://reviews.llvm.org/D152208