[libcxx-commits] [PATCH] D152208: [libc++] Introduce __make_uninitialized_buffer and use it instead of get_temporary_buffer

Mingming Liu via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Thu Jun 29 17:52:39 PDT 2023


mingmingl added a comment.

In D152208#4460803 <https://reviews.llvm.org/D152208#4460803>, @davidxl wrote:

> In D152208#4460003 <https://reviews.llvm.org/D152208#4460003>, @ldionne wrote:
>
>> In D152208#4459623 <https://reviews.llvm.org/D152208#4459623>, @ldionne wrote:
>>
>>> IMO I agree that if we were using aligned allocation and we're not anymore, this is simply something that went unnoticed in this patch and we should fix it.
>>>
>>> Should `stable_sort` be dominated by an allocation? Probably not, that seems a bit crazy. But nonetheless it seems like we did introduce an unintended difference in this patch and it should be pretty easy to fix. Let's follow up on D152208 <https://reviews.llvm.org/D152208>.
>>
>> Looks like I read this issue the wrong way around. So we were doing unaligned allocation before, and now we're using aligned allocation (which should be more correct, no?) and it's slower. That's a lot more curious, and I think it's worth investigating why this caused a regression because there might be some sort of perf issue in how your allocator deals with aligned allocation (or maybe on the libc++ side). Anyway, let's follow-up in D152208 <https://reviews.llvm.org/D152208> to avoid forking the discussion too much.
>
> As I mentioned in the previous reply, the performance regression comes from addition runtime cost (allocation) in the context of __inplace_merge call.  A little more debugging shows that in the old version, the allocation can be completely elided when __buf_size is 0 (see get_temporary_buffer code), but make_uninitialized_buffer will try to invoke operator new regardless.
>
> // inplace_merge code
>
>   difference_type __len1 = _IterOps<_AlgPolicy>::distance(__first, __middle);
>   difference_type __len2 = _IterOps<_AlgPolicy>::distance(__middle, __last);
>   difference_type __buf_size = _VSTD::min(__len1, __len2);

thanks for debugging this! I wish I continued to get a detailed call-graph when seeing the common caller instruction cycle increases rather than guessing it's stable_sort :(


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152208/new/

https://reviews.llvm.org/D152208



More information about the libcxx-commits mailing list