[libcxx-commits] [PATCH] D125958: [libc++] Use __libcpp_clz for a tighter __log2i function
Hans Wennborg via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Tue May 24 07:27:37 PDT 2022
hans marked 4 inline comments as done.
hans added a comment.
Here are some benchmark numbers (requires D126297 <https://reviews.llvm.org/D126297>). I ran `build.release/projects/libcxx/benchmarks/sort.libcxx.out --benchmark_filter=BM_Sort_uint32_QuickSortAdversary*`.
Before my patch:
-----------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------
BM_Sort_uint32_QuickSortAdversary_1 4.51 ns 4.51 ns 154664960
BM_Sort_uint32_QuickSortAdversary_4 1.97 ns 1.97 ns 357040128
BM_Sort_uint32_QuickSortAdversary_16 0.941 ns 0.941 ns 742129664
BM_Sort_uint32_QuickSortAdversary_64 11.0 ns 11.0 ns 63176704
BM_Sort_uint32_QuickSortAdversary_256 18.1 ns 18.1 ns 38535168
BM_Sort_uint32_QuickSortAdversary_1024 33.1 ns 33.1 ns 21233664
BM_Sort_uint32_QuickSortAdversary_16384 49.4 ns 49.4 ns 14155776
BM_Sort_uint32_QuickSortAdversary_262144 59.9 ns 59.9 ns 11796480
After my patch:
-----------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------
BM_Sort_uint32_QuickSortAdversary_1 4.81 ns 4.81 ns 145227776
BM_Sort_uint32_QuickSortAdversary_4 1.82 ns 1.82 ns 379322368
BM_Sort_uint32_QuickSortAdversary_16 0.849 ns 0.849 ns 820772864
BM_Sort_uint32_QuickSortAdversary_64 11.6 ns 11.6 ns 60293120
BM_Sort_uint32_QuickSortAdversary_256 19.5 ns 19.5 ns 36175872
BM_Sort_uint32_QuickSortAdversary_1024 36.0 ns 36.0 ns 19660800
BM_Sort_uint32_QuickSortAdversary_16384 53.2 ns 53.2 ns 13369344
BM_Sort_uint32_QuickSortAdversary_262144 63.7 ns 63.7 ns 11272192
Looks like some numbers are up and some are down. I'd say this is perf neutral.
But it does have a small effect on size. The text of `sort.libcxx.out` goes from 509310 to 508542 bytes, so 768 bytes saved.
I also tried this on a "release" build of Chrome (without ThinLTO or PGO, so doesn't exactly match what we ship), and the text went from 259288968 to 259282312 bytes, so 6.5 KB saved.
================
Comment at: libcxx/include/__algorithm/sort.h:505
+ if (sizeof(__n) <= sizeof(unsigned))
+ return sizeof(unsigned) * __CHAR_BIT__ - 1 - __libcpp_clz(static_cast<unsigned>(__n));
+ if (sizeof(__n) <= sizeof(unsigned long))
----------------
philnik wrote:
> nilayvaish wrote:
> > I am wondering whether __CHAR_BIT__ is preferred over using CHAR_BIT (available from climit header file). I don't know what the convention is for writing library code.
> I think `CHAR_BIT` is the preferred way. At least that's what I'm using.
It seems both are used roughly as often.
```
$ git grep -In [^_]CHAR_BIT -- libcxx/include | wc -l
25
$ git grep -In __CHAR_BIT__ -- libcxx/include | wc -l
22
```
Given that `__CHAR_BIT__` avoids potentially pushing another header include on every user of `<sort>`, I'd prefer to stay with it.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D125958/new/
https://reviews.llvm.org/D125958
More information about the libcxx-commits
mailing list