[libcxx-commits] [PATCH] D125958: [libc++] Use __libcpp_clz for a tighter __log2i function

Tue May 24 07:27:37 PDT 2022

hans marked 4 inline comments as done.
hans added a comment.

Here are some benchmark numbers (requires D126297 <https://reviews.llvm.org/D126297>). I ran `build.release/projects/libcxx/benchmarks/sort.libcxx.out --benchmark_filter=BM_Sort_uint32_QuickSortAdversary*`.

Before my patch:

  -----------------------------------------------------------------------------------
  Benchmark                                         Time             CPU   Iterations
  -----------------------------------------------------------------------------------
  BM_Sort_uint32_QuickSortAdversary_1            4.51 ns         4.51 ns    154664960
  BM_Sort_uint32_QuickSortAdversary_4            1.97 ns         1.97 ns    357040128
  BM_Sort_uint32_QuickSortAdversary_16          0.941 ns        0.941 ns    742129664
  BM_Sort_uint32_QuickSortAdversary_64           11.0 ns         11.0 ns     63176704
  BM_Sort_uint32_QuickSortAdversary_256          18.1 ns         18.1 ns     38535168
  BM_Sort_uint32_QuickSortAdversary_1024         33.1 ns         33.1 ns     21233664
  BM_Sort_uint32_QuickSortAdversary_16384        49.4 ns         49.4 ns     14155776
  BM_Sort_uint32_QuickSortAdversary_262144       59.9 ns         59.9 ns     11796480

After my patch:

  -----------------------------------------------------------------------------------
  Benchmark                                         Time             CPU   Iterations
  -----------------------------------------------------------------------------------
  BM_Sort_uint32_QuickSortAdversary_1            4.81 ns         4.81 ns    145227776
  BM_Sort_uint32_QuickSortAdversary_4            1.82 ns         1.82 ns    379322368
  BM_Sort_uint32_QuickSortAdversary_16          0.849 ns        0.849 ns    820772864
  BM_Sort_uint32_QuickSortAdversary_64           11.6 ns         11.6 ns     60293120
  BM_Sort_uint32_QuickSortAdversary_256          19.5 ns         19.5 ns     36175872
  BM_Sort_uint32_QuickSortAdversary_1024         36.0 ns         36.0 ns     19660800
  BM_Sort_uint32_QuickSortAdversary_16384        53.2 ns         53.2 ns     13369344
  BM_Sort_uint32_QuickSortAdversary_262144       63.7 ns         63.7 ns     11272192

Looks like some numbers are up and some are down. I'd say this is perf neutral.

But it does have a small effect on size. The text of `sort.libcxx.out` goes from 509310 to 508542 bytes, so 768 bytes saved.

I also tried this on a "release" build of Chrome (without ThinLTO or PGO, so doesn't exactly match what we ship), and the text went from 259288968 to 259282312 bytes, so 6.5 KB saved.

================
Comment at: libcxx/include/__algorithm/sort.h:505
+  if (sizeof(__n) <= sizeof(unsigned))
+    return sizeof(unsigned) * __CHAR_BIT__ - 1 - __libcpp_clz(static_cast<unsigned>(__n));
+  if (sizeof(__n) <= sizeof(unsigned long))
----------------
philnik wrote:
> nilayvaish wrote:
> > I am wondering whether __CHAR_BIT__ is preferred over using CHAR_BIT (available from climit header file).  I don't know what the convention is for writing library code.
> I think `CHAR_BIT` is the preferred way. At least that's what I'm using.
It seems both are used roughly as often.

```
$ git grep -In [^_]CHAR_BIT -- libcxx/include | wc -l
25
$ git grep -In __CHAR_BIT__ -- libcxx/include | wc -l
22
```

Given that `__CHAR_BIT__` avoids potentially pushing another header include on every user of `<sort>`, I'd prefer to stay with it.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125958/new/

https://reviews.llvm.org/D125958