[libcxx-commits] [libcxx] [libc++] Optimize bitset::to_string (PR #128832)
via libcxx-commits
libcxx-commits at lists.llvm.org
Wed Feb 26 05:27:07 PST 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-libcxx
Author: Peng Liu (winner245)
<details>
<summary>Changes</summary>
This patch optimizes `bitset::to_string` by replacing the existing bit-by-bit loop with a traversal that visits only the set bits. Instead of testing every bit in turn, it uses `std::__countr_zero` to jump to the next set bit in each storage word, skipping runs of consecutive zero bits. This speeds up the conversion considerably, especially for sparse `bitset`s where zero bits dominate. Dense `bitset`s get the same benefit by symmetry: the words are inverted and the `__zero`/`__one` characters are swapped, so the same traversal visits only the zero bits. Even for uniformly distributed `bitset`s (50% set bits), the new approach is measurably faster than the existing implementation.
Benchmarks show speedups of up to **6.6x** for sparse `bitset`s, **10.4x** for dense `bitset`s, and **2.0x** for uniformly distributed `bitset`s.
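The traversal described above can be sketched outside libc++ as follows. This is a minimal illustration under stated assumptions, not the code in the patch: `bits_to_string` is a hypothetical helper, the bits are assumed to live in a `std::vector` of 64-bit words with bit 0 in the lowest position, and `__builtin_ctzll` (a GCC/Clang builtin) stands in for `std::__countr_zero`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of libc++: converts the low `size` bits stored
// in 64-bit words into a string, most significant bit first.
// When `sparse` is false the words are inverted and the fill/mark characters
// are swapped, so the inner loop still visits only the minority bits.
inline std::string bits_to_string(const std::vector<std::uint64_t>& words,
                                  std::size_t size, bool sparse,
                                  char zero = '0', char one = '1') {
  const char fill = sparse ? zero : one;
  const char mark = sparse ? one : zero;
  std::string out(size, fill);
  for (std::size_t i = 0, base = 0; i < words.size() && base < size; ++i, base += 64) {
    std::uint64_t w = sparse ? words[i] : ~words[i];
    if (size - base < 64)
      w &= (std::uint64_t{1} << (size - base)) - 1;  // drop padding bits of the last word
    for (; w != 0; w &= w - 1) {                     // w &= w - 1 clears the lowest set bit
      // GCC/Clang builtin; std::countr_zero from <bit> is the C++20 equivalent.
      const std::size_t pos = base + static_cast<std::size_t>(__builtin_ctzll(w));
      out[size - 1 - pos] = mark;
    }
  }
  return out;
}
```

Both modes produce the same string for the same input; `sparse` only selects which kind of bit the loop iterates over, which is why the caller picks it from the population count.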
#### Sparse case (10% 1 bits)
```
--------------------------------------------------------------------------------------
Benchmark                                       Before        After   Improvement
--------------------------------------------------------------------------------------
BM_BitsetToString<32>/Sparse (10%)/10          18.7 ns      17.7 ns          1.1x
BM_BitsetToString<64>/Sparse (10%)/10          40.5 ns      16.0 ns          2.5x
BM_BitsetToString<128>/Sparse (10%)/10         69.8 ns      22.3 ns          3.1x
BM_BitsetToString<256>/Sparse (10%)/10          129 ns      29.1 ns          4.4x
BM_BitsetToString<512>/Sparse (10%)/10          277 ns      45.9 ns          6.0x
BM_BitsetToString<1024>/Sparse (10%)/10         535 ns       112 ns          4.8x
BM_BitsetToString<2048>/Sparse (10%)/10        1004 ns       175 ns          5.7x
BM_BitsetToString<4096>/Sparse (10%)/10        1967 ns       418 ns          4.7x
BM_BitsetToString<8192>/Sparse (10%)/10        4064 ns       618 ns          6.6x
BM_BitsetToString<16384>/Sparse (10%)/10       8280 ns      1503 ns          5.5x
BM_BitsetToString<32768>/Sparse (10%)/10      15476 ns      2409 ns          6.4x
BM_BitsetToString<65536>/Sparse (10%)/10      31873 ns      6486 ns          4.9x
BM_BitsetToString<131072>/Sparse (10%)/10     64303 ns     10186 ns          6.3x
BM_BitsetToString<262144>/Sparse (10%)/10    134330 ns     25555 ns          5.3x
BM_BitsetToString<524288>/Sparse (10%)/10    253769 ns     41379 ns          6.1x
BM_BitsetToString<1048576>/Sparse (10%)/10   517276 ns    103079 ns          5.0x
```
#### Dense case (90% 1 bits)
```
--------------------------------------------------------------------------------------
Benchmark                                       Before        After   Improvement
--------------------------------------------------------------------------------------
BM_BitsetToString<32>/Dense (90%)/90           25.1 ns      15.9 ns          1.6x
BM_BitsetToString<64>/Dense (90%)/90           45.8 ns      19.2 ns          2.4x
BM_BitsetToString<128>/Dense (90%)/90          96.6 ns      22.6 ns          4.3x
BM_BitsetToString<256>/Dense (90%)/90           187 ns      31.8 ns          5.9x
BM_BitsetToString<512>/Dense (90%)/90           374 ns      45.3 ns          8.3x
BM_BitsetToString<1024>/Dense (90%)/90          750 ns      89.4 ns          8.4x
BM_BitsetToString<2048>/Dense (90%)/90         1292 ns       190 ns          6.8x
BM_BitsetToString<4096>/Dense (90%)/90         2557 ns       371 ns          6.9x
BM_BitsetToString<8192>/Dense (90%)/90         5721 ns       666 ns          8.6x
BM_BitsetToString<16384>/Dense (90%)/90       11480 ns      1225 ns          9.4x
BM_BitsetToString<32768>/Dense (90%)/90       19835 ns      2557 ns          7.8x
BM_BitsetToString<65536>/Dense (90%)/90       46761 ns      5040 ns          9.3x
BM_BitsetToString<131072>/Dense (90%)/90      91796 ns     10822 ns          8.5x
BM_BitsetToString<262144>/Dense (90%)/90     185850 ns     21172 ns          8.8x
BM_BitsetToString<524288>/Dense (90%)/90     328253 ns     43810 ns          7.5x
BM_BitsetToString<1048576>/Dense (90%)/90    898541 ns     86344 ns         10.4x
```
#### Uniform case (50% 1 bits)
```
--------------------------------------------------------------------------------------
Benchmark                                       Before        After   Improvement
--------------------------------------------------------------------------------------
BM_BitsetToString<32>/Uniform (50%)/50         23.7 ns      21.5 ns          1.1x
BM_BitsetToString<64>/Uniform (50%)/50         55.9 ns      40.7 ns          1.4x
BM_BitsetToString<128>/Uniform (50%)/50        87.0 ns      48.7 ns          1.8x
BM_BitsetToString<256>/Uniform (50%)/50         156 ns       120 ns          1.3x
BM_BitsetToString<512>/Uniform (50%)/50         296 ns       151 ns          2.0x
BM_BitsetToString<1024>/Uniform (50%)/50        569 ns       421 ns          1.4x
BM_BitsetToString<2048>/Uniform (50%)/50       1142 ns       903 ns          1.3x
BM_BitsetToString<4096>/Uniform (50%)/50       2211 ns      1378 ns          1.6x
BM_BitsetToString<8192>/Uniform (50%)/50       4430 ns      3619 ns          1.2x
BM_BitsetToString<16384>/Uniform (50%)/50      8871 ns      5894 ns          1.5x
BM_BitsetToString<32768>/Uniform (50%)/50     17505 ns     13420 ns          1.3x
BM_BitsetToString<65536>/Uniform (50%)/50     35055 ns     24498 ns          1.4x
BM_BitsetToString<131072>/Uniform (50%)/50    70637 ns     56697 ns          1.2x
BM_BitsetToString<262144>/Uniform (50%)/50   141838 ns     89614 ns          1.6x
BM_BitsetToString<524288>/Uniform (50%)/50   284197 ns    220883 ns          1.3x
BM_BitsetToString<1048576>/Uniform (50%)/50  569476 ns    359686 ns          1.6x
```
---
Full diff: https://github.com/llvm/llvm-project/pull/128832.diff
2 Files Affected:
- (modified) libcxx/include/bitset (+52-6)
- (added) libcxx/test/benchmarks/bitset.bench.cpp (+115)
``````````diff
diff --git a/libcxx/include/bitset b/libcxx/include/bitset
index ab1dda739c7d5..33aebeef48908 100644
--- a/libcxx/include/bitset
+++ b/libcxx/include/bitset
@@ -136,6 +136,8 @@ template <size_t N> struct hash<std::bitset<N>>;
# include <__algorithm/fill_n.h>
# include <__algorithm/find.h>
# include <__assert>
+# include <__bit/countr.h>
+# include <__bit/invert_if.h>
# include <__bit_reference>
# include <__config>
# include <__cstddef/ptrdiff_t.h>
@@ -223,6 +225,10 @@ protected:
return to_ullong(integral_constant < bool, _Size< sizeof(unsigned long long) * CHAR_BIT>());
}
+ template <bool _Sparse, class _CharT, class _Traits, class _Allocator>
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 basic_string<_CharT, _Traits, _Allocator>
+ __to_string(_CharT __zero, _CharT __one) const;
+
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 bool all() const _NOEXCEPT;
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 bool any() const _NOEXCEPT;
_LIBCPP_HIDE_FROM_ABI size_t __hash_code() const _NOEXCEPT;
@@ -389,6 +395,22 @@ __bitset<_N_words, _Size>::to_ullong(true_type, true_type) const {
return __r;
}
+template <size_t _N_words, size_t _Size>
+template <bool _Sparse, class _CharT, class _Traits, class _Allocator>
+_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 basic_string<_CharT, _Traits, _Allocator>
+__bitset<_N_words, _Size>::__to_string(_CharT __zero, _CharT __one) const {
+ basic_string<_CharT, _Traits, _Allocator> __r(_Size, __zero);
+ for (size_t __i = 0, __bits = 0; __i < _N_words; ++__i, __bits += __bits_per_word) {
+ __storage_type __word = std::__invert_if<!_Sparse>(__first_[__i]);
+ if (__i == _N_words - 1 && _Size - __bits < __bits_per_word)
+ __word &= (__storage_type(1) << (_Size - __bits)) - 1;
+ for (; __word; __word &= (__word - 1))
+ __r[_Size - 1 - (__bits + std::__countr_zero(__word))] = __one;
+ }
+
+ return __r;
+}
+
template <size_t _N_words, size_t _Size>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 bool __bitset<_N_words, _Size>::all() const _NOEXCEPT {
// do middle whole words
@@ -480,6 +502,10 @@ protected:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 unsigned long to_ulong() const;
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 unsigned long long to_ullong() const;
+ template <bool _Sparse, class _CharT, class _Traits, class _Allocator>
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 basic_string<_CharT, _Traits, _Allocator>
+ __to_string(_CharT __zero, _CharT __one) const;
+
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 bool all() const _NOEXCEPT;
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 bool any() const _NOEXCEPT;
@@ -529,6 +555,21 @@ inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 unsigned long long __
return __first_;
}
+template <size_t _Size>
+template <bool _Sparse, class _CharT, class _Traits, class _Allocator>
+_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 basic_string<_CharT, _Traits, _Allocator>
+__bitset<1, _Size>::__to_string(_CharT __zero, _CharT __one) const {
+ basic_string<_CharT, _Traits, _Allocator> __r(_Size, __zero);
+ __storage_type __word = std::__invert_if<!_Sparse>(__first_);
+ if (_Size < __bits_per_word)
+ __word &= (__storage_type(1) << _Size) - 1;
+ for (; __word; __word &= (__word - 1)) {
+ size_t __pos = std::__countr_zero(__word);
+ __r[_Size - 1 - __pos] = __one;
+ }
+ return __r;
+}
+
template <size_t _Size>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 bool __bitset<1, _Size>::all() const _NOEXCEPT {
__storage_type __m = ~__storage_type(0) >> (__bits_per_word - _Size);
@@ -593,6 +634,12 @@ protected:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 unsigned long to_ulong() const { return 0; }
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 unsigned long long to_ullong() const { return 0; }
+ template <bool _Sparse, class _CharT, class _Traits, class _Allocator>
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 basic_string<_CharT, _Traits, _Allocator>
+ __to_string(_CharT, _CharT) const {
+ return basic_string<_CharT, _Traits, _Allocator>();
+ }
+
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 bool all() const _NOEXCEPT { return true; }
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 bool any() const _NOEXCEPT { return false; }
@@ -848,12 +895,11 @@ template <size_t _Size>
template <class _CharT, class _Traits, class _Allocator>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 basic_string<_CharT, _Traits, _Allocator>
bitset<_Size>::to_string(_CharT __zero, _CharT __one) const {
- basic_string<_CharT, _Traits, _Allocator> __r(_Size, __zero);
- for (size_t __i = 0; __i != _Size; ++__i) {
- if ((*this)[__i])
- __r[_Size - 1 - __i] = __one;
- }
- return __r;
+ bool __sparse = size_t(std::count(__base::__make_iter(0), __base::__make_iter(_Size), true)) < _Size / 2;
+ if (__sparse)
+ return __base::template __to_string<true, _CharT, _Traits, _Allocator>(__zero, __one);
+ else
+ return __base::template __to_string<false, _CharT, _Traits, _Allocator>(__one, __zero);
}
template <size_t _Size>
diff --git a/libcxx/test/benchmarks/bitset.bench.cpp b/libcxx/test/benchmarks/bitset.bench.cpp
new file mode 100644
index 0000000000000..f89ed6036b7cc
--- /dev/null
+++ b/libcxx/test/benchmarks/bitset.bench.cpp
@@ -0,0 +1,115 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// UNSUPPORTED: c++03
+
+#include "benchmark/benchmark.h"
+#include <bitset>
+#include <cmath>
+#include <cstddef>
+
+template <std::size_t N>
+struct GenerateBitset {
+ // Construct a bitset with p*N true bits
+ static std::bitset<N> generate(double p) {
+ std::bitset<N> b;
+ if (p <= 0.0)
+ return b;
+ if (p >= 1.0)
+ return ~b;
+
+ std::size_t num_ones = std::round(N * p);
+ if (num_ones == 0)
+ return b;
+
+ double step = static_cast<double>(N) / num_ones;
+ double error = 0.0;
+
+ std::size_t pos = 0;
+ for (std::size_t i = 0; i < num_ones; ++i) {
+ if (pos >= N)
+ break;
+ b.set(pos);
+ error += step;
+ pos += std::floor(error);
+ error -= std::floor(error);
+ }
+ return b;
+ }
+
+ static std::bitset<N> sparse() { return generate(0.1); }
+ static std::bitset<N> dense() { return generate(0.9); }
+ static std::bitset<N> uniform() { return generate(0.5); }
+};
+
+template <std::size_t N>
+static void BM_BitsetToString(benchmark::State& state) {
+ double p = state.range(0) / 100.0;
+ std::bitset<N> b = GenerateBitset<N>::generate(p);
+ benchmark::DoNotOptimize(b);
+
+ for (auto _ : state) {
+ benchmark::DoNotOptimize(b.to_string());
+ }
+}
+
+// Sparse bitset
+BENCHMARK(BM_BitsetToString<32>)->Arg(10)->Name("BM_BitsetToString<32>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<64>)->Arg(10)->Name("BM_BitsetToString<64>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<128>)->Arg(10)->Name("BM_BitsetToString<128>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<256>)->Arg(10)->Name("BM_BitsetToString<256>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<512>)->Arg(10)->Name("BM_BitsetToString<512>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<1024>)->Arg(10)->Name("BM_BitsetToString<1024>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<2048>)->Arg(10)->Name("BM_BitsetToString<2048>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<4096>)->Arg(10)->Name("BM_BitsetToString<4096>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<8192>)->Arg(10)->Name("BM_BitsetToString<8192>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<16384>)->Arg(10)->Name("BM_BitsetToString<16384>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<32768>)->Arg(10)->Name("BM_BitsetToString<32768>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<65536>)->Arg(10)->Name("BM_BitsetToString<65536>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<131072>)->Arg(10)->Name("BM_BitsetToString<131072>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<262144>)->Arg(10)->Name("BM_BitsetToString<262144>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<524288>)->Arg(10)->Name("BM_BitsetToString<524288>/Sparse (10%)");
+BENCHMARK(BM_BitsetToString<1048576>)->Arg(10)->Name("BM_BitsetToString<1048576>/Sparse (10%)"); // 1 << 20
+
+// Dense bitset
+BENCHMARK(BM_BitsetToString<32>)->Arg(90)->Name("BM_BitsetToString<32>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<64>)->Arg(90)->Name("BM_BitsetToString<64>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<128>)->Arg(90)->Name("BM_BitsetToString<128>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<256>)->Arg(90)->Name("BM_BitsetToString<256>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<512>)->Arg(90)->Name("BM_BitsetToString<512>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<1024>)->Arg(90)->Name("BM_BitsetToString<1024>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<2048>)->Arg(90)->Name("BM_BitsetToString<2048>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<4096>)->Arg(90)->Name("BM_BitsetToString<4096>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<8192>)->Arg(90)->Name("BM_BitsetToString<8192>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<16384>)->Arg(90)->Name("BM_BitsetToString<16384>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<32768>)->Arg(90)->Name("BM_BitsetToString<32768>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<65536>)->Arg(90)->Name("BM_BitsetToString<65536>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<131072>)->Arg(90)->Name("BM_BitsetToString<131072>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<262144>)->Arg(90)->Name("BM_BitsetToString<262144>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<524288>)->Arg(90)->Name("BM_BitsetToString<524288>/Dense (90%)");
+BENCHMARK(BM_BitsetToString<1048576>)->Arg(90)->Name("BM_BitsetToString<1048576>/Dense (90%)"); // 1 << 20
+
+// Uniform bitset
+BENCHMARK(BM_BitsetToString<32>)->Arg(50)->Name("BM_BitsetToString<32>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<64>)->Arg(50)->Name("BM_BitsetToString<64>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<128>)->Arg(50)->Name("BM_BitsetToString<128>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<256>)->Arg(50)->Name("BM_BitsetToString<256>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<512>)->Arg(50)->Name("BM_BitsetToString<512>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<1024>)->Arg(50)->Name("BM_BitsetToString<1024>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<2048>)->Arg(50)->Name("BM_BitsetToString<2048>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<4096>)->Arg(50)->Name("BM_BitsetToString<4096>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<8192>)->Arg(50)->Name("BM_BitsetToString<8192>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<16384>)->Arg(50)->Name("BM_BitsetToString<16384>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<32768>)->Arg(50)->Name("BM_BitsetToString<32768>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<65536>)->Arg(50)->Name("BM_BitsetToString<65536>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<131072>)->Arg(50)->Name("BM_BitsetToString<131072>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<262144>)->Arg(50)->Name("BM_BitsetToString<262144>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<524288>)->Arg(50)->Name("BM_BitsetToString<524288>/Uniform (50%)");
+BENCHMARK(BM_BitsetToString<1048576>)->Arg(50)->Name("BM_BitsetToString<1048576>/Uniform (50%)"); // 1 << 20
+
+BENCHMARK_MAIN();
``````````
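Because the patch changes how `to_string` walks the bits but not what it returns, one way to sanity-check it is to compare the result against the original bit-by-bit loop for sizes on either side of a storage-word boundary. The following is a rough sketch using only the public `std::bitset` API; `naive_to_string`, `every_nth`, and `matches_reference` are hypothetical helpers introduced here for illustration and are not part of the patch or its test suite.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Reference conversion: the bit-by-bit loop that the patch replaces.
template <std::size_t N>
std::string naive_to_string(const std::bitset<N>& b, char zero = '0', char one = '1') {
  std::string r(N, zero);
  for (std::size_t i = 0; i != N; ++i)
    if (b[i])
      r[N - 1 - i] = one;
  return r;
}

// Hypothetical generator: sets bits 0, stride, 2*stride, ... (stride must be > 0).
template <std::size_t N>
std::bitset<N> every_nth(std::size_t stride) {
  std::bitset<N> b;
  for (std::size_t i = 0; i < N; i += stride)
    b.set(i);
  return b;
}

// Returns true if to_string agrees with the reference loop for a sparse,
// a dense, and a half-full pattern, with default and custom characters.
template <std::size_t N>
bool matches_reference() {
  const std::bitset<N> sparse = every_nth<N>(10);  // roughly 10% of bits set
  const std::bitset<N> dense  = ~sparse;           // roughly 90% of bits set
  const std::bitset<N> half   = every_nth<N>(2);   // every other bit set
  for (const std::bitset<N>& b : {sparse, dense, half}) {
    if (b.to_string() != naive_to_string(b))
      return false;
    if (b.to_string('.', 'X') != naive_to_string(b, '.', 'X'))
      return false;
  }
  return true;
}
```

Calling `matches_reference<N>()` for sizes such as 1, 31, 64, 65, and 200 covers the single-word specialization, an exactly full word, and a partially filled last word.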
</details>
https://github.com/llvm/llvm-project/pull/128832