[all-commits] [llvm/llvm-project] f7c0df: [libc++][format] Improve format buffer.
mordante via All-commits
all-commits at lists.llvm.org
Tue Aug 16 09:54:24 PDT 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: f7c0df002a083bcca8ac4972330b8198474a355b
https://github.com/llvm/llvm-project/commit/f7c0df002a083bcca8ac4972330b8198474a355b
Author: Mark de Wever <koraq at xs4all.nl>
Date: 2022-08-16 (Tue, 16 Aug 2022)
Changed paths:
M libcxx/include/__format/buffer.h
M libcxx/include/__format/formatter_floating_point.h
M libcxx/include/__format/formatter_integral.h
M libcxx/include/__format/formatter_output.h
M libcxx/test/std/utilities/format/format.formatter/format.formatter.spec/formatter.unsigned_integral.pass.cpp
M libcxx/test/std/utilities/format/format.functions/format_tests.h
Log Message:
-----------
[libc++][format] Improve format buffer.
Allow bulk output operations on the buffer instead of adding one
code unit at a time. This has a huge performance benefit at the cost of
larger binary. This doesn't implement @vitaut's earlier suggestion to
avoid buffering for std::string when writing a strings. That can be done
in a follow-up patch.
There are some minor complications for the non-buffered format_to_n.
When writing one character at a time it's easy to detect when reaching
the limit n. This is solved by adding a small overhead for format_to_n.
When the next write would overflow it stores the data in the internal
buffer and copies that up-to n code units. The overhead isn't measured,
but it's expected to only be an issue for small values of n; for larger
values the general improvements will outweight the new overhead.
```
text data bss dec hex filename
349081 6096 440 355617 56d21 format.libcxx.out-baseline
344442 6088 440 350970 55afa formatted_size.libcxx.out-baseline
4567980 57272 424 4625676 46950c formatter_float.libcxx.out-baseline
718800 12472 488 731760 b2a70 formatter_int.libcxx.out-baseline
376341 6096 552 382989 5d80d format_to.libcxx.out-beaseline
370169 6096 440 376705 5bf81 format.libcxx.out
365530 6088 440 372058 5ad5a formatted_size.libcxx.out
4575116 57272 424 4632812 46b0ec formatter_float.libcxx.out
725936 12472 488 738896 b4650 formatter_int.libcxx.out
397429 6096 552 404077 62a6d format_to.libcxx.out
```
For very small strings the new method is slower, from 4 characters
there's already a small gain.
```
Comparing ./format.libcxx.out-baseline to ./format.libcxx.out
Benchmark Time CPU Time Old Time New CPU Old CPU New
--------------------------------------------------------------------------------------------------------------------------------
BM_format_string<char>/1 +0.0268 +0.0268 43 44 43 44
BM_format_string<char>/2 +0.0133 +0.0133 22 22 22 22
BM_format_string<char>/4 -0.0248 -0.0248 12 11 12 11
BM_format_string<char>/8 -0.0831 -0.0831 6 6 6 6
BM_format_string<char>/16 -0.2976 -0.2976 4 3 4 3
BM_format_string<char>/32 -0.4369 -0.4369 3 2 3 2
BM_format_string<char>/64 -0.6375 -0.6375 3 1 3 1
BM_format_string<char>/128 -0.7685 -0.7685 2 1 2 1
```
The int benchmark has benefits for the simple formatting, but shines for
the complex formatting:
```
Comparing ./formatter_int.libcxx.out-baseline to ./formatter_int.libcxx.out
Benchmark Time CPU Time Old Time New CPU Old CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
BM_Basic<uint32_t> -0.2307 -0.2307 60 46 60 46
BM_Basic<int32_t> -0.1985 -0.1985 61 49 61 49
BM_Basic<uint64_t> -0.3478 -0.3479 81 53 81 53
BM_Basic<int64_t> -0.3475 -0.3475 81 53 81 53
BM_BasicLow<__uint128_t> -0.3388 -0.3388 86 57 86 57
BM_BasicLow<__int128_t> -0.3431 -0.3431 86 57 86 57
BM_Basic<__uint128_t> -0.2822 -0.2822 236 170 236 170
BM_Basic<__int128_t> -0.3107 -0.3107 219 151 219 151
Integral_LocFalse_BaseBin_AlignNone_Int64 -0.5781 -0.5781 178 75 178 75
Integral_LocFalse_BaseBin_AlignmentLeft_Int64 -0.9231 -0.9231 1156 89 1156 89
Integral_LocFalse_BaseBin_AlignmentCenter_Int64 -0.9179 -0.9179 1107 91 1107 91
Integral_LocFalse_BaseBin_AlignmentRight_Int64 -0.9238 -0.9238 1147 87 1147 87
Integral_LocFalse_BaseBin_ZeroPadding_Int64 -0.9170 -0.9170 1137 94 1137 94
Integral_LocFalse_BaseBin_AlignNone_Uint64 -0.5923 -0.5923 175 71 175 71
Integral_LocFalse_BaseBin_AlignmentLeft_Uint64 -0.9251 -0.9251 1154 86 1154 86
Integral_LocFalse_BaseBin_AlignmentCenter_Uint64 -0.9204 -0.9204 1105 88 1105 88
Integral_LocFalse_BaseBin_AlignmentRight_Uint64 -0.9242 -0.9242 1125 85 1125 85
Integral_LocFalse_BaseBin_ZeroPadding_Uint64 -0.9232 -0.9232 1139 88 1139 88
Integral_LocFalse_BaseOct_AlignNone_Int64 -0.3241 -0.3241 100 67 100 67
Integral_LocFalse_BaseOct_AlignmentLeft_Int64 -0.9322 -0.9322 1166 79 1166 79
Integral_LocFalse_BaseOct_AlignmentCenter_Int64 -0.9251 -0.9251 1108 83 1108 83
Integral_LocFalse_BaseOct_AlignmentRight_Int64 -0.9303 -0.9303 1136 79 1136 79
Integral_LocFalse_BaseOct_ZeroPadding_Int64 -0.9264 -0.9264 1156 85 1156 85
Integral_LocFalse_BaseOct_AlignNone_Uint64 -0.3116 -0.3116 96 66 96 66
Integral_LocFalse_BaseOct_AlignmentLeft_Uint64 -0.9310 -0.9310 1168 81 1168 81
Integral_LocFalse_BaseOct_AlignmentCenter_Uint64 -0.9281 -0.9281 1128 81 1128 81
Integral_LocFalse_BaseOct_AlignmentRight_Uint64 -0.9299 -0.9299 1148 80 1148 80
Integral_LocFalse_BaseOct_ZeroPadding_Uint64 -0.9288 -0.9288 1153 82 1153 82
Integral_LocFalse_BaseDec_AlignNone_Int64 -0.3342 -0.3342 95 63 95 63
Integral_LocFalse_BaseDec_AlignmentLeft_Int64 -0.9360 -0.9360 1157 74 1157 74
Integral_LocFalse_BaseDec_AlignmentCenter_Int64 -0.9303 -0.9303 1128 79 1128 79
Integral_LocFalse_BaseDec_AlignmentRight_Int64 -0.9369 -0.9369 1164 73 1164 73
Integral_LocFalse_BaseDec_ZeroPadding_Int64 -0.9323 -0.9323 1157 78 1157 78
Integral_LocFalse_BaseDec_AlignNone_Uint64 -0.3198 -0.3198 93 63 93 63
Integral_LocFalse_BaseDec_AlignmentLeft_Uint64 -0.9351 -0.9351 1158 75 1158 75
Integral_LocFalse_BaseDec_AlignmentCenter_Uint64 -0.9298 -0.9298 1128 79 1128 79
Integral_LocFalse_BaseDec_AlignmentRight_Uint64 -0.9361 -0.9361 1157 74 1157 74
Integral_LocFalse_BaseDec_ZeroPadding_Uint64 -0.9333 -0.9333 1151 77 1151 77
Integral_LocFalse_BaseHex_AlignNone_Int64 -0.3020 -0.3020 89 62 89 62
Integral_LocFalse_BaseHex_AlignmentLeft_Int64 -0.9357 -0.9357 1174 75 1174 75
Integral_LocFalse_BaseHex_AlignmentCenter_Int64 -0.9319 -0.9319 1129 77 1129 77
Integral_LocFalse_BaseHex_AlignmentRight_Int64 -0.9350 -0.9350 1161 75 1161 75
Integral_LocFalse_BaseHex_ZeroPadding_Int64 -0.9293 -0.9293 1150 81 1150 81
Integral_LocFalse_BaseHex_AlignNone_Uint64 -0.3056 -0.3057 86 59 86 59
Integral_LocFalse_BaseHex_AlignmentLeft_Uint64 -0.9378 -0.9378 1174 73 1174 73
Integral_LocFalse_BaseHex_AlignmentCenter_Uint64 -0.9341 -0.9341 1129 74 1130 74
Integral_LocFalse_BaseHex_AlignmentRight_Uint64 -0.9361 -0.9361 1157 74 1157 74
Integral_LocFalse_BaseHex_ZeroPadding_Uint64 -0.9315 -0.9315 1147 79 1147 79
Integral_LocFalse_BaseHexUpper_AlignNone_Int64 -0.0019 -0.0019 91 90 91 90
Integral_LocFalse_BaseHexUpper_AlignmentLeft_Int64 -0.9099 -0.9099 1162 105 1162 105
Integral_LocFalse_BaseHexUpper_AlignmentCenter_Int64 -0.9041 -0.9041 1121 108 1121 108
Integral_LocFalse_BaseHexUpper_AlignmentRight_Int64 -0.9086 -0.9086 1162 106 1162 106
Integral_LocFalse_BaseHexUpper_ZeroPadding_Int64 -0.9057 -0.9057 1164 110 1164 110
Integral_LocFalse_BaseHexUpper_AlignNone_Uint64 +0.0110 +0.0110 86 87 86 87
Integral_LocFalse_BaseHexUpper_AlignmentLeft_Uint64 -0.9136 -0.9136 1161 100 1161 100
Integral_LocFalse_BaseHexUpper_AlignmentCenter_Uint64 -0.9078 -0.9078 1133 104 1133 104
Integral_LocFalse_BaseHexUpper_AlignmentRight_Uint64 -0.9132 -0.9132 1177 102 1177 102
Integral_LocFalse_BaseHexUpper_ZeroPadding_Uint64 -0.9091 -0.9091 1160 105 1160 105
```
Other benchmarks give similar results.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D129964
More information about the All-commits
mailing list