[libcxx-commits] [PATCH] D129964: [libc++][format] Improve format buffer.

Sun Jul 17 07:31:15 PDT 2022

Mordante created this revision.
Herald added a project: All.
Mordante requested review of this revision.
Herald added a project: libc++.
Herald added a subscriber: libcxx-commits.
Herald added a reviewer: libc++.

Allow bulk output operations on the buffer instead of adding one
code unit at a time. This has a huge performance benefit at the cost of a
larger binary. This doesn't implement @vitaut's earlier suggestion to
avoid buffering for std::string when writing strings. That can be done
in a follow-up patch.
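
As an illustration only, the difference between per-code-unit and bulk
output can be sketched as below. This is not the code in
libcxx/include/__format/buffer.h; the class name output_buffer, its
member names, and the 16-element storage size are all hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical, simplified buffer that flushes into a std::string.
// It only illustrates the idea; it is not the libc++ implementation.
class output_buffer {
public:
  explicit output_buffer(std::string& sink) : sink_(sink) {}
  ~output_buffer() { flush(); }

  // Per-code-unit output: one capacity check and one store per character.
  void push_back(char c) {
    if (size_ == storage_.size())
      flush();
    storage_[size_++] = c;
  }

  // Bulk output: copy as many code units as fit in one step.
  void copy(std::string_view str) {
    while (!str.empty()) {
      if (size_ == storage_.size())
        flush();
      std::size_t chunk = std::min(str.size(), storage_.size() - size_);
      std::copy_n(str.data(), chunk, storage_.data() + size_);
      size_ += chunk;
      str.remove_prefix(chunk);
    }
  }

  // Bulk fill, e.g. for padding when an alignment is requested.
  void fill(std::size_t n, char value) {
    while (n != 0) {
      if (size_ == storage_.size())
        flush();
      std::size_t chunk = std::min(n, storage_.size() - size_);
      std::fill_n(storage_.data() + size_, chunk, value);
      size_ += chunk;
      n -= chunk;
    }
  }

  // Append the buffered code units to the sink and reset the buffer.
  void flush() {
    sink_.append(storage_.data(), size_);
    size_ = 0;
  }

private:
  std::array<char, 16> storage_{}; // deliberately tiny, to exercise flushing
  std::size_t size_ = 0;
  std::string& sink_;
};
```

In this sketch, copy and fill do one capacity check per chunk rather than
one per code unit, which is where padding-heavy output would be expected
to gain the most.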

There are some minor complications for the non-buffered format_to_n.
When writing one character at a time it's easy to detect when the limit n
is reached; with bulk writes it is not. This is solved by adding a small
overhead for format_to_n: when the next write would overflow, it stores
the data in the internal buffer and copies up to n code units from it.
The overhead hasn't been measured, but it's expected to only be an issue
for small values of n; for larger values the general improvements will
outweigh the new overhead.
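
A minimal sketch of that overflow handling, under stated assumptions: the
class limited_writer, its members, and the 4-element scratch buffer are
hypothetical and only model the approach described above; the patch
itself may be structured differently.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical writer that emits at most max code units to out, while
// still counting every code unit that was requested.
class limited_writer {
public:
  limited_writer(char* out, std::size_t max) : out_(out), max_(max) {}

  void copy(std::string_view str) {
    if (size_ + str.size() <= max_) {
      // Fast path: the whole write fits, so write directly to the output.
      out_ = std::copy_n(str.data(), str.size(), out_);
      size_ += str.size();
      return;
    }
    // Slow path: this write would overflow. Stage the data in the
    // internal buffer and copy only the part that still fits.
    while (!str.empty()) {
      std::size_t chunk = std::min(str.size(), scratch_.size());
      std::copy_n(str.data(), chunk, scratch_.data());
      std::size_t room = max_ > size_ ? max_ - size_ : 0;
      out_ = std::copy_n(scratch_.data(), std::min(chunk, room), out_);
      size_ += chunk;
      str.remove_prefix(chunk);
    }
  }

  char* out() const { return out_; }         // one past the last unit written
  std::size_t size() const { return size_; } // total units requested

private:
  char* out_;
  std::size_t max_;
  std::size_t size_ = 0;
  std::array<char, 4> scratch_{};
};
```

The extra cost in this sketch is confined to the writes that reach or
cross the limit; writes that fit take the direct path.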

Binary sizes, baseline first and with this patch second:

     text	   data	    bss	    dec	    hex	filename
   349081	   6096	    440	 355617	  56d21	format.libcxx.out-baseline
   344442	   6088	    440	 350970	  55afa	formatted_size.libcxx.out-baseline
  4567980	  57272	    424	4625676	 46950c	formatter_float.libcxx.out-baseline
   718800	  12472	    488	 731760	  b2a70	formatter_int.libcxx.out-baseline
   376341	   6096	    552	 382989	  5d80d	format_to.libcxx.out-beaseline

   370169	   6096	    440	 376705	  5bf81	format.libcxx.out
   365530	   6088	    440	 372058	  5ad5a	formatted_size.libcxx.out
  4575116	  57272	    424	4632812	 46b0ec	formatter_float.libcxx.out
   725936	  12472	    488	 738896	  b4650	formatter_int.libcxx.out
   397429	   6096	    552	 404077	  62a6d	format_to.libcxx.out

For very small strings the new method is slower; from 4 characters on
there's already a small gain.

  Comparing ./format.libcxx.out-baseline to ./format.libcxx.out
  Benchmark                                           Time             CPU      Time Old      Time New       CPU Old       CPU New
  --------------------------------------------------------------------------------------------------------------------------------
  BM_format_string<char>/1                         +0.0268         +0.0268            43            44            43            44
  BM_format_string<char>/2                         +0.0133         +0.0133            22            22            22            22
  BM_format_string<char>/4                         -0.0248         -0.0248            12            11            12            11
  BM_format_string<char>/8                         -0.0831         -0.0831             6             6             6             6
  BM_format_string<char>/16                        -0.2976         -0.2976             4             3             4             3
  BM_format_string<char>/32                        -0.4369         -0.4369             3             2             3             2
  BM_format_string<char>/64                        -0.6375         -0.6375             3             1             3             1
  BM_format_string<char>/128                       -0.7685         -0.7685             2             1             2             1

The int benchmark shows gains for simple formatting, and much larger
gains for complex formatting (alignment and zero-padding):

  Comparing ./formatter_int.libcxx.out-baseline to ./formatter_int.libcxx.out
  Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
  ----------------------------------------------------------------------------------------------------------------------------------------------------
  BM_Basic<uint32_t>                                                   -0.2307         -0.2307            60            46            60            46
  BM_Basic<int32_t>                                                    -0.1985         -0.1985            61            49            61            49
  BM_Basic<uint64_t>                                                   -0.3478         -0.3479            81            53            81            53
  BM_Basic<int64_t>                                                    -0.3475         -0.3475            81            53            81            53
  BM_BasicLow<__uint128_t>                                             -0.3388         -0.3388            86            57            86            57
  BM_BasicLow<__int128_t>                                              -0.3431         -0.3431            86            57            86            57
  BM_Basic<__uint128_t>                                                -0.2822         -0.2822           236           170           236           170
  BM_Basic<__int128_t>                                                 -0.3107         -0.3107           219           151           219           151
  Integral_LocFalse_BaseBin_AlignNone_Int64                            -0.5781         -0.5781           178            75           178            75
  Integral_LocFalse_BaseBin_AlignmentLeft_Int64                        -0.9231         -0.9231          1156            89          1156            89
  Integral_LocFalse_BaseBin_AlignmentCenter_Int64                      -0.9179         -0.9179          1107            91          1107            91
  Integral_LocFalse_BaseBin_AlignmentRight_Int64                       -0.9238         -0.9238          1147            87          1147            87
  Integral_LocFalse_BaseBin_ZeroPadding_Int64                          -0.9170         -0.9170          1137            94          1137            94
  Integral_LocFalse_BaseBin_AlignNone_Uint64                           -0.5923         -0.5923           175            71           175            71
  Integral_LocFalse_BaseBin_AlignmentLeft_Uint64                       -0.9251         -0.9251          1154            86          1154            86
  Integral_LocFalse_BaseBin_AlignmentCenter_Uint64                     -0.9204         -0.9204          1105            88          1105            88
  Integral_LocFalse_BaseBin_AlignmentRight_Uint64                      -0.9242         -0.9242          1125            85          1125            85
  Integral_LocFalse_BaseBin_ZeroPadding_Uint64                         -0.9232         -0.9232          1139            88          1139            88
  Integral_LocFalse_BaseOct_AlignNone_Int64                            -0.3241         -0.3241           100            67           100            67
  Integral_LocFalse_BaseOct_AlignmentLeft_Int64                        -0.9322         -0.9322          1166            79          1166            79
  Integral_LocFalse_BaseOct_AlignmentCenter_Int64                      -0.9251         -0.9251          1108            83          1108            83
  Integral_LocFalse_BaseOct_AlignmentRight_Int64                       -0.9303         -0.9303          1136            79          1136            79
  Integral_LocFalse_BaseOct_ZeroPadding_Int64                          -0.9264         -0.9264          1156            85          1156            85
  Integral_LocFalse_BaseOct_AlignNone_Uint64                           -0.3116         -0.3116            96            66            96            66
  Integral_LocFalse_BaseOct_AlignmentLeft_Uint64                       -0.9310         -0.9310          1168            81          1168            81
  Integral_LocFalse_BaseOct_AlignmentCenter_Uint64                     -0.9281         -0.9281          1128            81          1128            81
  Integral_LocFalse_BaseOct_AlignmentRight_Uint64                      -0.9299         -0.9299          1148            80          1148            80
  Integral_LocFalse_BaseOct_ZeroPadding_Uint64                         -0.9288         -0.9288          1153            82          1153            82
  Integral_LocFalse_BaseDec_AlignNone_Int64                            -0.3342         -0.3342            95            63            95            63
  Integral_LocFalse_BaseDec_AlignmentLeft_Int64                        -0.9360         -0.9360          1157            74          1157            74
  Integral_LocFalse_BaseDec_AlignmentCenter_Int64                      -0.9303         -0.9303          1128            79          1128            79
  Integral_LocFalse_BaseDec_AlignmentRight_Int64                       -0.9369         -0.9369          1164            73          1164            73
  Integral_LocFalse_BaseDec_ZeroPadding_Int64                          -0.9323         -0.9323          1157            78          1157            78
  Integral_LocFalse_BaseDec_AlignNone_Uint64                           -0.3198         -0.3198            93            63            93            63
  Integral_LocFalse_BaseDec_AlignmentLeft_Uint64                       -0.9351         -0.9351          1158            75          1158            75
  Integral_LocFalse_BaseDec_AlignmentCenter_Uint64                     -0.9298         -0.9298          1128            79          1128            79
  Integral_LocFalse_BaseDec_AlignmentRight_Uint64                      -0.9361         -0.9361          1157            74          1157            74
  Integral_LocFalse_BaseDec_ZeroPadding_Uint64                         -0.9333         -0.9333          1151            77          1151            77
  Integral_LocFalse_BaseHex_AlignNone_Int64                            -0.3020         -0.3020            89            62            89            62
  Integral_LocFalse_BaseHex_AlignmentLeft_Int64                        -0.9357         -0.9357          1174            75          1174            75
  Integral_LocFalse_BaseHex_AlignmentCenter_Int64                      -0.9319         -0.9319          1129            77          1129            77
  Integral_LocFalse_BaseHex_AlignmentRight_Int64                       -0.9350         -0.9350          1161            75          1161            75
  Integral_LocFalse_BaseHex_ZeroPadding_Int64                          -0.9293         -0.9293          1150            81          1150            81
  Integral_LocFalse_BaseHex_AlignNone_Uint64                           -0.3056         -0.3057            86            59            86            59
  Integral_LocFalse_BaseHex_AlignmentLeft_Uint64                       -0.9378         -0.9378          1174            73          1174            73
  Integral_LocFalse_BaseHex_AlignmentCenter_Uint64                     -0.9341         -0.9341          1129            74          1130            74
  Integral_LocFalse_BaseHex_AlignmentRight_Uint64                      -0.9361         -0.9361          1157            74          1157            74
  Integral_LocFalse_BaseHex_ZeroPadding_Uint64                         -0.9315         -0.9315          1147            79          1147            79
  Integral_LocFalse_BaseHexUpper_AlignNone_Int64                       -0.0019         -0.0019            91            90            91            90
  Integral_LocFalse_BaseHexUpper_AlignmentLeft_Int64                   -0.9099         -0.9099          1162           105          1162           105
  Integral_LocFalse_BaseHexUpper_AlignmentCenter_Int64                 -0.9041         -0.9041          1121           108          1121           108
  Integral_LocFalse_BaseHexUpper_AlignmentRight_Int64                  -0.9086         -0.9086          1162           106          1162           106
  Integral_LocFalse_BaseHexUpper_ZeroPadding_Int64                     -0.9057         -0.9057          1164           110          1164           110
  Integral_LocFalse_BaseHexUpper_AlignNone_Uint64                      +0.0110         +0.0110            86            87            86            87
  Integral_LocFalse_BaseHexUpper_AlignmentLeft_Uint64                  -0.9136         -0.9136          1161           100          1161           100
  Integral_LocFalse_BaseHexUpper_AlignmentCenter_Uint64                -0.9078         -0.9078          1133           104          1133           104
  Integral_LocFalse_BaseHexUpper_AlignmentRight_Uint64                 -0.9132         -0.9132          1177           102          1177           102
  Integral_LocFalse_BaseHexUpper_ZeroPadding_Uint64                    -0.9091         -0.9091          1160           105          1160           105

Other benchmarks give similar results.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D129964

Files:
  libcxx/include/__format/buffer.h
  libcxx/include/__format/formatter_floating_point.h
  libcxx/include/__format/formatter_integral.h
  libcxx/include/__format/formatter_output.h
  libcxx/test/std/utilities/format/format.formatter/format.formatter.spec/formatter.unsigned_integral.pass.cpp
  libcxx/test/std/utilities/format/format.functions/format_tests.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D129964.445316.patch
Type: text/x-patch
Size: 31999 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/libcxx-commits/attachments/20220717/4f1a72fc/attachment-0001.bin>