[PATCH] D58580: [Support] llvm::to_string(): raw_string_ostream is a memory hog

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Feb 23 07:56:18 PST 2019


lebedev.ri created this revision.
lebedev.ri added reviewers: rnk, zturner, chandlerc.
lebedev.ri added a project: LLVM.
Herald added subscribers: jdoerfert, kristina, courbet.

I was trying to analyse the performance of the `llvm-exegesis` analysis mode,
and for that I wanted to view the LLVM X-Ray log visualization in the Chrome
trace viewer. But `llvm-xray convert` is sluggish, and sometimes it
even ends up being killed by the OOM killer.

So I looked at `llvm-xray convert` itself in heaptrack <https://github.com/KDE/heaptrack>,
and this was the obvious problem point. It converts a *lot* of `int`s
to strings.

`llvm::to_string()` uses `raw_string_ostream` as the temporary stream.
That makes sense: it is a generic support function, and you don't really
know what it will be called with. But there is a downside.
`raw_string_ostream` is buffered, and the minimal size of that buffer
is `BUFSIZ` (which is `8192`, at least here). That is *huge*.
So if you call `llvm::to_string(int)`, even though the output `std::string`
will be ~`12` chars, you still waste `8192` temporary bytes...

But for arithmetic types, we can compute an upper bound on the
output string length, and thus we can use `SmallString`,
and only pay the final price, not any intermediate temporary price.
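
As a rough illustration of that upper bound (a sketch, not the patch itself):
the snippet below uses only the standard library, with a stack `char` array
and `std::snprintf` standing in for `SmallString` and `raw_svector_ostream`.
The names `maxDecimalChars` and `boundedToString` are hypothetical, and the
sketch covers integers only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>
#include <type_traits>

// Hypothetical helper, not part of LLVM: upper bound on the number of
// characters needed to print an integer of type T in base 10.
template <class T> constexpr int maxDecimalChars() {
  static_assert(std::is_integral<T>::value, "integers only in this sketch");
  // digits10 counts the decimal digits that always fit in T, so the largest
  // value can need one more digit; signed types may also need a '-'.
  return std::numeric_limits<T>::digits10 + 1 +
         (std::numeric_limits<T>::is_signed ? 1 : 0);
}

// Hypothetical helper: format into a stack buffer sized by the bound above,
// then pay for a single copy into the returned std::string.
template <class T> std::string boundedToString(T Value) {
  char Buf[maxDecimalChars<T>() + 1]; // +1 for the terminating NUL
  const int Len =
      std::is_signed<T>::value
          ? std::snprintf(Buf, sizeof(Buf), "%lld",
                          static_cast<long long>(Value))
          : std::snprintf(Buf, sizeof(Buf), "%llu",
                          static_cast<unsigned long long>(Value));
  assert(Len >= 0 && Len <= maxDecimalChars<T>() && "size prediction failed");
  return std::string(Buf, static_cast<std::size_t>(Len));
}
```

With a 32-bit `int`, `digits10` is 9, so the bound is 11 characters, which is
exactly the length of `-2147483648`.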

While it won't radically improve `llvm-xray convert` performance
(there are likely other issues elsewhere),
this is quite clearly better than using `raw_string_ostream`,
in *ALL* cases.

`xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis`
(compiled with `-fxray-instruction-threshold=128`)
running in analysis mode over a `-benchmarks-file` with 10099 points (one full
latency measurement set), with a normal runtime of 0.387s.

Timings:
Old:

  $ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT 
  
   Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):
  
            21346.24 msec task-clock                #    1.000 CPUs utilized            ( +-  0.28% )
                 314      context-switches          #   14.701 M/sec                    ( +- 59.13% )
                   1      cpu-migrations            #    0.037 M/sec                    ( +-100.00% )
             2181354      page-faults               # 102191.251 M/sec                  ( +-  0.02% )
         85477442102      cycles                    # 4004415.019 GHz                   ( +-  0.28% )  (83.33%)
         14526427066      stalled-cycles-frontend   #   16.99% frontend cycles idle     ( +-  0.70% )  (83.33%)
         32371533721      stalled-cycles-backend    #   37.87% backend cycles idle      ( +-  0.27% )  (33.34%)
         67896890228      instructions              #    0.79  insn per cycle         
                                                    #    0.48  stalled cycles per insn  ( +-  0.03% )  (50.00%)
         14592654840      branches                  # 683631198.653 M/sec               ( +-  0.02% )  (66.67%)
           212207534      branch-misses             #    1.45% of all branches          ( +-  0.94% )  (83.34%)
  
             21.3502 +- 0.0585 seconds time elapsed  ( +-  0.27% )

New:

  $ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT 
  
   Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):
  
            20445.97 msec task-clock                #    1.000 CPUs utilized            ( +-  0.16% )
                 504      context-switches          #   24.651 M/sec                    ( +- 34.45% )
                   0      cpu-migrations            #    0.020 M/sec                    ( +- 61.24% )
             2181595      page-faults               # 106702.410 M/sec                  ( +-  0.01% )
         81871726833      cycles                    # 4004369.000 GHz                   ( +-  0.16% )  (83.32%)
         14329499463      stalled-cycles-frontend   #   17.50% frontend cycles idle     ( +-  0.57% )  (83.33%)
         31260757379      stalled-cycles-backend    #   38.18% backend cycles idle      ( +-  0.21% )  (33.35%)
         64198631119      instructions              #    0.78  insn per cycle         
                                                    #    0.49  stalled cycles per insn  ( +-  0.06% )  (50.02%)
         13619488662      branches                  # 666132990.101 M/sec               ( +-  0.06% )  (66.68%)
           197109413      branch-misses             #    1.45% of all branches          ( +-  0.27% )  (83.34%)
  
             20.4515 +- 0.0324 seconds time elapsed  ( +-  0.16% )

Memory:
Old:

  total runtime: 39.33s.
  bytes allocated in total (ignoring deallocations): 79.07GB (2.01GB/s)
  calls to allocation functions: 33267816 (845799/s)
  temporary memory allocations: 5832298 (148280/s)
  peak heap memory consumption: 9.21GB
  peak RSS (including heaptrack overhead): 147.98GB
  total memory leaked: 87.41KB

New:

  total runtime: 35.34s.
  bytes allocated in total (ignoring deallocations): 33.20GB (939.39MB/s)
  calls to allocation functions: 27668393 (782809/s)
  temporary memory allocations: 232875 (6588/s)
  peak heap memory consumption: 9.21GB
  peak RSS (including heaptrack overhead): 147.97GB
  total memory leaked: 87.16KB

Diff:

  total runtime: -3.99s.
  bytes allocated in total (ignoring deallocations): -45.87GB (11.50GB/s)
  calls to allocation functions: -5599423 (1404067/s)
  temporary memory allocations: -5599423 (1404067/s)
  peak heap memory consumption: 0B
  peak RSS (including heaptrack overhead): 0B
  total memory leaked: -255B
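
A quick sanity check on the diff above: the drop in allocation calls and the
drop in temporary allocations are the same number (5599423), and multiplying
it by the `8192`-byte buffer size gives the drop in allocated bytes. The
sketch below only redoes that arithmetic. It assumes heaptrack reports
decimal gigabytes and that every avoided allocation was one such buffer;
the names are hypothetical.

```cpp
#include <cstdint>

// Figure copied from the heaptrack "Diff" summary.
constexpr std::uint64_t AvoidedAllocations = 5599423;
// Buffer size quoted in the description (BUFSIZ on the author's machine).
constexpr std::uint64_t BufferBytes = 8192;

// Bytes saved if each avoided allocation was exactly one BUFSIZ buffer.
constexpr std::uint64_t savedBytes() {
  return AvoidedAllocations * BufferBytes;
}

// Same value in decimal gigabytes (assumption: 1 GB = 10^9 bytes).
constexpr double savedGigabytes() {
  return static_cast<double>(savedBytes()) / 1e9;
}

// 5599423 * 8192 = 45870473216 bytes, i.e. about 45.87 GB.
static_assert(savedBytes() == 45870473216ULL, "arithmetic check");
```

So the reported -45.87GB is consistent with one `BUFSIZ`-sized temporary
buffer per avoided `to_string()` call.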


Repository:
  rL LLVM

https://reviews.llvm.org/D58580

Files:
  include/llvm/Support/ScopedPrinter.h


Index: include/llvm/Support/ScopedPrinter.h
===================================================================
--- include/llvm/Support/ScopedPrinter.h
+++ include/llvm/Support/ScopedPrinter.h
@@ -11,12 +11,14 @@
 
 #include "llvm/ADT/APSInt.h"
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/DataTypes.h"
 #include "llvm/Support/Endian.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>
+#include <limits>
 
 namespace llvm {
 
@@ -58,13 +60,34 @@
 raw_ostream &operator<<(raw_ostream &OS, const HexNumber &Value);
 const std::string to_hexString(uint64_t Value, bool UpperCase = true);
 
-template <class T> const std::string to_string(const T &Value) {
+template <class T, typename = typename std::enable_if<
+                       !std::numeric_limits<T>::is_specialized>::type>
+const std::string to_string(const T &Value) {
   std::string number;
   llvm::raw_string_ostream stream(number);
   stream << Value;
   return stream.str();
 }
 
+template <class T,
+          typename = typename std::enable_if<
+              std::numeric_limits<T>::is_specialized>::type,
+          void * = nullptr>
+const std::string to_string(const T &Value) {
+  static constexpr auto NumDigits = std::numeric_limits<T>::is_integer
+                                        ? std::numeric_limits<T>::digits10
+                                        : std::numeric_limits<T>::max_digits10;
+  // Optional char for sign bit, plus the required number of base-10 digits.
+  // If integer - then +1 is for rounding up. Else, for the decimal separator.
+  static constexpr auto BufLen =
+      std::numeric_limits<T>::is_signed + NumDigits + 1;
+  llvm::SmallString<BufLen> number;
+  llvm::raw_svector_ostream stream(number);
+  stream << Value;
+  assert(stream.str().size() <= BufLen && "small size prediction failed");
+  return stream.str();
+}
+
 class ScopedPrinter {
 public:
   ScopedPrinter(raw_ostream &OS) : OS(OS), IndentLevel(0) {}
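
For readers unfamiliar with the overload selection used in the patch, here is
a minimal standalone sketch of the same `enable_if` dispatch on
`std::numeric_limits<T>::is_specialized`. It uses only the standard library;
`Opaque` and `usesBoundedPath` are hypothetical names, and the functions
only report which overload was picked.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical type with no std::numeric_limits specialization.
struct Opaque {};

// Selected when T has no numeric_limits specialization (the generic path).
template <class T, typename = typename std::enable_if<
                       !std::numeric_limits<T>::is_specialized>::type>
constexpr bool usesBoundedPath(const T &) {
  return false;
}

// Selected for arithmetic types (the bounded-buffer path). The extra
// `void * = nullptr` parameter makes this a distinct template: two function
// templates that differ only in a default template argument would otherwise
// count as redeclarations of the same template.
template <class T,
          typename = typename std::enable_if<
              std::numeric_limits<T>::is_specialized>::type,
          void * = nullptr>
constexpr bool usesBoundedPath(const T &) {
  return true;
}

static_assert(usesBoundedPath(42), "int takes the bounded path");
static_assert(!usesBoundedPath(Opaque{}), "Opaque takes the generic path");
```

For any given `T`, exactly one of the two conditions holds, so one overload
is removed by substitution failure and the call is never ambiguous.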

