[PATCH] D58580: [Support] llvm::to_string(): raw_string_ostream is a memory hog
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Feb 23 07:56:18 PST 2019
lebedev.ri created this revision.
lebedev.ri added reviewers: rnk, zturner, chandlerc.
lebedev.ri added a project: LLVM.
Herald added subscribers: jdoerfert, kristina, courbet.
I was trying to analyse the `llvm-exegesis` analysis mode performance,
and for that i wanted to view the LLVM X-Ray log visualization in Chrome
trace viewer. And the `llvm-xray convert` is sluggish, and sometimes
even ended up being killed by OOM.
So i have looked at the `llvm-xray convert` itself in heaptrack <https://github.com/KDE/heaptrack>,
and this was the obvious problem point. It converts a *lot* of `int`s
to strings.
`llvm::to_string()` uses `raw_string_ostream` as the tmp stream.
It makes sense, you don't really know with what it will be called with,
it's a generic support function. But there is a downside to that.
`raw_string_ostream` is buffered, and minimal size of that buffer
is `BUFSIZ` (which is `8192` at least here). That is *huge*.
So if you call `llvm::to_string(int)`, even though the output `std::string`
will be ~`12` chars, you still just wasted `8192` temporary bytes...
But for arithmetic types, we can have an upper estimation of the
output string length, and thus we can use `SmallString`,
and only pay for the final price, not any intermediate tmp price.
While it won't radically improve the `llvm-xray convert` perf,
(there are likely other issues elsewhere)
this is quite clearly better than using `raw_string_ostream`,
in *ALL* cases.
`xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis`
(compiled with ` -fxray-instruction-threshold=128`)
analysis mode over `-benchmarks-file` with 10099 points (one full
latency measurement set), with normal runtime of 0.387s.
Timings:
Old:
$ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):
21346.24 msec task-clock # 1.000 CPUs utilized ( +- 0.28% )
314 context-switches # 14.701 M/sec ( +- 59.13% )
1 cpu-migrations # 0.037 M/sec ( +-100.00% )
2181354 page-faults # 102191.251 M/sec ( +- 0.02% )
85477442102 cycles # 4004415.019 GHz ( +- 0.28% ) (83.33%)
14526427066 stalled-cycles-frontend # 16.99% frontend cycles idle ( +- 0.70% ) (83.33%)
32371533721 stalled-cycles-backend # 37.87% backend cycles idle ( +- 0.27% ) (33.34%)
67896890228 instructions # 0.79 insn per cycle
# 0.48 stalled cycles per insn ( +- 0.03% ) (50.00%)
14592654840 branches # 683631198.653 M/sec ( +- 0.02% ) (66.67%)
212207534 branch-misses # 1.45% of all branches ( +- 0.94% ) (83.34%)
21.3502 +- 0.0585 seconds time elapsed ( +- 0.27% )
New:
$ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):
20445.97 msec task-clock # 1.000 CPUs utilized ( +- 0.16% )
504 context-switches # 24.651 M/sec ( +- 34.45% )
0 cpu-migrations # 0.020 M/sec ( +- 61.24% )
2181595 page-faults # 106702.410 M/sec ( +- 0.01% )
81871726833 cycles # 4004369.000 GHz ( +- 0.16% ) (83.32%)
14329499463 stalled-cycles-frontend # 17.50% frontend cycles idle ( +- 0.57% ) (83.33%)
31260757379 stalled-cycles-backend # 38.18% backend cycles idle ( +- 0.21% ) (33.35%)
64198631119 instructions # 0.78 insn per cycle
# 0.49 stalled cycles per insn ( +- 0.06% ) (50.02%)
13619488662 branches # 666132990.101 M/sec ( +- 0.06% ) (66.68%)
197109413 branch-misses # 1.45% of all branches ( +- 0.27% ) (83.34%)
20.4515 +- 0.0324 seconds time elapsed ( +- 0.16% )
Memory:
Old:
total runtime: 39.33s.
bytes allocated in total (ignoring deallocations): 79.07GB (2.01GB/s)
calls to allocation functions: 33267816 (845799/s)
temporary memory allocations: 5832298 (148280/s)
peak heap memory consumption: 9.21GB
peak RSS (including heaptrack overhead): 147.98GB
total memory leaked: 87.41KB
New:
total runtime: 35.34s.
bytes allocated in total (ignoring deallocations): 33.20GB (939.39MB/s)
calls to allocation functions: 27668393 (782809/s)
temporary memory allocations: 232875 (6588/s)
peak heap memory consumption: 9.21GB
peak RSS (including heaptrack overhead): 147.97GB
total memory leaked: 87.16KB
Diff:
total runtime: -3.99s.
bytes allocated in total (ignoring deallocations): -45.87GB (11.50GB/s)
calls to allocation functions: -5599423 (1404067/s)
temporary memory allocations: -5599423 (1404067/s)
peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: -255B
Repository:
rL LLVM
https://reviews.llvm.org/D58580
Files:
include/llvm/Support/ScopedPrinter.h
Index: include/llvm/Support/ScopedPrinter.h
===================================================================
--- include/llvm/Support/ScopedPrinter.h
+++ include/llvm/Support/ScopedPrinter.h
@@ -11,12 +11,14 @@
#include "llvm/ADT/APSInt.h"
#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/DataTypes.h"
#include "llvm/Support/Endian.h"
#include "llvm/Support/raw_ostream.h"
#include <algorithm>
+#include <limits>
namespace llvm {
@@ -58,13 +60,34 @@
raw_ostream &operator<<(raw_ostream &OS, const HexNumber &Value);
const std::string to_hexString(uint64_t Value, bool UpperCase = true);
-template <class T> const std::string to_string(const T &Value) {
+template <class T, typename = typename std::enable_if<
+ !std::numeric_limits<T>::is_specialized>::type>
+const std::string to_string(const T &Value) {
std::string number;
llvm::raw_string_ostream stream(number);
stream << Value;
return stream.str();
}
+template <class T,
+ typename = typename std::enable_if<
+ std::numeric_limits<T>::is_specialized>::type,
+ void * = nullptr>
+const std::string to_string(const T &Value) {
+ static constexpr auto NumDigits = std::numeric_limits<T>::is_integer
+ ? std::numeric_limits<T>::digits10
+ : std::numeric_limits<T>::max_digits10;
+ // Optional char for sign bit, plus the required number of base-10 digits.
+ // If integer - then +1 is for rounding up. Else, for the decimal separator.
+ static constexpr auto BufLen =
+ std::numeric_limits<T>::is_signed + NumDigits + 1;
+ llvm::SmallString<BufLen> number;
+ llvm::raw_svector_ostream stream(number);
+ stream << Value;
+ assert(stream.str().size() <= BufLen && "small size prediction failed");
+ return stream.str();
+}
+
class ScopedPrinter {
public:
ScopedPrinter(raw_ostream &OS) : OS(OS), IndentLevel(0) {}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D58580.188050.patch
Type: text/x-patch
Size: 2057 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190223/5e1a029e/attachment.bin>
More information about the llvm-commits
mailing list