[llvm] r354764 - [XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"

Hans Wennborg via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 26 02:01:37 PST 2019


Merged to release_80 in r354856.

On Mon, Feb 25, 2019 at 8:38 AM Roman Lebedev via llvm-commits
<llvm-commits at lists.llvm.org> wrote:
>
> Author: lebedevri
> Date: Sun Feb 24 23:39:07 2019
> New Revision: 354764
>
> URL: http://llvm.org/viewvc/llvm-project?rev=354764&view=rev
> Log:
> [XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"
>
> Summary:
> This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert
>
> Abstractions are great.
> Readable code is great.
> JSON support library is a *good* idea.
>
> However unfortunately, there is an internal detail that one needs
> to be aware of in `llvm::json::Object` - it uses `llvm::DenseMap`.
> So for **every** `llvm::json::Object`, even if you only store a single `int`
> entry there, you pay the whole price of `llvm::DenseMap`.
>
> Unfortunately, it matters for `llvm-xray`.
>
> I was trying to analyse the `llvm-exegesis` analysis mode performance,
> and for that i wanted to view the LLVM X-Ray log visualization in Chrome
> trace viewer. And the `llvm-xray convert` is sluggish, and sometimes
> even ended up being killed by OOM.
>
> `xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis`
> (compiled with ` -fxray-instruction-threshold=128`)
> analysis mode over `-benchmarks-file` with 10099 points (one full
> latency measurement set), with normal runtime of 0.387s.
>
> Timings:
> Old: (copied from D58580)
> ```
> $ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
>
>  Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):
>
>           21346.24 msec task-clock                #    1.000 CPUs utilized            ( +-  0.28% )
>                314      context-switches          #   14.701 M/sec                    ( +- 59.13% )
>                  1      cpu-migrations            #    0.037 M/sec                    ( +-100.00% )
>            2181354      page-faults               # 102191.251 M/sec                  ( +-  0.02% )
>        85477442102      cycles                    # 4004415.019 GHz                   ( +-  0.28% )  (83.33%)
>        14526427066      stalled-cycles-frontend   #   16.99% frontend cycles idle     ( +-  0.70% )  (83.33%)
>        32371533721      stalled-cycles-backend    #   37.87% backend cycles idle      ( +-  0.27% )  (33.34%)
>        67896890228      instructions              #    0.79  insn per cycle
>                                                   #    0.48  stalled cycles per insn  ( +-  0.03% )  (50.00%)
>        14592654840      branches                  # 683631198.653 M/sec               ( +-  0.02% )  (66.67%)
>          212207534      branch-misses             #    1.45% of all branches          ( +-  0.94% )  (83.34%)
>
>            21.3502 +- 0.0585 seconds time elapsed  ( +-  0.27% )
> ```
> New:
> ```
> $ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
>
>  Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs):
>
>            7178.38 msec task-clock                #    1.000 CPUs utilized            ( +-  0.26% )
>                182      context-switches          #   25.402 M/sec                    ( +- 28.84% )
>                  0      cpu-migrations            #    0.046 M/sec                    ( +- 70.71% )
>              33701      page-faults               # 4694.994 M/sec                    ( +-  0.88% )
>        28761053971      cycles                    # 4006833.933 GHz                   ( +-  0.26% )  (83.32%)
>         2028297997      stalled-cycles-frontend   #    7.05% frontend cycles idle     ( +-  1.61% )  (83.32%)
>        10773154901      stalled-cycles-backend    #   37.46% backend cycles idle      ( +-  0.38% )  (33.36%)
>        36199132874      instructions              #    1.26  insn per cycle
>                                                   #    0.30  stalled cycles per insn  ( +-  0.03% )  (50.02%)
>         6434504227      branches                  # 896420204.421 M/sec               ( +-  0.03% )  (66.68%)
>           73355176      branch-misses             #    1.14% of all branches          ( +-  1.46% )  (83.33%)
>
>             7.1807 +- 0.0190 seconds time elapsed  ( +-  0.26% )
> ```
>
> So using `llvm::json` nearly triples run-time on that test case.
> (+3x is times, not percent.)
>
> Memory:
> Old:
> ```
> total runtime: 39.88s.
> bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s)
> calls to allocation functions: 33267816 (834135/s)
> temporary memory allocations: 5832298 (146235/s)
> peak heap memory consumption: 9.21GB
> peak RSS (including heaptrack overhead): 147.98GB
> total memory leaked: 1.09MB
> ```
> New:
> ```
> total runtime: 17.42s.
> bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s)
> calls to allocation functions: 21382982 (1227284/s)
> temporary memory allocations: 232858 (13364/s)
> peak heap memory consumption: 350.69MB
> peak RSS (including heaptrack overhead): 2.55GB
> total memory leaked: 79.95KB
> ```
> Diff:
> ```
> total runtime: -22.46s.
> bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s)
> calls to allocation functions: -11884834 (529155/s)
> temporary memory allocations: -5599440 (249307/s)
> peak heap memory consumption: -8.86GB
> peak RSS (including heaptrack overhead): 0B
> total memory leaked: -1.01MB
> ```
> So using `llvm::json` increases *peak* memory consumption on *this* testcase ~+27x.
> And total allocation count +15x. Both of these numbers are times, *not* percent.
>
> And note that memory usage is clearly unbound with `llvm::json`, it directly depends
> on the length of the log, so peak memory consumption is always increasing.
> This isn't so with the dumb code, there is no accumulating memory consumption,
> peak memory consumption is fixed. Naturally, that means it will handle *much*
> larger logs without OOM'ing.
>
> Readability is good, but the price is simply unacceptable here.
> Too bad none of this analysis was done as part of the development/review D50129 itself.
>
> Reviewers: dberris, kpw, sammccall
>
> Reviewed By: dberris
>
> Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D58584
>
> Modified:
>     llvm/trunk/tools/llvm-xray/xray-converter.cpp
>
> Modified: llvm/trunk/tools/llvm-xray/xray-converter.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-xray/xray-converter.cpp?rev=354764&r1=354763&r2=354764&view=diff
> ==============================================================================
> --- llvm/trunk/tools/llvm-xray/xray-converter.cpp (original)
> +++ llvm/trunk/tools/llvm-xray/xray-converter.cpp Sun Feb 24 23:39:07 2019
> @@ -17,7 +17,6 @@
>  #include "llvm/Support/EndianStream.h"
>  #include "llvm/Support/FileSystem.h"
>  #include "llvm/Support/FormatVariadic.h"
> -#include "llvm/Support/JSON.h"
>  #include "llvm/Support/ScopedPrinter.h"
>  #include "llvm/Support/YAMLTraits.h"
>  #include "llvm/Support/raw_ostream.h"
> @@ -241,6 +240,31 @@ StackTrieNode *findOrCreateStackNode(
>    return CurrentStack;
>  }
>
> +void writeTraceViewerRecord(uint16_t Version, raw_ostream &OS, int32_t FuncId,
> +                            uint32_t TId, uint32_t PId, bool Symbolize,
> +                            const FuncIdConversionHelper &FuncIdHelper,
> +                            double EventTimestampUs,
> +                            const StackTrieNode &StackCursor,
> +                            StringRef FunctionPhenotype) {
> +  OS << "    ";
> +  if (Version >= 3) {
> +    OS << llvm::formatv(
> +        R"({ "name" : "{0}", "ph" : "{1}", "tid" : "{2}", "pid" : "{3}", )"
> +        R"("ts" : "{4:f4}", "sf" : "{5}" })",
> +        (Symbolize ? FuncIdHelper.SymbolOrNumber(FuncId)
> +                   : llvm::to_string(FuncId)),
> +        FunctionPhenotype, TId, PId, EventTimestampUs,
> +        StackCursor.ExtraData.id);
> +  } else {
> +    OS << llvm::formatv(
> +        R"({ "name" : "{0}", "ph" : "{1}", "tid" : "{2}", "pid" : "1", )"
> +        R"("ts" : "{3:f3}", "sf" : "{4}" })",
> +        (Symbolize ? FuncIdHelper.SymbolOrNumber(FuncId)
> +                   : llvm::to_string(FuncId)),
> +        FunctionPhenotype, TId, EventTimestampUs, StackCursor.ExtraData.id);
> +  }
> +}
> +
>  } // namespace
>
>  void TraceConverter::exportAsChromeTraceEventFormat(const Trace &Records,
> @@ -251,14 +275,18 @@ void TraceConverter::exportAsChromeTrace
>
>    unsigned id_counter = 0;
>
> +  OS << "{\n  \"traceEvents\": [";
>    DenseMap<uint32_t, StackTrieNode *> StackCursorByThreadId{};
>    DenseMap<uint32_t, SmallVector<StackTrieNode *, 4>> StackRootsByThreadId{};
>    DenseMap<unsigned, StackTrieNode *> StacksByStackId{};
>    std::forward_list<StackTrieNode> NodeStore{};
> -
> -  // Create a JSON Array which will hold all trace events.
> -  json::Array TraceEvents;
> +  int loop_count = 0;
>    for (const auto &R : Records) {
> +    if (loop_count++ == 0)
> +      OS << "\n";
> +    else
> +      OS << ",\n";
> +
>      // Chrome trace event format always wants data in micros.
>      // CyclesPerMicro = CycleHertz / 10^6
>      // TSC / CyclesPerMicro == TSC * 10^6 / CycleHertz == MicroTimestamp
> @@ -283,15 +311,8 @@ void TraceConverter::exportAsChromeTrace
>        // type of B for begin or E for end, thread id, process id,
>        // timestamp in microseconds, and a stack frame id. The ids are logged
>        // in an id dictionary after the events.
> -      TraceEvents.push_back(json::Object({
> -          {"name", Symbolize ? FuncIdHelper.SymbolOrNumber(R.FuncId)
> -                             : llvm::to_string(R.FuncId)},
> -          {"ph", "B"},
> -          {"tid", llvm::to_string(R.TId)},
> -          {"pid", llvm::to_string(Version >= 3 ? R.PId : 1)},
> -          {"ts", llvm::formatv("{0:f4}", EventTimestampUs)},
> -          {"sf", llvm::to_string(StackCursor->ExtraData.id)},
> -      }));
> +      writeTraceViewerRecord(Version, OS, R.FuncId, R.TId, R.PId, Symbolize,
> +                             FuncIdHelper, EventTimestampUs, *StackCursor, "B");
>        break;
>      case RecordTypes::EXIT:
>      case RecordTypes::TAIL_EXIT:
> @@ -302,51 +323,43 @@ void TraceConverter::exportAsChromeTrace
>        // (And/Or in loop termination below)
>        StackTrieNode *PreviousCursor = nullptr;
>        do {
> -        TraceEvents.push_back(json::Object({
> -            {"name", Symbolize
> -                         ? FuncIdHelper.SymbolOrNumber(StackCursor->FuncId)
> -                         : llvm::to_string(StackCursor->FuncId)},
> -            {"ph", "E"},
> -            {"tid", llvm::to_string(R.TId)},
> -            {"pid", llvm::to_string(Version >= 3 ? R.PId : 1)},
> -            {"ts", llvm::formatv("{0:f4}", EventTimestampUs)},
> -            {"sf", llvm::to_string(StackCursor->ExtraData.id)},
> -        }));
> +        if (PreviousCursor != nullptr) {
> +          OS << ",\n";
> +        }
> +        writeTraceViewerRecord(Version, OS, StackCursor->FuncId, R.TId, R.PId,
> +                               Symbolize, FuncIdHelper, EventTimestampUs,
> +                               *StackCursor, "E");
>          PreviousCursor = StackCursor;
>          StackCursor = StackCursor->Parent;
>        } while (PreviousCursor->FuncId != R.FuncId && StackCursor != nullptr);
>        break;
>      }
>    }
> +  OS << "\n  ],\n"; // Close the Trace Events array.
> +  OS << "  "
> +     << "\"displayTimeUnit\": \"ns\",\n";
>
>    // The stackFrames dictionary substantially reduces size of the output file by
>    // avoiding repeating the entire call stack of function names for each entry.
> -  json::Object StackFrames;
> -  for (const auto &Stack : StacksByStackId) {
> -    const auto &StackId = Stack.first;
> -    const auto &StackFunctionNode = Stack.second;
> -    json::Object::iterator It;
> -    std::tie(It, std::ignore) = StackFrames.insert({
> -        llvm::to_string(StackId),
> -        json::Object{
> -            {"name",
> -             Symbolize ? FuncIdHelper.SymbolOrNumber(StackFunctionNode->FuncId)
> -                       : llvm::to_string(StackFunctionNode->FuncId)}},
> -    });
> -
> -    if (StackFunctionNode->Parent != nullptr)
> -      It->second.getAsObject()->insert(
> -          {"parent", llvm::to_string(StackFunctionNode->Parent->ExtraData.id)});
> +  OS << R"(  "stackFrames": {)";
> +  int stack_frame_count = 0;
> +  for (auto map_iter : StacksByStackId) {
> +    if (stack_frame_count++ == 0)
> +      OS << "\n";
> +    else
> +      OS << ",\n";
> +    OS << "    ";
> +    OS << llvm::formatv(
> +        R"("{0}" : { "name" : "{1}")", map_iter.first,
> +        (Symbolize ? FuncIdHelper.SymbolOrNumber(map_iter.second->FuncId)
> +                   : llvm::to_string(map_iter.second->FuncId)));
> +    if (map_iter.second->Parent != nullptr)
> +      OS << llvm::formatv(R"(, "parent": "{0}")",
> +                          map_iter.second->Parent->ExtraData.id);
> +    OS << " }";
>    }
> -
> -  json::Object TraceJSON{
> -      {"displayTimeUnit", "ns"},
> -      {"traceEvents", std::move(TraceEvents)},
> -      {"stackFrames", std::move(StackFrames)},
> -  };
> -
> -  // Pretty-print the JSON using two spaces for indentations.
> -  OS << formatv("{0:2}", json::Value(std::move(TraceJSON)));
> +  OS << "\n  }\n"; // Close the stack frames map.
> +  OS << "}\n";     // Close the JSON entry.
>  }
>
>  namespace llvm {
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits


More information about the llvm-commits mailing list