[PATCH] D21987: [XRay] Initial XRay Function Call Accounting Tool

Fri Aug 5 14:44:48 PDT 2016

dblaikie added a comment.

"In general though I think there are a few things being done here that ought to really be moved into independent units that might be tested in isolation. Do you have suggestions on how to do that with tools? i.e. having unit tests for parts of this implementation?" - I'm certainly generally in favor of separation and unit tests, though I perhaps don't understand the high level features of the tool as-is, or what it might grow, to have a good sense of where the points of separation and testing might occur. Perhaps a more detailed explanation and test cases would help explain this further.

"I was thinking about this, and yes there should be. Let me iterate on this one too and ping back when I have something cleaner." - Happy to talk more about it on IRC or email, etc, to see what design options might suit.

================
Comment at: tools/xray/xray-fc-account.cc:166-169
@@ +165,6 @@
+
+  // Treat Contents as though it were an array of __xray::XRaySledEntry
+  // elements.
+  const __xray::XRaySledEntry *Sleds =
+      reinterpret_cast<const __xray::XRaySledEntry *>(Contents.data());
+  size_t Entries = Contents.size() / sizeof(__xray::XRaySledEntry);
----------------
dberris wrote:
> dblaikie wrote:
> > Is this not an aliasing violation? (memory might not be appropriately aligned, etc - I thought we had to memcpy into memory of the right type to preserve aliasing requirements)
> Interesting. I'm not sure. Should I just do the memcpy into aligned memory first? I don't think it's possible to dynamically allocate aligned memory yet, is it?
If these are just standard layout structs (which they seem to be), you could allocate an array of them with 'new XRaySledEntry[N]' as usual. Though I think you could get aligned memory by new'ing a std::aligned_storage object... maybe?

But this'll still only work to run the tool on the same architecture, layout, etc, as the original - is that an intentional limitation? Or should we be making a more fixed/deliberate choice about the file format and having writers format it appropriately, and readers deserialize it appropriately?

================
Comment at: tools/xray/xray-fc-account.cc:251
@@ +250,3 @@
+
+  int Fd = 0;
+  do {
----------------
Could skip the initialization here - since it's unconditionally assigned to below. (can be nice to avoid the excess initialization, that way if there is a bug in the logic, MSan can diagnose it)

================
Comment at: tools/xray/xray-fc-account.cc:264
@@ +263,3 @@
+  for (int ReadBytes = 0;
+       ((ReadBytes = read(Fd, &Header, sizeof(XRayFileHeader))) <= 0);) {
+    if (ReadBytes == -1) {
----------------
think you have an extra set of parens around this whole condition that aren't needed

Also, thinking about this further - reckon it might be more legible as:

  for ... (ReadBytes = ...) == -1 && errno== EINTER

Then do the error cases outside the loop? Since they only run once if they do ever occur, slightly odd to have them inside the loop - at least I usually find that to be so. How about you?

================
Comment at: tools/xray/xray-fc-account.cc:423
@@ +422,3 @@
+      auto Pct99 = Timings[Pct99Off];
+      std::nth_element(Timings.begin(), Timings.begin() + Timings.size(),
+                       Timings.end());
----------------
Timings.begin() + Timings.size() is Timings.end(), right?

At that point, the nth_element call seems to be equivalent (for your purposes) to a call to std::max, maybe?

================
Comment at: tools/xray/xray-fc-account.cc:236-266
@@ +235,33 @@
+
+// We consolidate the data into a struct which we can output in various forms.
+struct ResultRow {
+  uint64_t Count;
+  double Min;
+  double Median;
+  double Pct90;
+  double Pct99;
+  double Max;
+  std::string DebugInfo;
+  std::string Function;
+};
+
+void PrintStatsCSV(const ResultRow &Row) {
+  outs() << Row.Count << ',' << Row.Min << ',' << Row.Median << ',' << Row.Pct90
+         << ',' << Row.Pct99 << ',' << Row.Max << ",\"" << Row.DebugInfo
+         << "\"," << '"' << Row.Function << "\"\n";
+}
+
+void PrintStatsTEXT(const ResultRow &Row) {
+  outs() << Row.Count << "\t"
+         << "[" << Row.Min << ", " << Row.Median << ", " << Row.Pct90 << ", "
+         << Row.Pct99 << ", " << Row.Max << "] " << Row.DebugInfo << "\t"
+         << Row.Function << "\n";
+}
+
+void PrintStatsHeader(OutputFormats OF) {
+  if (OF == OutputFormats::TEXT)
+    outs() << "count\t[min,median,90%ile,99%ile,max]\tdebug\tfunction\n";
+  if (OF == OutputFormats::CSV)
+    outs() << "count,min,median,90%ile,99%ile,max,debuginfo,function\n";
+}
+
----------------
I was thinking of something more aggressive - factoring out /all/ of the "if (TEXT)/if (CSV)" code, or at least most of it, from the main part of the program, not just these two places. Though I'm open to hearing that that's not practical/useful, for sure - I'm just looking at the high level & seeing this pivot and wondering if it could be streamlined/separated a bit.

https://reviews.llvm.org/D21987