[PATCH] Use auto profiler to set branch weights

Wed Oct 30 11:31:04 PDT 2013

  First pass. Lots of small nits or just open questions. Feel free to push back of some of this makes more sense to you in a follow-up patch.

  Also, I'm probably going to ask to get rid of the "Auto" in the name... But let's do that last, right before submit. I don't want to disrupt the comments on the code itself.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:14-18
@@ +13,7 @@
+//
+// This pass generates two types of IR annotations:
+//
+// - autoprofile.samples: Indicates how many samples were collected by
+//      the external profiler during the execution of the program. This
+//      is an unsigned integer value.
+//
----------------
To what IR does this metadata attach to? What is its purpose?

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:20-24
@@ +19,7 @@
+//
+// - prof: Represents branch weights. This annotation is added to branches
+//      to indicate the weights of each edge coming out of the branch.
+//      The weight of each edge is the weight of the target block for
+//      that edge. The weight of a block B is computed as the maximum
+//      number of samples found in B.
+//
----------------
I think you might want to leave this more opaque.

When computing these, it seems likely that we'll want to (in some cases) perform normalization to keep the values of the weights from getting into an excessive range while preserving the relative weights among the exit edges of a block.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:52
@@ +51,3 @@
+
+// command line option for loading path profiles
+static cl::opt<std::string> AutoProfileFilename(
----------------
LLVM tries to be somewhat particular about comments being reasonably well formed prose, so capitalize and punctuate.

At the code level, you might also want to clarify that this is primarily a debugging interface.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:84-87
@@ +83,6 @@
+
+  virtual bool loadNative() = 0;
+  virtual bool loadText() = 0;
+  virtual void dump() = 0;
+  virtual bool emitAnnotations(Function &F) = 0;
+
----------------
I'm curious about the use of an abstract interface. Do you expect this interface to grow? How many subclasses are you envisioning? One per profiling source? One per file format? How many of those? Is there another reason to use a class as opposed to a more procedural style?

(sorry for rapid fire questions, just trying to get a good sense of where you're going here.)

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:125
@@ +124,3 @@
+///      be relative to the start of the function.
+class AutoProfileSampleBased : public AutoProfiler {
+public:
----------------
The naming / comments aren't really making sense to me yet.

The entire pass is about sample-based profiles, right? Is this class supposed to be Linux Perf specific, or reading a generic (custom) file format that we produce via a tool from Linux Perf data, or potentially any other data source? The comments make me think the former, but the naming makes me think the latter.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:186-187
@@ +185,4 @@
+/// profiles. It keeps track of line number information and location of
+/// the file pointer. Users of this class are responsible for actually
+/// parsing the lines returned by the readLine function.
+class AutoProfileTextLoader {
----------------
Why not sink the line parsing into this class? At least into a raw struct? That would (i think) give a nice separation of concerns from the textual parsing and acting on the results.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:219-220
@@ +218,4 @@
+
+  /// \brief Report a parse error message and stop compilation.
+  void reportParseError(std::string Msg) const {
+    report_fatal_error(Filename + ":" + Twine(Lineno) + ": " + Msg + "\n");
----------------
Accept a Twine argument here?

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:232
@@ +231,3 @@
+  /// \brief Current line number.
+  size_t Lineno;
+
----------------
I would use int or int64_t, unsigned is just reducing our ability to check for bugs here.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:326-331
@@ +325,8 @@
+///
+/// Symbol table (represented with the string "symbol table")
+///    Number of symbols in the table
+///    symbol 1
+///    symbol 2
+///    ...
+///    symbol N
+///
----------------
I wonder, how essential was this? Could we get by with just putting the symbol name in the header for each function body profile snippet?

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:369
@@ +368,3 @@
+    SmallVector<StringRef, 4> Matches;
+    Regex head_re("^([^:]+):([0-9]+):([0-9]+):([0-9]+)$");
+    Line = Loader.readLine().str();
----------------
Bulid the regular expression outside of the loop.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:334
@@ +333,3 @@
+/// Function body profiles
+///    function1:total_samples:total_head_samples:number_of_locations
+///    location_offset_1: number_of_samples
----------------
What are these numbers? Only 'total_samples' really makes sense to me.

Also, why not just compute these values on the fly while reading the per-location bit?

The reason I'm pushing for the text format not having the self-consistency invariants is that it will make incremental updates to unit tests significantly easier if I don't have to remember to go update the totals each time.

Since these can repeat anyways, and they're just for unittesting, what about this syntax:

  function_name:location_offset: number_of_samples

Then have no state what-so-ever, just each line adds a number of samples at a particular location offset from the named function. Repeating the function name may be wasteful, but for tests it seems harmless, and it would make parsing much simpler due to it being purely line-oriented.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:404
@@ +403,3 @@
+///
+/// The "weight" of an instruction @param I is the number of samples
+/// collected on that instruction at runtime. To retrieve it, we
----------------
I don't think the '\param' command (or @param as you've spelled it) works the way you think it works: http://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdparam

I would also try to consistently use the '\command' spelling.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:465-469
@@ +464,7 @@
+    SmallVector<uint32_t, 4> Weights;
+    for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {
+      BasicBlock *Succ = TI->getSuccessor(I);
+      uint32_t Weight = computeBlockWeight(Succ, FirstLineno, BodySamples);
+      Weights.push_back(Weight);
+    }
+
----------------
This re-computes the block weight for each predecessor of a block which could be *very* large for certain CFGs.

I think you want to memoize in a map from BasicBlock* -> weight.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:473
@@ +472,3 @@
+    llvm::MDNode *WeightsNode = MDB.createBranchWeights(Weights);
+    TI->setMetadata(llvm::LLVMContext::MD_prof, WeightsNode);
+  }
----------------
What if there already was metadata, for example from __builtin_expect?

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:493-504
@@ +492,14 @@
+  LLVMContext &context = FirstInst.getContext();
+  for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) {
+    Instruction &Inst = *I;
+    uint32_t InstWeight = getInstWeight(Inst, FirstLineno, BodySamples);
+    if (InstWeight > 0) {
+      SmallVector<Value *, 1> sample_values;
+      sample_values.push_back(ConstantInt::get(Type::getInt32Ty(context),
+                                               InstWeight));
+      MDNode *MD = MDNode::get(context, sample_values);
+      Inst.setMetadata(AutoProfileSamplesMDKind, MD);
+      Changed = true;
+    }
+  }
+
----------------
I think this is all dead code. You query the sample map directly in computeBranchWeights now, so there is no need to annotate instructions with metadata.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:487
@@ +486,3 @@
+  bool Changed = false;
+  StringRef name = F.getName();
+  FunctionProfile &FProfile = Profiles[name];
----------------
Some variable names in this fucntion are still named in the old styel.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:506-507
@@ +505,4 @@
+
+  if (Changed)
+    computeBranchWeights(F);
+
----------------
computeBranchWights should probably just be inlined after the above change. Then you should compute changed directly based on whether you add any metadata nodes.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:509
@@ +508,3 @@
+
+  DEBUG(if (Changed) dumpInstrumentedInsns(dbgs(), F););
+
----------------
Yowzers! There already is a -print-after=foo option to the 'opt' command that will dump the IR (including metadata)... I would reserve DEBUG stuff for more compact information, but keep the dump methods so that you can call them in gdb.

http://llvm-reviews.chandlerc.com/D2058