[PATCH] Use auto profiler to set branch weights

Diego Novillo dnovillo at google.com
Thu Oct 31 06:51:22 PDT 2013


  Thanks for the review. I've uploaded a revised patch with most of the changes you suggested. The version you reviewed was slightly out of date (I had uploaded a new version with the pass name changed).


================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:14-18
@@ +13,7 @@
+//
+// This pass generates two types of IR annotations:
+//
+// - autoprofile.samples: Indicates how many samples were collected by
+//      the external profiler during the execution of the program. This
+//      is an unsigned integer value.
+//
----------------
Chandler Carruth wrote:
> To what IR does this metadata attach? What is its purpose?
I've clarified the description here.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:20-24
@@ +19,7 @@
+//
+// - prof: Represents branch weights. This annotation is added to branches
+//      to indicate the weights of each edge coming out of the branch.
+//      The weight of each edge is the weight of the target block for
+//      that edge. The weight of a block B is computed as the maximum
+//      number of samples found in B.
+//
----------------
Chandler Carruth wrote:
> I think you might want to leave this more opaque.
> 
> When computing these, it seems likely that we'll want to (in some cases) perform normalization to keep the values of the weights from getting into an excessive range while preserving the relative weights among the exit edges of a block.
Really? My impression was that the analyzer already normalizes these values (BranchProbabilityInfo::calcMetadataWeights); the weights are normalized to [1, getMaxWeightFor(BB)].

When I saw that, I figured I could put just about any value here.
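
To make the point concrete, the kind of rescaling I'm describing looks roughly like this. This is only a sketch, not the actual calcMetadataWeights code; the function name and the choice of proportional scaling are my own:

    // Rescale raw sample counts into [1, MaxAllowed] while preserving the
    // relative proportions between the edges of a single terminator.
    static void scaleWeights(SmallVectorImpl<uint32_t> &Weights,
                             uint32_t MaxAllowed) {
      uint64_t MaxSeen = 1;
      for (unsigned I = 0, E = Weights.size(); I != E; ++I)
        MaxSeen = std::max<uint64_t>(MaxSeen, Weights[I]);
      if (MaxSeen <= MaxAllowed)
        return;
      for (unsigned I = 0, E = Weights.size(); I != E; ++I)
        Weights[I] =
            std::max<uint64_t>(1, (uint64_t)Weights[I] * MaxAllowed / MaxSeen);
    }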

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:52
@@ +51,3 @@
+
+// command line option for loading path profiles
+static cl::opt<std::string> AutoProfileFilename(
----------------
Chandler Carruth wrote:
> LLVM tries to be somewhat particular about comments being reasonably well formed prose, so capitalize and punctuate.
> 
> At the code level, you might also want to clarify that this is primarily a debugging interface.
Ironically, this is a left-over comment from the old path profiler that I never edited. Fixed.
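
For reference, the revised comment and option read roughly like this (the flag name and default value below are placeholders, not necessarily what the patch uses):

    // Command line option to specify the file to read samples from. This is
    // mainly used for debugging.
    static cl::opt<std::string> AutoProfileFilename(
        "auto-profile-file", cl::init("autoprof.llvmprof.out"),
        cl::value_desc("filename"),
        cl::desc("Profile file loaded by -auto-profile"), cl::Hidden);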

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:84-87
@@ +83,6 @@
+
+  virtual bool loadNative() = 0;
+  virtual bool loadText() = 0;
+  virtual void dump() = 0;
+  virtual bool emitAnnotations(Function &F) = 0;
+
----------------
Chandler Carruth wrote:
> I'm curious about the use of an abstract interface. Do you expect this interface to grow? How many subclasses are you envisioning? One per profiling source? One per file format? How many of those? Is there another reason to use a class as opposed to a more procedural style?
> 
> (sorry for rapid fire questions, just trying to get a good sense of where you're going here.)
My thinking here is that we will have:

- A handful of profiling sources, but not more than that (say, instrumentation, data tracers, cache profilers).
- Each profile source will have a "native" format (using bitcode) and (optionally) a text format for debugging.

The class-based structure is not strictly necessary, but it may make it easier for the different profilers to share common functionality, like attribute generation and profile parsing.

I'm OK with generalizing later, if you prefer a simpler initial implementation.
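
Concretely, the shape I have in mind is something like this (a sketch only; the second subclass is hypothetical):

    class AutoProfiler {
    public:
      AutoProfiler(std::string Name) : Filename(Name) {}
      virtual ~AutoProfiler() {}

      virtual bool loadNative() = 0;
      virtual bool loadText() = 0;
      virtual void dump() = 0;
      virtual bool emitAnnotations(Function &F) = 0;

    protected:
      std::string Filename;  // Shared by every profile source.
    };

    // One subclass per profiling source. Each knows how to read its native
    // (bitcode) format and, optionally, a text format for debugging.
    class AutoProfileSampleBased : public AutoProfiler { /* ... */ };
    class AutoProfileInstrumented : public AutoProfiler { /* ... */ };  // hypothetical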

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:125
@@ +124,3 @@
+///      be relative to the start of the function.
+class AutoProfileSampleBased : public AutoProfiler {
+public:
----------------
Chandler Carruth wrote:
> The naming / comments aren't really making sense to me yet.
> 
> The entire pass is about sample-based profiles, right? Is this class supposed to be Linux Perf specific, or reading a generic (custom) file format that we produce via a tool from Linux Perf data, or potentially any other data source? The comments make me think the former, but the naming makes me think the latter.
The class is not Linux Perf specific. You can actually generate the profile manually or via any other sampler that generates that file format.  I adjusted the comments.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:186-187
@@ +185,4 @@
+/// profiles. It keeps track of line number information and location of
+/// the file pointer. Users of this class are responsible for actually
+/// parsing the lines returned by the readLine function.
+class AutoProfileTextLoader {
----------------
Chandler Carruth wrote:
> Why not sink the line parsing into this class? At least into a raw struct? That would (I think) give a nice separation of concerns between the textual parsing and acting on the results.
That would imply that every profiler wanting to use a text file must use this format. This was meant to be a generic text file reader helper. Your suggestion bundles up both behaviours.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:219-220
@@ +218,4 @@
+
+  /// \brief Report a parse error message and stop compilation.
+  void reportParseError(std::string Msg) const {
+    report_fatal_error(Filename + ":" + Twine(Lineno) + ": " + Msg + "\n");
----------------
Chandler Carruth wrote:
> Accept a Twine argument here?
Done.
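
For reference, the revised signature is now (body unchanged from the version quoted above):

    /// \brief Report a parse error message and stop compilation.
    void reportParseError(const Twine &Msg) const {
      report_fatal_error(Filename + ":" + Twine(Lineno) + ": " + Msg + "\n");
    }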

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:232
@@ +231,3 @@
+  /// \brief Current line number.
+  size_t Lineno;
+
----------------
Chandler Carruth wrote:
> I would use int or int64_t, unsigned is just reducing our ability to check for bugs here.
Done.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:404
@@ +403,3 @@
+///
+/// The "weight" of an instruction @param I is the number of samples
+/// collected on that instruction at runtime. To retrieve it, we
----------------
Chandler Carruth wrote:
> I don't think the '\param' command (or @param as you've spelled it) works the way you think it works: http://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdparam
> 
> I would also try to consistently use the '\command' spelling.
Hm, quite right.  I used @param because I saw it all over the source tree.  Changed to \param.
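
For reference, the corrected comment looks roughly like this (parameter names are taken from the surrounding code; the map type name is a placeholder):

    /// \brief Get the number of samples collected on instruction \p I.
    ///
    /// \param I Instruction to query.
    /// \param FirstLineno Line number of the first instruction in the function.
    /// \param BodySamples Map from line offsets to sample counts.
    uint32_t getInstWeight(Instruction &I, unsigned FirstLineno,
                           BodySampleMap &BodySamples);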

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:326-331
@@ +325,8 @@
+///
+/// Symbol table (represented with the string "symbol table")
+///    Number of symbols in the table
+///    symbol 1
+///    symbol 2
+///    ...
+///    symbol N
+///
----------------
Chandler Carruth wrote:
> I wonder, how essential was this? Could we get by with just putting the symbol name in the header for each function body profile snippet?
Not altogether essential, I suppose. It just makes parsing a bit simpler as it allows initializing the sample table with all the expected functions.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:334
@@ +333,3 @@
+/// Function body profiles
+///    function1:total_samples:total_head_samples:number_of_locations
+///    location_offset_1: number_of_samples
----------------
Chandler Carruth wrote:
> What are these numbers? Only 'total_samples' really makes sense to me.
> 
> Also, why not just compute these values on the fly while reading the per-location bit?
> 
> The reason I'm pushing for the text format not having the self-consistency invariants is that it will make incremental updates to unit tests significantly easier if I don't have to remember to go update the totals each time.
> 
> 
> Since these can repeat anyways, and they're just for unittesting, what about this syntax:
> 
>   function_name:location_offset: number_of_samples
> 
> Then have no state whatsoever, just each line adds a number of samples at a particular location offset from the named function. Repeating the function name may be wasteful, but for tests it seems harmless, and it would make parsing much simpler due to it being purely line-oriented.
The total number of samples on a function will not necessarily match the sum of the samples collected on the individual lines of that function. It does happen that the profiler collects, say, 10,000 samples on a function while the per-line counts do not add up to 10,000.

The total number of head samples is currently not used and can disappear.

There are no invariants to maintain, other than the number of sampled lines that the parser should expect to find for each function.

Would you mind if I do this as a followup patch?

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:369
@@ +368,3 @@
+    SmallVector<StringRef, 4> Matches;
+    Regex head_re("^([^:]+):([0-9]+):([0-9]+):([0-9]+)$");
+    Line = Loader.readLine().str();
----------------
Chandler Carruth wrote:
> Build the regular expression outside of the loop.
Done. I also lifted the innermost Regex for line samples.
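
The hoisted version looks roughly like this (the per-line pattern and the error message wording are my approximation of the format described earlier, not copied from the patch):

    // Built once, before the parsing loops, instead of on every iteration.
    Regex HeadRE("^([^:]+):([0-9]+):([0-9]+):([0-9]+)$");
    Regex LineSampleRE("^([0-9]+): ([0-9]+)$");  // per-line "offset: samples" entries

    // Inside the loop, only the matching happens.
    SmallVector<StringRef, 4> Matches;
    std::string Line = Loader.readLine().str();
    if (!HeadRE.match(Line, &Matches))
      Loader.reportParseError("expected 'name:total:head:lines', found " + Line);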

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:487
@@ +486,3 @@
+  bool Changed = false;
+  StringRef name = F.getName();
+  FunctionProfile &FProfile = Profiles[name];
----------------
Chandler Carruth wrote:
> Some variable names in this function are still named in the old style.
Done.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:493-504
@@ +492,14 @@
+  LLVMContext &context = FirstInst.getContext();
+  for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) {
+    Instruction &Inst = *I;
+    uint32_t InstWeight = getInstWeight(Inst, FirstLineno, BodySamples);
+    if (InstWeight > 0) {
+      SmallVector<Value *, 1> sample_values;
+      sample_values.push_back(ConstantInt::get(Type::getInt32Ty(context),
+                                               InstWeight));
+      MDNode *MD = MDNode::get(context, sample_values);
+      Inst.setMetadata(AutoProfileSamplesMDKind, MD);
+      Changed = true;
+    }
+  }
+
----------------
Chandler Carruth wrote:
> I think this is all dead code. You query the sample map directly in computeBranchWeights now, so there is no need to annotate instructions with metadata.
I was planning to use them in class BlockFrequencyImpl in the next couple of patches. If I take this out, I'd have to take out everything related to the sampleprofile.samples annotation. I can do one of two things:

- Leave them in and add the changes to BlockFrequencyImpl now.
- Take everything out and re-add them when I change BlockFrequencyImpl.

I'm fine with either option.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:509
@@ +508,3 @@
+
+  DEBUG(if (Changed) dumpInstrumentedInsns(dbgs(), F););
+
----------------
Chandler Carruth wrote:
> Yowzers! There already is a -print-after=foo option to the 'opt' command that will dump the IR (including metadata)... I would reserve DEBUG stuff for more compact information, but keep the dump methods so that you can call them in gdb.
Right, but -print-after is much more verbose than what I was going for here. This only prints the instrumented insns.

If folks are more used to -print-after, I'll take it out.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:473
@@ +472,3 @@
+    llvm::MDNode *WeightsNode = MDB.createBranchWeights(Weights);
+    TI->setMetadata(llvm::LLVMContext::MD_prof, WeightsNode);
+  }
----------------
Chandler Carruth wrote:
> What if there already was metadata, for example from __builtin_expect?
So, two schools of thought here:

1. The user lied to us when they added __builtin_expect. So, overriding that decision is OK.
2. The profile being used is not representative.

In either case, emitting a branch weight in opposition to what __builtin_expect says will not affect correctness, but it may affect performance if case 2 is what is happening.

We could either warn and not override or warn and override. I don't think I would want to silently ignore the __builtin_expect.

Future patch?
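
If we go the "warn and override" route, the check would be something along these lines (purely illustrative; the warning text and mechanism are not part of this patch):

    // Detect weights that were attached earlier (e.g. lowered from
    // __builtin_expect) before overwriting them.
    if (TI->getMetadata(llvm::LLVMContext::MD_prof))
      errs() << "warning: overriding existing branch weights with "
             << "sample profile data\n";
    TI->setMetadata(llvm::LLVMContext::MD_prof, MDB.createBranchWeights(Weights));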

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:465-469
@@ +464,7 @@
+    SmallVector<uint32_t, 4> Weights;
+    for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {
+      BasicBlock *Succ = TI->getSuccessor(I);
+      uint32_t Weight = computeBlockWeight(Succ, FirstLineno, BodySamples);
+      Weights.push_back(Weight);
+    }
+
----------------
Chandler Carruth wrote:
> This re-computes the block weight for each predecessor of a block which could be *very* large for certain CFGs.
> 
> I think you want to memoize in a map from BasicBlock* -> weight.
Done.
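
For the record, the memoization ends up looking roughly like this (names are approximate; BodySampleMap is the same placeholder type as above):

    // Cache of per-block weights so each block is computed only once, no
    // matter how many predecessors query it.
    DenseMap<BasicBlock *, uint32_t> BlockWeights;

    uint32_t getBlockWeight(BasicBlock *BB, unsigned FirstLineno,
                            BodySampleMap &BodySamples) {
      DenseMap<BasicBlock *, uint32_t>::iterator It = BlockWeights.find(BB);
      if (It != BlockWeights.end())
        return It->second;
      uint32_t Weight = computeBlockWeight(BB, FirstLineno, BodySamples);
      BlockWeights[BB] = Weight;
      return Weight;
    }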


http://llvm-reviews.chandlerc.com/D2058


