[PATCH] Use auto profiler to set branch weights

Chandler Carruth chandlerc at gmail.com
Mon Nov 4 13:18:53 PST 2013


  Replies to comments on the old files...


================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:20-24
@@ +19,7 @@
+//
+// - prof: Represents branch weights. This annotation is added to branches
+//      to indicate the weights of each edge coming out of the branch.
+//      The weight of each edge is the weight of the target block for
+//      that edge. The weight of a block B is computed as the maximum
+//      number of samples found in B.
+//
----------------
Diego Novillo wrote:
> Chandler Carruth wrote:
> > I think you might want to leave this more opaque.
> > 
> > When computing these, it seems likely that we'll want to (in some cases) perform normalization to keep the values of the weights from getting into an excessive range while preserving the relative weights among the exit edges of a block.
> Really? It was my impression that the analyzer will normalize these values (BranchProbabilityInfo::calcMetadataWeights).  The weights are normalized to [1, getMaxWeightFor(BB)].
> 
> When I saw that, I figured I could put just about any value here.
Yea, final normalization will happen there. I just worry about exceeding the max you *can* put on a node... Not really critical for this iteration.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:84-87
@@ +83,6 @@
+
+  virtual bool loadNative() = 0;
+  virtual bool loadText() = 0;
+  virtual void dump() = 0;
+  virtual bool emitAnnotations(Function &F) = 0;
+
----------------
Diego Novillo wrote:
> Chandler Carruth wrote:
> > I'm curious about the use of an abstract interface. Do you expect this interface to grow? How many subclasses are you envisioning? One per profiling source? One per file format? How many of those? Is there another reason to use a class as opposed to a more procedural style?
> > 
> > (sorry for rapid fire questions, just trying to get a good sense of where you're going here.)
> My thinking here is that we will have:
> 
> - A handful of profiling sources, but not more than a handful (say instrumentation, data tracers, cache profilers)
> - Each profile source will have a "native" format (using bitcode) and (optionally) a text format for debugging.
> 
> The class format is not really necessary, but it may make it easier for the different profilers to share common functionality, like generation of attributes, and profile parsing.
> 
> I'm OK with generalizing later, if you prefer a simpler initial implementation.
I think I would generalize later. I see your points for why we will probably need something, but currently I'm not sure what that will actually look like. My hope is it will be more clear when we have the other source.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:186-187
@@ +185,4 @@
+/// profiles. It keeps track of line number information and location of
+/// the file pointer. Users of this class are responsible for actually
+/// parsing the lines returned by the readLine function.
+class AutoProfileTextLoader {
----------------
Diego Novillo wrote:
> Chandler Carruth wrote:
> > Why not sink the line parsing into this class? At least into a raw struct? That would (i think) give a nice separation of concerns from the textual parsing and acting on the results.
> That would imply that every profiler wanting to use a text file must use this format. This was meant to be a generic text file reader helper. Your suggestion bundles up both behaviours.
If you want a line-oriented text file reader, it should go into the Support library in LLVM rather than here, and it should probably look different (specifically, it should expose line iterators so that we can use normal C++ idioms with them).

I'm OK with a FIXME that says to clean this up and move it into the support library. I would at least name it something that makes it clear this is a totally generic text file reader.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:334
@@ +333,3 @@
+/// Function body profiles
+///    function1:total_samples:total_head_samples:number_of_locations
+///    location_offset_1: number_of_samples
----------------
Diego Novillo wrote:
> Chandler Carruth wrote:
> > What are these numbers? Only 'total_samples' really makes sense to me.
> > 
> > Also, why not just compute these values on the fly while reading the per-location bit?
> > 
> > The reason I'm pushing for the text format not having the self-consistency invariants is that it will make incremental updates to unit tests significantly easier if I don't have to remember to go update the totals each time.
> > 
> > 
> > Since these can repeat anyways, and they're just for unittesting, what about this syntax:
> > 
> >   function_name:location_offset: number_of_samples
> > 
> > Then have no state what-so-ever, just each line adds a number of samples at a particular location offset from the named function. Repeating the function name may be wasteful, but for tests it seems harmless, and it would make parsing much simpler due to it being purely line-oriented.
> The total number of samples on a function will not necessarily match the samples collected on each line in that function.  It does happen that the profiler collects, say, 10,000 samples on a function, but the samples on the individual lines do not add up to 10,000.
> 
> The total number of head samples is currently not used and can disappear.
> 
> There are no invariants to maintain, other than the number of sampled lines that the parser should expect to find for each function.
> 
> Would you mind if I do this as a followup patch?
I'm definitely OK with doing it later, but I do think we should make the text format stateless and line-oriented to ease test case maintenance. I'm not really worried about growing the tables as we read the file in, etc.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:473
@@ +472,3 @@
+    llvm::MDNode *WeightsNode = MDB.createBranchWeights(Weights);
+    TI->setMetadata(llvm::LLVMContext::MD_prof, WeightsNode);
+  }
----------------
Diego Novillo wrote:
> Chandler Carruth wrote:
> > What if there already was metadata, for example from __builtin_expect?
> So, two schools of thought here:
> 
> 1. The user lied to us when they added __builtin_expect. So, overriding that decision is OK.
> 2. The profile being used is not representative.
> 
> In either case, emitting a branch weight in opposition to what __builtin_expect says will not affect correctness, but it may affect performance if case 2 is happening.
> 
> We could either warn and not override or warn and override. I don't think I would want to silently ignore the __builtin_expect.
> 
> Future patch?
Definitely future patch.

We don't have any good way to warn within this context. That would need to be a separate tool.

I'm in favor of trusting __builtin_expect over the measured profile, and producing user-facing tools that let users explore such conflicts and remove misguided __builtin_expect calls from their code. Essentially, in the event of a conflict, if the measured profile is wrong there is no recourse unless we trust the __builtin_expect. If the __builtin_expect is wrong, there is always the option of removing it.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:493-504
@@ +492,14 @@
+  LLVMContext &context = FirstInst.getContext();
+  for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) {
+    Instruction &Inst = *I;
+    uint32_t InstWeight = getInstWeight(Inst, FirstLineno, BodySamples);
+    if (InstWeight > 0) {
+      SmallVector<Value *, 1> sample_values;
+      sample_values.push_back(ConstantInt::get(Type::getInt32Ty(context),
+                                               InstWeight));
+      MDNode *MD = MDNode::get(context, sample_values);
+      Inst.setMetadata(AutoProfileSamplesMDKind, MD);
+      Changed = true;
+    }
+  }
+
----------------
Diego Novillo wrote:
> Chandler Carruth wrote:
> > I think this is all dead code. You query the sample map directly in computeBranchWeights now, so there is no need to annotate instructions with metadata.
> I was planning to use them in class BlockFrequencyImpl in the next couple of patches. If I take this out, I'd have to take out everything related to the sampleprofile.samples annotation. I can do one of two things:
> 
> - Leave them in and add the changes to BlockFrequencyImpl now.
> - Take everything out and re-add them when I change BlockFrequencyImpl.
> 
> I'm fine with either option.
BlockFrequencyInfo is built up purely from BranchProbabilityInfo. I don't think we want to build a back channel there.

Fundamentally, when we start with block frequencies (such as from samples) we should be able to construct branch probabilities which precisely reproduce those block frequencies. The reason why the IR uses branch probabilities as the canonical (metadata) representation is because of this fact, and the fact that the reverse isn't true.

So, essentially, I don't think you'll want to (or need to) modify BlockFrequencyInfo.

================
Comment at: lib/Transforms/Scalar/AutoProfile.cpp:509
@@ +508,3 @@
+
+  DEBUG(if (Changed) dumpInstrumentedInsns(dbgs(), F););
+
----------------
Diego Novillo wrote:
> Chandler Carruth wrote:
> > Yowzers! There already is a -print-after=foo option to the 'opt' command that will dump the IR (including metadata)... I would reserve DEBUG stuff for more compact information, but keep the dump methods so that you can call them in gdb.
> Right, but -print-after is much more verbose than what I was going for here. This only prints the instrumented insns.
> 
> If folks are more used to -print-after, I'll take it out.
It's DEBUG stuff, so if this is what helps you debug, go for it. =]


http://llvm-reviews.chandlerc.com/D2058
