[PATCH] Initial instrumentation based PGO implementation

Eric Christopher echristo at gmail.com
Mon Dec 16 14:33:55 PST 2013


What's the compile time impact of these patches when the option is and
isn't being used? It's looking a bit heavyweight and very invasive.

+  uint64_t LoopCount = Cnt.getCount();
+  uint64_t ExitCount =
+    Cnt.getParentCount() + Cnt.getContinueCounter().getCount() +
+    Cnt.getAdjustedCount() - Cnt.getCount();
+  CondBr->setMetadata(llvm::LLVMContext::MD_prof,
+                      PGO.createBranchWeights(LoopCount, ExitCount));
+
   EmitBranch(CondBlock);

You appear to cut and paste this a few times.

   // If the body of the case is just a 'break', and if there was no
fallthrough,
   // try to not emit an empty block.
-  if ((CGM.getCodeGenOpts().OptimizationLevel > 0) &&
+  if (!CGM.getCodeGenOpts().ProfileInstrGenerate &&
+      CGM.getCodeGenOpts().OptimizationLevel > 0 &&

You should probably update comments in cases like this (there are more of them).

+    // The default on our switch instruction will come to this block,
so we need
+    // to update its count.
+    (*SwitchWeights)[0] += ThisCount;

SwitchWeights is just a std::vector or SmallVector right? This seems a
little odd.

+      EmitBranchOnBoolExpr(CondBOp->getRHS(), TrueBlock, FalseBlock,
+                           TrueCount - FalseCount);

This sort of thing needs some documentation/comments in the code. It's
not clear what's going on from looking at the patch.

+    if (TrueCount != 0) {

if (TrueCount) ?

+  struct MapRegionCounters :
+    public ConstStmtVisitor<MapRegionCounters> {
+    unsigned NextCounter;
+    llvm::DenseMap<const Stmt*, unsigned> *CounterMap;
+    MapRegionCounters(llvm::DenseMap<const Stmt*, unsigned> *CounterMap) :
+      NextCounter(0), CounterMap(CounterMap) {
+    }
+    void VisitChildren(const Stmt *S) {
+      for (Stmt::const_child_range I = S->children(); I; ++I)
+        if (*I)
+         this->Visit(*I);
+    }

Document this whole anonymous namespace please.

+// RUN: %clangxx %s -o %t1 -fprofile-instr-generate
+// RUN: mkdir -p %tdir && cd %tdir && %t1

This is going to fail on any platform that doesn't have native codegen
enabled by default.

There's probably more review, just getting something started here.


-eric



-eric

On Fri, Dec 13, 2013 at 1:43 PM, Justin Bogner <mail at justinbogner.com> wrote:
> New patch attached. The counters are now only accessed via RegionCounter
> objects, there's some better documentation of PGOProfileData, PGO
> creates the RegionCounts vector instead of it being allocated inside
> PGOProfileData, and we check for absurdly large data files.
>
> I've reattached the command line flag and compiler-rt patches to keep
> everything in one place, but they're unchanged.
>
>
>
> Justin Bogner <mail at justinbogner.com> writes:
>> John McCall <rjmccall at apple.com> writes:
>>>>
>>> <0001-profile-Rudimentary-suppport-for-PGO-instrumentation.patch><0001-Driver-Accept-fprofile-instr-use-and-fprofile-instr-.patch><0002-CodeGen-Initial-instrumentation-based-PGO-implementa.patch>
>>>
>>> +  unsigned Counter = PGO.getRegionCounter(&S);
>>> +  RegionCounter Cnt(PGO, Counter);
>>>
>>> There are a number of places where you do this instead of
>>> getPGORegionCounter, maybe so that you can use Counter separately.
>>> Just use Cnt.getCounter() downstream instead.  (You should have a
>>> getCounter() method on that type.)
>>
>> Good point. I think I can do one better than that and avoid needing
>> access to the counter indices completely. More on that below.
>>
>>> + unsigned LoopCnt
>>>
>>> There are a bunch of places you pass counts around as unsigned, and
>>> then a bunch of other times that you pass them as uint64_t.  That’s
>>> problematic; so is the ease of passing other kinds of data
>>> accidentally as unsigned.  The sky will be brighter and the day
>>> merrier if you make an opaque wrapper type for this.  It’ll also make
>>> your rampant use of signal values much more self-documenting.
>>>
>>> The same goes for counter indexes, where the size problems are less
>>> but the risk of accidentally using a counter where you meant to use a
>>> count is much larger.
>>
>> The example here is actually a counter index, but there is one place
>> where an unsigned is used for a counter value (The DefaultCount in a
>> switch). The current set up makes it much too easy to mix up counter
>> indices and counts, especially with the similar names. This will be much
>> clearer if we hide the counter indices and work with the RegionCounter
>> type instead of unsigned. I think leaving the counts themselves as
>> uint64_t makes sense given that change.
>>
>> I'm not really sure what you're talking about with regard to signal
>> values. There are a couple of pointers that can be null, and the "three
>> counters for a loop" thing is a bit odd, but as far as counter indices
>> and counts go, every value is valid. Would you mind explaining?
>>
>>> +class PGOProfileData {
>>> +private:
>>> +  llvm::OwningPtr<llvm::MemoryBuffer> DataBuffer;
>>>
>>> +  llvm::StringMap<unsigned> DataOffsets;
>>>
>>> Please document that these are offsets into the buffer.
>>>
>>> It’s probably reasonable to assume that offsets fit within an
>>> unsigned, but you should at least protect against huge profile files.
>>
>> Will do.
>>
>>> +PGOProfileData::PGOProfileData(CodeGenModule &CGM, std::string Path)
>>>
>>> Why is this being taken as a std::string, and why is it being taken by value?
>>>
>>> +  std::vector<uint64_t> *getFunctionCounts(StringRef MangledName);
>>>
>>> If you’re going to heap-allocate this (for some reason?), please at
>>> least use an owning pointer.
>>
>> Heap allocating this is silly and unnecessary. I don't know what I was
>> thinking. I'll fix that.
>>
>>> +  /// Given a region counter for a loop, return the counter for a break
>>> + /// statement within the loop. The result is undefined for
>>> non-loop counters.
>>> +  unsigned getBreakCounter(unsigned BaseCounter) { return BaseCounter + 1; }
>>>
>>> Interesting.  You want a single counter associated with all the breaks
>>> in a loop?
>>
>> Yep. Breaks and continues get a single counter each, so that we can
>> differentiate them from other kinds of exits (return, goto). We can then
>> just look at the difference between entries to the loop body and exits,
>> and adjust using these counters to work out the counts in all the places
>> we're interested in. See Emit(While|Do|For|CXXForRange)Stmt for details.
>>
>>> +  CodeGenPGO PGO;
>>> +
>>>
>>> Why are we not following the DebugInfo lead and making this something
>>> external that only gets created if PGO is actually enabled and data is
>>> present for this function?  If you’re worried about allocation
>>> overhead, you can at least make it an Optional.
>>
>> I'm not really familiar with DebugInfo, but making this a nullable
>> pointer would force everything in CodeGen that deals with counters to
>> frequently check whether or not PGO is enabled, wouldn't it? I might be
>> missing something here.
>>
>> Thanks for the feedback!
>> -- Justin
>
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>




More information about the cfe-commits mailing list