[PATCH] Initial instrumentation based PGO implementation
Eric Christopher
echristo at gmail.com
Mon Dec 16 14:33:55 PST 2013
What's the compile time impact of these patches when the option is and
isn't being used? It's looking a bit heavyweight and very invasive.
+ uint64_t LoopCount = Cnt.getCount();
+ uint64_t ExitCount =
+ Cnt.getParentCount() + Cnt.getContinueCounter().getCount() +
+ Cnt.getAdjustedCount() - Cnt.getCount();
+ CondBr->setMetadata(llvm::LLVMContext::MD_prof,
+ PGO.createBranchWeights(LoopCount, ExitCount));
+
EmitBranch(CondBlock);
You appear to cut and paste this a few times.
// If the body of the case is just a 'break', and if there was no
fallthrough,
// try to not emit an empty block.
- if ((CGM.getCodeGenOpts().OptimizationLevel > 0) &&
+ if (!CGM.getCodeGenOpts().ProfileInstrGenerate &&
+ CGM.getCodeGenOpts().OptimizationLevel > 0 &&
You should probably update comments in cases like this (there are more of them).
+ // The default on our switch instruction will come to this block,
so we need
+ // to update its count.
+ (*SwitchWeights)[0] += ThisCount;
SwitchWeights is just a std::vector or SmallVector right? This seems a
little odd.
+ EmitBranchOnBoolExpr(CondBOp->getRHS(), TrueBlock, FalseBlock,
+ TrueCount - FalseCount);
This sort of thing needs some documentation/comments in the code. It's
not clear what's going on from looking at the patch.
+ if (TrueCount != 0) {
if (TrueCount) ?
+ struct MapRegionCounters :
+ public ConstStmtVisitor<MapRegionCounters> {
+ unsigned NextCounter;
+ llvm::DenseMap<const Stmt*, unsigned> *CounterMap;
+ MapRegionCounters(llvm::DenseMap<const Stmt*, unsigned> *CounterMap) :
+ NextCounter(0), CounterMap(CounterMap) {
+ }
+ void VisitChildren(const Stmt *S) {
+ for (Stmt::const_child_range I = S->children(); I; ++I)
+ if (*I)
+ this->Visit(*I);
+ }
Document this whole anonymous namespace please.
+// RUN: %clangxx %s -o %t1 -fprofile-instr-generate
+// RUN: mkdir -p %tdir && cd %tdir && %t1
This is going to fail on any platform that doesn't have native codegen
enabled by default.
There's probably more review, just getting something started here.
-eric
-eric
On Fri, Dec 13, 2013 at 1:43 PM, Justin Bogner <mail at justinbogner.com> wrote:
> New patch attached. The counters are now only accessed via RegionCounter
> objects, there's some better documentation of PGOProfileData, PGO
> creates the RegionCounts vector instead of it being allocated inside
> PGOProfileData, and we check for absurdly large data files.
>
> I've reattached the command line flag and compiler-rt patches to keep
> everything in one place, but they're unchanged.
>
>
>
> Justin Bogner <mail at justinbogner.com> writes:
>> John McCall <rjmccall at apple.com> writes:
>>>>
>>> <0001-profile-Rudimentary-suppport-for-PGO-instrumentation.patch><0001-Driver-Accept-fprofile-instr-use-and-fprofile-instr-.patch><0002-CodeGen-Initial-instrumentation-based-PGO-implementa.patch>
>>>
>>> + unsigned Counter = PGO.getRegionCounter(&S);
>>> + RegionCounter Cnt(PGO, Counter);
>>>
>>> There are a number of places where you do this instead of
>>> getPGORegionCounter, maybe so that you can use Counter separately.
>>> Just use Cnt.getCounter() downstream instead. (You should have a
>>> getCounter() method on that type.)
>>
>> Good point. I think I can do one better than that and avoid needing
>> access to the counter indices completely. More on that below.
>>
>>> + unsigned LoopCnt
>>>
>>> There are a bunch of places you pass counts around as unsigned, and
>>> then a bunch of other times that you pass them as uint64_t. That’s
>>> problematic; so is the ease of passing other kinds of data
>>> accidentally as unsigned. The sky will be brighter and the day
>>> merrier if you make an opaque wrapper type for this. It’ll also make
>>> your rampant use of signal values much more self-documenting.
>>>
>>> The same goes for counter indexes, where the size problems are less
>>> but the risk of accidentally using a counter where you meant to use a
>>> count is much larger.
>>
>> The example here is actually a counter index, but there is one place
>> where an unsigned is used for a counter value (The DefaultCount in a
>> switch). The current set up makes it much too easy to mix up counter
>> indices and counts, especially with the similar names. This will be much
>> clearer if we hide the counter indices and work with the RegionCounter
>> type instead of unsigned. I think leaving the counts themselves as
>> uint64_t makes sense given that change.
>>
>> I'm not really sure what you're talking about with regard to signal
>> values. There are a couple of pointers that can be null, and the "three
>> counters for a loop" thing is a bit odd, but as far as counter indices
>> and counts go, every value is valid. Would you mind explaining?
>>
>>> +class PGOProfileData {
>>> +private:
>>> + llvm::OwningPtr<llvm::MemoryBuffer> DataBuffer;
>>>
>>> + llvm::StringMap<unsigned> DataOffsets;
>>>
>>> Please document that these are offsets into the buffer.
>>>
>>> It’s probably reasonable to assume that offsets fit within an
>>> unsigned, but you should at least protect against huge profile files.
>>
>> Will do.
>>
>>> +PGOProfileData::PGOProfileData(CodeGenModule &CGM, std::string Path)
>>>
>>> Why is this being taken as a std::string, and why is it being taken by value?
>>>
>>> + std::vector<uint64_t> *getFunctionCounts(StringRef MangledName);
>>>
>>> If you’re going to heap-allocate this (for some reason?), please at
>>> least use an owning pointer.
>>
>> Heap allocating this is silly and unnecessary. I don't know what I was
>> thinking. I'll fix that.
>>
>>> + /// Given a region counter for a loop, return the counter for a break
>>> + /// statement within the loop. The result is undefined for
>>> non-loop counters.
>>> + unsigned getBreakCounter(unsigned BaseCounter) { return BaseCounter + 1; }
>>>
>>> Interesting. You want a single counter associated with all the breaks
>>> in a loop?
>>
>> Yep. Breaks and continues get a single counter each, so that we can
>> differentiate them from other kinds of exits (return, goto). We can then
>> just look at the difference between entries to the loop body and exits,
>> and adjust using these counters to work out the counts in all the places
>> we're interested in. See Emit(While|Do|For|CXXForRange)Stmt for details.
>>
>>> + CodeGenPGO PGO;
>>> +
>>>
>>> Why are we not following the DebugInfo lead and making this something
>>> external that only gets created if PGO is actually enabled and data is
>>> present for this function? If you’re worried about allocation
>>> overhead, you can at least make it an Optional.
>>
>> I'm not really familiar with DebugInfo, but making this a nullable
>> pointer would force everything in CodeGen that deals with counters to
>> frequently check whether or not PGO is enabled, wouldn't it? I might be
>> missing something here.
>>
>> Thanks for the feedback!
>> -- Justin
>
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
More information about the cfe-commits
mailing list