[PATCH] Initial instrumentation based PGO implementation
Bob Wilson
bob.wilson at apple.com
Mon Dec 16 14:14:23 PST 2013
LGTM.
On Dec 13, 2013, at 1:43 PM, Justin Bogner <mail at justinbogner.com> wrote:
> New patch attached. The counters are now only accessed via RegionCounter
> objects, there's some better documentation of PGOProfileData, PGO
> creates the RegionCounts vector instead of it being allocated inside
> PGOProfileData, and we check for absurdly large data files.
>
> I've reattached the command line flag and compiler-rt patches to keep
> everything in one place, but they're unchanged.
>
> <0001-Driver-Accept-fprofile-instr-use-and-fprofile-instr-.patch><0002-CodeGen-Initial-instrumentation-based-PGO-implementa.patch><0001-profile-Rudimentary-suppport-for-PGO-instrumentation.patch>
> Justin Bogner <mail at justinbogner.com> writes:
>> John McCall <rjmccall at apple.com> writes:
>>>>
>>> <0001-profile-Rudimentary-suppport-for-PGO-instrumentation.patch><0001-Driver-Accept-fprofile-instr-use-and-fprofile-instr-.patch><0002-CodeGen-Initial-instrumentation-based-PGO-implementa.patch>
>>>
>>> + unsigned Counter = PGO.getRegionCounter(&S);
>>> + RegionCounter Cnt(PGO, Counter);
>>>
>>> There are a number of places where you do this instead of
>>> getPGORegionCounter, maybe so that you can use Counter separately.
>>> Just use Cnt.getCounter() downstream instead. (You should have a
>>> getCounter() method on that type.)
>>
>> Good point. I think I can do one better than that and avoid needing
>> access to the counter indices completely. More on that below.
>>
>>> + unsigned LoopCnt
>>>
>>> There are a bunch of places you pass counts around as unsigned, and
>>> then a bunch of other times that you pass them as uint64_t. That’s
>>> problematic; so is the ease of passing other kinds of data
>>> accidentally as unsigned. The sky will be brighter and the day
>>> merrier if you make an opaque wrapper type for this. It’ll also make
>>> your rampant use of signal values much more self-documenting.
>>>
>>> The same goes for counter indexes, where the size problems are less
>>> but the risk of accidentally using a counter where you meant to use a
>>> count is much larger.
>>
>> The example here is actually a counter index, but there is one place
>> where an unsigned is used for a counter value (The DefaultCount in a
>> switch). The current set up makes it much too easy to mix up counter
>> indices and counts, especially with the similar names. This will be much
>> clearer if we hide the counter indices and work with the RegionCounter
>> type instead of unsigned. I think leaving the counts themselves as
>> uint64_t makes sense given that change.
>>
>> I'm not really sure what you're talking about with regard to signal
>> values. There are a couple of pointers that can be null, and the "three
>> counters for a loop" thing is a bit odd, but as far as counter indices
>> and counts go, every value is valid. Would you mind explaining?
>>
>>> +class PGOProfileData {
>>> +private:
>>> + llvm::OwningPtr<llvm::MemoryBuffer> DataBuffer;
>>>
>>> + llvm::StringMap<unsigned> DataOffsets;
>>>
>>> Please document that these are offsets into the buffer.
>>>
>>> It’s probably reasonable to assume that offsets fit within an
>>> unsigned, but you should at least protect against huge profile files.
>>
>> Will do.
>>
>>> +PGOProfileData::PGOProfileData(CodeGenModule &CGM, std::string Path)
>>>
>>> Why is this being taken as a std::string, and why is it being taken by value?
>>>
>>> + std::vector<uint64_t> *getFunctionCounts(StringRef MangledName);
>>>
>>> If you’re going to heap-allocate this (for some reason?), please at
>>> least use an owning pointer.
>>
>> Heap allocating this is silly and unnecessary. I don't know what I was
>> thinking. I'll fix that.
>>
>>> + /// Given a region counter for a loop, return the counter for a break
>>> + /// statement within the loop. The result is undefined for
>>> non-loop counters.
>>> + unsigned getBreakCounter(unsigned BaseCounter) { return BaseCounter + 1; }
>>>
>>> Interesting. You want a single counter associated with all the breaks
>>> in a loop?
>>
>> Yep. Breaks and continues get a single counter each, so that we can
>> differentiate them from other kinds of exits (return, goto). We can then
>> just look at the difference between entries to the loop body and exits,
>> and adjust using these counters to work out the counts in all the places
>> we're interested in. See Emit(While|Do|For|CXXForRange)Stmt for details.
>>
>>> + CodeGenPGO PGO;
>>> +
>>>
>>> Why are we not following the DebugInfo lead and making this something
>>> external that only gets created if PGO is actually enabled and data is
>>> present for this function? If you’re worried about allocation
>>> overhead, you can at least make it an Optional.
>>
>> I'm not really familiar with DebugInfo, but making this a nullable
>> pointer would force everything in CodeGen that deals with counters to
>> frequently check whether or not PGO is enabled, wouldn't it? I might be
>> missing something here.
>>
>> Thanks for the feedback!
>> -- Justin
More information about the cfe-commits
mailing list