[PATCH] Initial instrumentation based PGO implementation

Justin Bogner mail at justinbogner.com
Wed Dec 11 11:31:30 PST 2013


Bob Wilson <bob.wilson at apple.com> writes:
> On Dec 1, 2013, at 8:15 PM, Justin Bogner <mail at justinbogner.com> wrote:
>> Attached are the initial patches for instrumentation based PGO. The
>> first two are for clang, and the third for compiler-rt.
>
> I’ve got a few minor comments below.  The only serious one is that I don’t
> think your handling of switch case-ranges and defaults is correct.  Otherwise,
> it looks good.  Since this is a pretty big change, I’d like John to give his
> approval before you commit, since he is the code owner for this part of clang.

New patches attached. I believe I've addressed all of your issues, and
I've also fixed up a couple of minor things I ran into when testing
these further changes.

> Once the dust settles on this, we’re going to need to add some good
> documentation.  It really isn’t obvious how this all fits together.  I like
> the way you’ve revised the loop instrumentation to move some of the counters
> to less frequently executed code.  That will help make the instrumented code
> run faster.  I think we should investigate taking it one step further, but
> that can wait for a follow-up patch.  Specifically, I think we can avoid
> adding extra counters for breaks and continues in loops, since we should be
> able to compute the counts for those without extra instrumentation overhead.
>  We can discuss that later.
>
>     diff --git a/include/clang/Driver/Options.td b/include/clang/Driver/
>     Options.td
>     index 9e7dc78..02d1087 100644
>     --- a/include/clang/Driver/Options.td
>     +++ b/include/clang/Driver/Options.td
>     @@ -369,6 +369,13 @@ def fno_autolink : Flag <["-"], "fno-autolink">,
>     Group<f_Group>,
>      def fprofile_sample_use_EQ : Joined<["-"], "fprofile-sample-use=">,
>          Group<f_Group>, Flags<[DriverOption, CC1Option]>,
>          HelpText<"Enable sample-based profile guided optimizations">;
>     +def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
>     +    Group<f_Group>, Flags<[CC1Option]>,
>     +    HelpText<"Instrument regions in each function to collect execution
>     counts”>;
>
> I know this comes from my initial patch, but I would prefer to drop the
> description of this as “instrumenting regions”.  That’s more of an internal
> implementation detail. How about just: “Generate instrumented code to collect
> execution counts”?

Done.

>     +def fprofile_instr_use : Flag<["-"], "fprofile-instr-use">, Group
>     <f_Group>;
>     +def fprofile_instr_use_EQ : Joined<["-"], "fprofile-instr-use=">,
>     +    Group<f_Group>, Flags<[CC1Option]>,
>     +    HelpText<"Use region instrumentation data for profile-guided
>     optimization”>;
>
> Likewise.  How about just dropping the word “region” from the description?

Done.

>      def fblocks : Flag<["-"], "fblocks">, Group<f_Group>, Flags<[CC1Option]>,
>        HelpText<"Enable the 'blocks' language feature">;
>
>     diff --git a/include/clang/Frontend/CodeGenOptions.def b/include/clang/
>     Frontend/CodeGenOptions.def
>     index 78b825d..154bf44 100644
>     --- a/include/clang/Frontend/CodeGenOptions.def
>     +++ b/include/clang/Frontend/CodeGenOptions.def
>     @@ -87,6 +87,9 @@ CODEGENOPT(OmitLeafFramePointer , 1, 0) ///< Set when
>     -momit-leaf-frame-pointer
>      VALUE_CODEGENOPT(OptimizationLevel, 3, 0) ///< The -O[0-4] option
>     specified.
>      VALUE_CODEGENOPT(OptimizeSize, 2, 0) ///< If -Os (==1) or -Oz (==2) is
>     specified.
>    
>     +CODEGENOPT(ProfileInstrRegions , 1, 0) ///< Instrument regions with
>     execution
>     +                                       ///< counts to use with PGO.
>     +
>
> How about renaming to “ProfileInstrGenerate” to match the option?

Makes sense.

>        /// If -fpcc-struct-return or -freg-struct-return is specified.
>      ENUM_CODEGENOPT(StructReturnConvention, StructReturnConventionKind, 2,
>     SRCK_Default)
>
> ...
>
>     diff --git a/lib/CodeGen/CGStmt.cpp b/lib/CodeGen/CGStmt.cpp
>
>     index 8325ddd..16d10a6 100644
>     --- a/lib/CodeGen/CGStmt.cpp
>     +++ b/lib/CodeGen/CGStmt.cpp
>
> ...
>
>     @@ -573,16 +601,20 @@ void CodeGenFunction::EmitDoStmt(const DoStmt &S) {
>        JumpDest LoopExit = getJumpDestInCurrentScope("do.end");
>        JumpDest LoopCond = getJumpDestInCurrentScope("do.cond");
>    
>     +  unsigned Counter = PGO.getRegionCounter(&S);
>     +  RegionCounter Cnt(PGO, Counter);
>     +
>        // Store the blocks to use for break and continue.
>     -  BreakContinueStack.push_back(BreakContinue(LoopExit, LoopCond));
>     +  BreakContinueStack.push_back(BreakContinue(LoopExit, LoopCond,
>     Counter));
>    
>     -  // Emit the body of the loop.
>
> Why delete the comment?

That was an accident, I've put it back.

>        llvm::BasicBlock *LoopBody = createBasicBlock("do.body");
>        EmitBlock(LoopBody);
>     +  Cnt.beginRegion(Builder);
>        {
>          RunCleanupsScope BodyScope(*this);
>          EmitStmt(S.getBody());
>        }
>     +  Cnt.adjustFallThroughCount();
>    
>        BreakContinueStack.pop_back();
>
> ...
>
>     @@ -923,7 +1004,10 @@ void CodeGenFunction::EmitCaseStmtRange(const
>     CaseStmt &S) {
>        // FIXME: parameters such as this should not be hardcoded.
>        if (Range.ult(llvm::APInt(Range.getBitWidth(), 64))) {
>          // Range is small enough to add multiple switch instruction cases.
>     +    uint64_t Total = CaseCnt.getCount() - CaseCnt.getParentCount();
>          for (unsigned i = 0, e = Range.getZExtValue() + 1; i != e; ++i) {
>     +      if (PGO.haveRegionCounts())
>     +        SwitchWeights->push_back(Total / e);
>            SwitchInsn->addCase(Builder.getInt(LHS), CaseDest);
>            LHS++;
>          }
>
> You dropped the code that I had here to make sure that the sum of the weights
> adds up to the original count.  I’m not absolutely certain that is necessary,
> but it seems like you could get some bad results otherwise due to rounding
> errors.  Are you sure it doesn’t matter?

It made more of a difference when the counters were on the edges instead
of the blocks, and as it is now I can't see it making a noticeable
difference unless the number of cases and the number of execution counts
are very near to each other. Even so, this does happen, and if someone
is looking at the counters and they don't add up they could get
confused.

I've added some logic to make sure all of the counts are accounted for.

>     @@ -948,7 +1032,10 @@ void CodeGenFunction::EmitCaseStmtRange(const
>     CaseStmt &S) {
>          Builder.CreateSub(SwitchInsn->getCondition(), Builder.getInt(LHS));
>        llvm::Value *Cond =
>          Builder.CreateICmpULE(Diff, Builder.getInt(Range), "inbounds");
>     -  Builder.CreateCondBr(Cond, CaseDest, FalseDest);
>     +  uint64_t ThisCount = CaseCnt.getCount() - CaseCnt.getParentCount();
>     +  uint64_t ElseCount = SwitchEntryCount - ThisCount;
>
> Maybe I’m misunderstanding, but this looks wrong to me.  This branch is on the
> chain of range checks that ends in the default.  You can’t find the “else”
> count that way.

You're right, this was completely wrong. I've added test cases for these
situations and changed how the default counter works so we can do this
similarly to your original patch.

This makes handling default a little bit different than handling other
case labels, but it has to be handled separately anyway, so I think
that's okay.

>     +  Builder.CreateCondBr(Cond, CaseDest, FalseDest,
>     +                       PGO.createBranchWeights(ThisCount, ElseCount));
>    
>        // Restore the appropriate insertion point.
>        if (RestoreBB)
>
> ...
>
>     @@ -1031,6 +1131,13 @@ void CodeGenFunction::EmitDefaultStmt(const
>     DefaultStmt &S) {
>        assert(DefaultBlock->empty() &&
>               "EmitDefaultStmt: Default block already defined?");
>        EmitBlock(DefaultBlock);
>     +
>     +  RegionCounter Cnt = getPGORegionCounter(&S);
>     +  // We already put a dummy weight in for the default case when we
>     created the
>     +  // switch, so we just need to fill it in.
>     +  if (PGO.haveRegionCounts())
>     +    (*SwitchWeights)[0] = Cnt.getCount() - Cnt.getParentCount();
>     +  Cnt.beginRegion(Builder);
>        EmitStmt(S.getSubStmt());
>      }
>
> Same issue as above: the switch weight for the default needs to include the
> counts for the case-ranges that were chained onto the default.  I think my
> patch had this right, but you lost that part.

Ditto.

> ...
>
>     diff --git a/lib/CodeGen/CodeGenPGO.h b/lib/CodeGen/CodeGenPGO.h
>     new file mode 100644
>     index 0000000..35cdfe0
>     --- /dev/null
>     +++ b/lib/CodeGen/CodeGenPGO.h
>     @@ -0,0 +1,202 @@
>
> ...
>
>     +  /// Assign counters to regions and configure them for PGO of a given
>     +  /// function. Does nothing if instrumentation is not enabled and either
>     +  /// generates global variables or associates PGO data with each of the
>     +  /// counters depending on whether we are generating or using
>     instrumentation.
>     +  void assignRegionCounters(GlobalDecl &GD);
>     +  /// Emit code to write counts for a given function to disk, if
>     necessary.
>     +  void emitWriteoutFunction(GlobalDecl &GD);
>     +  /// Clean up region counter state. Must be called if
>     assignRegionCounters is
>     +  /// used.
>     +  void clearRegionCounters();
>
> When I see the word “clear”, I tend to think of resetting the counts to zero,
> but this function is for freeing the storage. I would prefer a name that
> reflected that, perhaps “free”, “delete” or “deallocate”.

I've gone with "destroy". Semantically this needs to be something that
pairs with the "assign" above, and it may or may not simply be freeing
memory.

> ...
>
>     +  /// Emit an increment to the counter. One should normally prefer
>     beginRegion
>     +  /// to emitIncrement, except for in cases where the counter's region is
>     +  /// elided.
>     +  void emitIncrement(CGBuilderTy &Builder) {
>     +    PGO->emitCounterIncrement(Builder, Counter);
>     +  }
>
> Justin and I discussed this in person. I think there is no need for this
> function anymore.

I've changed all uses to just use beginRegion directly.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-profile-Rudimentary-suppport-for-PGO-instrumentation.patch
Type: text/x-patch
Size: 3821 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131211/1ace884d/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Driver-Accept-fprofile-instr-use-and-fprofile-instr-.patch
Type: text/x-patch
Size: 6322 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131211/1ace884d/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-CodeGen-Initial-instrumentation-based-PGO-implementa.patch
Type: text/x-patch
Size: 91262 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131211/1ace884d/attachment-0002.bin>


More information about the cfe-commits mailing list