<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Dec 1, 2013, at 8:15 PM, Justin Bogner <<a href="mailto:mail@justinbogner.com">mail@justinbogner.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Hey list,<br><br>Attached are the initial patches for instrumentation based PGO. The<br>first two are for clang, and the third for compiler-rt. <br><br>This is an implementation of Bob's proposal from back in September. Most<br>of the design can be read about here:<br><br> <a href="http://lists.cs.uiuc.edu/pipermail/cfe-dev/2013-September/031825.html">http://lists.cs.uiuc.edu/pipermail/cfe-dev/2013-September/031825.html</a><br><br>I changed the flags to -fprofile-instr-use and -fprofile-instr-generate<br>to match the style of Diego's sampling based PGO flag. I'm open to other<br>suggestions, but do note that using -fprofile-generate introduces a bit<br>of an awkward conflict with the GCC-like GCDA profiling that is also<br>present in LLVM, since that's the flag GCC uses.<br><br>There are a few things that obviously still need work, but I think this<br>is a pretty reasonable incremental step. Notably:<br><br>- The data format is inefficent plain text that isn't even particularly<br> easily understood by a human.<br>- The compiler-rt support is fairly limited, and there's no way to<br> specify or change the profile data output file name.<br>- Objective C is basically completely ignored.<br>- The counters aren't scaled, so if they're greater than 32 bits they'll<br> just be truncated.<br>- The code coverage aspect isn't included in this patch, but the<br> counters are set up so that the data should be sufficient other than<br> that there's no real way to convert back to source locations yet.<br><br>Let me know what you think. Thanks.</blockquote><div><br></div><div>I’ve got a few minor comments below. The only serious one is that I don’t think your handling of switch case-ranges and defaults is correct. Otherwise, it looks good. Since this is a pretty big change, I’d like John to give his approval before you commit, since he is the code owner for this part of clang.</div><div><br></div>Once the dust settles on this, we’re going to need to add some good documentation. It really isn’t obvious how this all fits together. I like the way you’ve revised the loop instrumentation to move some of the counters to less frequently executed code. That will help make the instrumented code run faster. I think we should investigate taking it one step further, but that can wait for a follow-up patch. Specifically, I think we can avoid adding extra counters for breaks and continues in loops, since we should be able to compute the counts for those without extra instrumentation overhead. We can discuss that later.<br><br></div><div><blockquote type="cite"><div style="margin: 0px; font-family: Menlo;">diff --git a/include/clang/Driver/Options.td b/include/clang/Driver/Options.td</div><div style="margin: 0px; font-family: Menlo;">index 9e7dc78..02d1087 100644</div><div style="margin: 0px; font-family: Menlo;">--- a/include/clang/Driver/Options.td</div><div style="margin: 0px; font-family: Menlo;">+++ b/include/clang/Driver/Options.td</div><div style="margin: 0px; font-family: Menlo;">@@ -369,6 +369,13 @@ def fno_autolink : Flag <["-"], "fno-autolink">, Group<f_Group>,</div><div style="margin: 0px; font-family: Menlo;"> def fprofile_sample_use_EQ : Joined<["-"], "fprofile-sample-use=">,</div><div style="margin: 0px; font-family: Menlo;"> Group<f_Group>, Flags<[DriverOption, CC1Option]>,</div><div style="margin: 0px; font-family: Menlo;"> HelpText<"Enable sample-based profile guided optimizations">;</div><div style="margin: 0px; font-family: Menlo;">+def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,</div><div style="margin: 0px; font-family: Menlo;">+ Group<f_Group>, Flags<[CC1Option]>,</div><div style="margin: 0px; font-family: Menlo;">+ HelpText<"Instrument regions in each function to collect execution counts”>;</div></blockquote><div><br></div>I know this comes from my initial patch, but I would prefer to drop the description of this as “instrumenting regions”. That’s more of an internal implementation detail. How about just: “Generate instrumented code to collect execution counts”?</div><div><br><blockquote type="cite"><div style="margin: 0px; font-family: Menlo;">+def fprofile_instr_use : Flag<["-"], "fprofile-instr-use">, Group<f_Group>;</div><div style="margin: 0px; font-family: Menlo;">+def fprofile_instr_use_EQ : Joined<["-"], "fprofile-instr-use=">,</div><div style="margin: 0px; font-family: Menlo;">+ Group<f_Group>, Flags<[CC1Option]>,</div><div style="margin: 0px; font-family: Menlo;">+ HelpText<"Use region instrumentation data for profile-guided optimization”>;</div></blockquote><div><br></div>Likewise. How about just dropping the word “region” from the description?</div><div><br></div><div><blockquote type="cite"><div style="margin: 0px; font-family: Menlo; min-height: 14px;"><br></div><div style="margin: 0px; font-family: Menlo;"> def fblocks : Flag<["-"], "fblocks">, Group<f_Group>, Flags<[CC1Option]>,</div><div style="margin: 0px; font-family: Menlo;"> HelpText<"Enable the 'blocks' language feature">;</div></blockquote><div><div style="margin: 0px;"><blockquote type="cite" style="font-family: Menlo;"><div style="margin: 0px;">diff --git a/include/clang/Frontend/CodeGenOptions.def b/include/clang/Frontend/CodeGenOptions.def</div><div style="margin: 0px;">index 78b825d..154bf44 100644</div><div style="margin: 0px;">--- a/include/clang/Frontend/CodeGenOptions.def</div><div style="margin: 0px;">+++ b/include/clang/Frontend/CodeGenOptions.def</div><div style="margin: 0px;">@@ -87,6 +87,9 @@ CODEGENOPT(OmitLeafFramePointer , 1, 0) ///< Set when -momit-leaf-frame-pointer</div><div style="margin: 0px;"> VALUE_CODEGENOPT(OptimizationLevel, 3, 0) ///< The -O[0-4] option specified.</div><div style="margin: 0px;"> VALUE_CODEGENOPT(OptimizeSize, 2, 0) ///< If -Os (==1) or -Oz (==2) is specified.</div><div style="margin: 0px; min-height: 14px;"><br></div><div style="margin: 0px;">+CODEGENOPT(ProfileInstrRegions , 1, 0) ///< Instrument regions with execution</div><div style="margin: 0px;">+ ///< counts to use with PGO.</div><div style="margin: 0px;">+</div></blockquote><div style="font-family: Menlo; margin: 0px;"><br></div>How about renaming to “ProfileInstrGenerate” to match the option?</div><div style="margin: 0px;"><br><blockquote type="cite" style="font-family: Menlo;"><div style="margin: 0px;"> /// If -fpcc-struct-return or -freg-struct-return is specified.</div><div style="margin: 0px;"> ENUM_CODEGENOPT(StructReturnConvention, StructReturnConventionKind, 2, SRCK_Default)</div></blockquote><font face="Menlo">...</font><br><blockquote type="cite" style="font-family: Menlo;"><div style="margin: 0px; min-height: 14px;">diff --git a/lib/CodeGen/CGStmt.cpp b/lib/CodeGen/CGStmt.cpp</div></blockquote><blockquote type="cite" style="font-family: Menlo;"><div style="margin: 0px;">index 8325ddd..16d10a6 100644</div><div style="margin: 0px;">--- a/lib/CodeGen/CGStmt.cpp</div><div style="margin: 0px;">+++ b/lib/CodeGen/CGStmt.cpp</div></blockquote><font face="Menlo">...</font><br><blockquote type="cite" style="font-family: Menlo;"><div style="margin: 0px;">@@ -573,16 +601,20 @@ void CodeGenFunction::EmitDoStmt(const DoStmt &S) {</div><div style="margin: 0px;"> JumpDest LoopExit = getJumpDestInCurrentScope("do.end");</div><div style="margin: 0px;"> JumpDest LoopCond = getJumpDestInCurrentScope("do.cond");</div><div style="margin: 0px; min-height: 14px;"><br></div><div style="margin: 0px;">+ unsigned Counter = PGO.getRegionCounter(&S);</div><div style="margin: 0px;">+ RegionCounter Cnt(PGO, Counter);</div><div style="margin: 0px;">+</div><div style="margin: 0px;"> // Store the blocks to use for break and continue.</div><div style="margin: 0px;">- BreakContinueStack.push_back(BreakContinue(LoopExit, LoopCond));</div><div style="margin: 0px;">+ BreakContinueStack.push_back(BreakContinue(LoopExit, LoopCond, Counter));</div><div style="margin: 0px; min-height: 14px;"><br></div><div style="margin: 0px;">- // Emit the body of the loop.</div></blockquote><div style="font-family: Menlo; margin: 0px;"><br></div>Why delete the comment?</div><div style="margin: 0px; font-family: Menlo;"><br><blockquote type="cite"><div style="margin: 0px;"> llvm::BasicBlock *LoopBody = createBasicBlock("do.body");</div><div style="margin: 0px;"> EmitBlock(LoopBody);</div><div style="margin: 0px;">+ Cnt.beginRegion(Builder);</div><div style="margin: 0px;"> {</div><div style="margin: 0px;"> RunCleanupsScope BodyScope(*this);</div><div style="margin: 0px;"> EmitStmt(S.getBody());</div><div style="margin: 0px;"> }</div><div style="margin: 0px;">+ Cnt.adjustFallThroughCount();</div><div style="margin: 0px; min-height: 14px;"><br></div><div style="margin: 0px;"> BreakContinueStack.pop_back();</div><div style="margin: 0px; min-height: 14px;"><br></div></blockquote>...</div></div><div style="margin: 0px;"><blockquote type="cite" style="font-family: Menlo;"><div style="margin: 0px;">@@ -923,7 +1004,10 @@ void CodeGenFunction::EmitCaseStmtRange(const CaseStmt &S) {</div><div style="margin: 0px;"> // FIXME: parameters such as this should not be hardcoded.</div><div style="margin: 0px;"> if (Range.ult(llvm::APInt(Range.getBitWidth(), 64))) {</div><div style="margin: 0px;"> // Range is small enough to add multiple switch instruction cases.</div><div style="margin: 0px;">+ uint64_t Total = CaseCnt.getCount() - CaseCnt.getParentCount();</div><div style="margin: 0px;"> for (unsigned i = 0, e = Range.getZExtValue() + 1; i != e; ++i) {</div><div style="margin: 0px;">+ if (PGO.haveRegionCounts())</div><div style="margin: 0px;">+ SwitchWeights->push_back(Total / e);</div><div style="margin: 0px;"> SwitchInsn->addCase(Builder.getInt(LHS), CaseDest);</div><div style="margin: 0px;"> LHS++;</div><div style="margin: 0px;"> }</div></blockquote><div style="font-family: Menlo;"><div style="margin: 0px;"><br></div></div><div style="margin: 0px;">You dropped the code that I had here to make sure that the sum of the weights adds up to the original count. I’m not absolutely certain that is necessary, but it seems like you could get some bad results otherwise due to rounding errors. Are you sure it doesn’t matter?</div><div style="margin: 0px;"><br></div><div style="margin: 0px;"><blockquote type="cite"><div style="margin: 0px; font-family: Menlo;">@@ -948,7 +1032,10 @@ void CodeGenFunction::EmitCaseStmtRange(const CaseStmt &S) {</div><div style="margin: 0px; font-family: Menlo;"> Builder.CreateSub(SwitchInsn->getCondition(), Builder.getInt(LHS));</div><div style="margin: 0px; font-family: Menlo;"> llvm::Value *Cond =</div><div style="margin: 0px; font-family: Menlo;"> Builder.CreateICmpULE(Diff, Builder.getInt(Range), "inbounds");</div><div style="margin: 0px; font-family: Menlo;">- Builder.CreateCondBr(Cond, CaseDest, FalseDest);</div><div style="margin: 0px; font-family: Menlo;">+ uint64_t ThisCount = CaseCnt.getCount() - CaseCnt.getParentCount();</div><div style="margin: 0px; font-family: Menlo;">+ uint64_t ElseCount = SwitchEntryCount - ThisCount;</div></blockquote><div style="margin: 0px;"><br></div>Maybe I’m misunderstanding, but this looks wrong to me. This branch is on the chain of range checks that ends in the default. You can’t find the “else” count that way.</div><div style="margin: 0px;"><br><blockquote type="cite"><div style="margin: 0px; font-family: Menlo;">+ Builder.CreateCondBr(Cond, CaseDest, FalseDest,</div><div style="margin: 0px; font-family: Menlo;">+ PGO.createBranchWeights(ThisCount, ElseCount));</div><div style="margin: 0px; font-family: Menlo; min-height: 14px;"><br></div><div style="margin: 0px; font-family: Menlo;"> // Restore the appropriate insertion point.</div><div style="margin: 0px; font-family: Menlo;"> if (RestoreBB)</div></blockquote><div><div style="margin: 0px; font-family: Menlo;">...</div></div><div style="margin: 0px;"><blockquote type="cite" style="font-family: Menlo;"><div style="margin: 0px;">@@ -1031,6 +1131,13 @@ void CodeGenFunction::EmitDefaultStmt(const DefaultStmt &S) {</div><div style="margin: 0px;"> assert(DefaultBlock->empty() &&</div><div style="margin: 0px;"> "EmitDefaultStmt: Default block already defined?");</div><div style="margin: 0px;"> EmitBlock(DefaultBlock);</div><div style="margin: 0px;">+</div><div style="margin: 0px;">+ RegionCounter Cnt = getPGORegionCounter(&S);</div><div style="margin: 0px;">+ // We already put a dummy weight in for the default case when we created the</div><div style="margin: 0px;">+ // switch, so we just need to fill it in.</div><div style="margin: 0px;">+ if (PGO.haveRegionCounts())</div><div style="margin: 0px;">+ (*SwitchWeights)[0] = Cnt.getCount() - Cnt.getParentCount();</div><div style="margin: 0px;">+ Cnt.beginRegion(Builder);</div><div style="margin: 0px;"> EmitStmt(S.getSubStmt());</div><div style="margin: 0px;"> }</div><div style="margin: 0px; min-height: 14px;"><br></div></blockquote>Same issue as above: the switch weight for the default needs to include the counts for the case-ranges that were chained onto the default. I think my patch had this right, but you lost that part.</div><div style="margin: 0px;"><br></div><div style="margin: 0px;"><font face="Menlo">...</font></div><div style="margin: 0px;"><blockquote type="cite"><div style="margin: 0px; font-family: Menlo;">diff --git a/lib/CodeGen/CodeGenPGO.h b/lib/CodeGen/CodeGenPGO.h</div><div style="margin: 0px; font-family: Menlo;">new file mode 100644</div><div style="margin: 0px; font-family: Menlo;">index 0000000..35cdfe0</div><div style="margin: 0px; font-family: Menlo;">--- /dev/null</div><div style="margin: 0px; font-family: Menlo;">+++ b/lib/CodeGen/CodeGenPGO.h</div><div style="margin: 0px; font-family: Menlo;">@@ -0,0 +1,202 @@</div></blockquote><div><div style="margin: 0px; font-family: Menlo;">...</div></div><div style="margin: 0px;"><blockquote type="cite" style="font-family: Menlo;"><div style="margin: 0px;">+ /// Assign counters to regions and configure them for PGO of a given</div><div style="margin: 0px;">+ /// function. Does nothing if instrumentation is not enabled and either</div><div style="margin: 0px;">+ /// generates global variables or associates PGO data with each of the</div><div style="margin: 0px;">+ /// counters depending on whether we are generating or using instrumentation.</div><div style="margin: 0px;">+ void assignRegionCounters(GlobalDecl &GD);</div><div style="margin: 0px;">+ /// Emit code to write counts for a given function to disk, if necessary.</div><div style="margin: 0px;">+ void emitWriteoutFunction(GlobalDecl &GD);</div><div style="margin: 0px;">+ /// Clean up region counter state. Must be called if assignRegionCounters is</div><div style="margin: 0px;">+ /// used.</div><div style="margin: 0px;">+ void clearRegionCounters();</div></blockquote><div style="font-family: Menlo; margin: 0px;"><br></div>When I see the word “clear”, I tend to think of resetting the counts to zero, but this function is for freeing the storage. I would prefer a name that reflected that, perhaps “free”, “delete” or “deallocate”.</div><div style="margin: 0px;"><br></div><div style="margin: 0px;"><span style="font-family: Menlo;">...</span></div><div style="margin: 0px;"><div><div style="margin: 0px;"><blockquote type="cite" style="font-family: Menlo;"><div style="margin: 0px;">+ /// Emit an increment to the counter. One should normally prefer beginRegion</div><div style="margin: 0px;">+ /// to emitIncrement, except for in cases where the counter's region is</div><div style="margin: 0px;">+ /// elided.</div><div style="margin: 0px;">+ void emitIncrement(CGBuilderTy &Builder) {</div><div style="margin: 0px;">+ PGO->emitCounterIncrement(Builder, Counter);</div><div style="margin: 0px;">+ }</div></blockquote><div style="font-family: Menlo;"><div style="margin: 0px;"><br></div></div><div style="margin: 0px;">Justin and I discussed this in person. I think there is no need for this function anymore.</div></div></div></div></div></div></div></div></body></html>