<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
h2
{mso-style-priority:9;
mso-style-link:"Heading 2 Char";
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:18.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
code
{mso-style-priority:99;
font-family:"Courier New";}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.Heading2Char
{mso-style-name:"Heading 2 Char";
mso-style-priority:9;
mso-style-link:"Heading 2";
font-family:"Cambria",serif;
color:#365F91;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:416365949;
mso-list-template-ids:1083342510;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:"Courier New";
mso-bidi-font-family:"Times New Roman";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l1
{mso-list-id:649556096;
mso-list-template-ids:1567143778;}
@list l1:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:"Courier New";
mso-bidi-font-family:"Times New Roman";}
@list l1:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l1:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l1:level5
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l1:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l1:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l1:level8
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l1:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l2
{mso-list-id:1676108227;
mso-list-template-ids:1901883660;}
@list l2:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l2:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:"Courier New";
mso-bidi-font-family:"Times New Roman";}
@list l2:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l2:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l2:level5
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l2:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l2:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l2:level8
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l2:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Hi Ellis,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I support this proposal -- we’ve implemented some of these ideas in our downstream compiler to support code coverage for embedded use cases. See the lightning talk slides from last year’s developers’ meeting:
<a href="https://llvm.org/devmtg/2020-09/slides/PhippsAlan_EmbeddedCodeCoverage_LLVM_Conf_Talk_final.pdf">
https://llvm.org/devmtg/2020-09/slides/PhippsAlan_EmbeddedCodeCoverage_LLVM_Conf_Talk_final.pdf</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Specifically, we keep all of the mapping data in the binary file and limit allocatable memory to raw profile counters only. We extend llvm-profdata to extract data from the binary file and combine it with the raw profile data when producing
an indexed profile.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">And we very recently implemented function-entry-only coverage as well as support for variable size counters. We would like to upstream some or all of this in the future, so perhaps we can work with you on doing that. However, none of our
work is productized for general PGO – only code coverage (so no support for reading data back in, for example).<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Alan Phipps<o:p></o:p></p>
<p class="MsoNormal">MCU Compiler Team<o:p></o:p></p>
<p class="MsoNormal">Texas Instruments, Inc.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><b>From:</b> llvm-dev &lt;llvm-dev-bounces@lists.llvm.org&gt; <b>On Behalf Of
</b>Ellis Hoag via llvm-dev<br>
<b>Sent:</b> Monday, October 18, 2021 12:28 PM<br>
<b>To:</b> llvm-dev &lt;llvm-dev@lists.llvm.org&gt;<br>
<b>Cc:</b> davidxl@google.com<br>
<b>Subject:</b> [EXTERNAL] [llvm-dev] [InstrProfiling] Lightweight Instrumentation<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal"><b><span style="font-size:24.0pt">RFC: Lightweight Instrumentation</span></b><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Hi all,<o:p></o:p></p>
</div>
<p class="MsoNormal"><br>
Our team at Facebook would like to propose a lightweight variant of IR instrumentation PGO for use in the mobile space. IRPGO is a proven technology in LLVM that can boost performance for server workloads. However, the larger binary resulting from instrumentation
significantly limits its use for mobile applications. In this proposal, we introduce a few changes to IRPGO to reduce the instrumented binary size, making it suitable for PGO on mobile devices.
<br>
<br>
This proposal is driven by the same need behind the earlier <a href="https://reviews.llvm.org/D104060">
MIP (machine IR profile) prototype</a>. Unlike MIP, which diverges significantly from IRPGO, the proposed lightweight instrumentation fits into the existing IRPGO framework, with a few extensions to achieve a smaller instrumented binary.
<br>
<br>
We’d like to share the new design and results from our prototype and get feedback.<br>
<br>
Best,<br>
Ellis, Kyungwoo, and Wenlei<o:p></o:p></p>
<h2>Motivation<o:p></o:p></h2>
<p class="MsoNormal">In the mobile space, profile-guided optimization can have an outsized impact on performance, just as it does for server workloads. However, conventional instrumentation increases binary size and code size by as much as 50%,
which limits its use for mobile applications for two reasons: <o:p></o:p></p>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
Mobile applications are very sensitive to total binary size as larger binaries take longer to download and use more space on devices. There could be a hard size limit for over-the-air (OTA) updates for this reason.
<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
When code (.text) size increases, it takes longer for applications to start up and could also degrade runtime performance due to more page faults on devices with limited RAM.<o:p></o:p></li></ul>
<p class="MsoNormal">Reducing the size overhead from instrumentation would make IRPGO usable for mobile applications so we could send instrumented binaries through OTA updates in production environments, collect representative production profiles, and apply
PGO. <o:p></o:p></p>
<h2>Overview<o:p></o:p></h2>
<p class="MsoNormal">The size overhead from IRPGO mainly comes from two things: 1) metadata for mapping raw counts back to IR/CFG, which has to stay with the binary; and 2) the increased .text size due to the insertion of instrumentation code and less effective optimization
after instrumentation. Two extensions are proposed, one to reduce each source of overhead:<o:p></o:p></p>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l1 level1 lfo2">
We allow the use of debug info (DWARF) as alternative metadata for mapping counts to IR, also known as profile correlation. Debug info is extractable from the binary, so this metadata doesn’t need to be shipped to mobile devices. Debug info has been used extensively
for sampling based PGO in LLVM, so it has reasonable quality to support profile correlation.<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l1 level1 lfo2">
We add the flexibility to allow coarse-grained instrumentation that 1) only inserts probes at function entry instead of at each block (or at the blocks chosen by MST placement), and 2) offers an optional coverage mode using one-byte booleans, in addition to today’s counting mode using
8-byte counters.<o:p></o:p></li></ul>
<p class="MsoNormal">The extensions offer a spectrum of trade-offs, from the most accurate PGO to something very lightweight that can be used in the mobile space. With debug info extracted and function entry coverage mode enabled, the size increase can be reduced
from close to 50% down to below 5% (measured with clang self-build PGO). <o:p></o:p></p>
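<p class="MsoNormal">To make the counter-storage side of this trade-off concrete, here is a minimal sketch that estimates how many bytes of raw counters each combination of granularity and counter kind would need. It only models counter data, not <code>.text</code> growth or metadata, and the function names (<code>counter_section_bytes</code>, <code>summarize</code>) and the example probe counts are hypothetical, not part of the proposal.</p>
<pre>
```python
# Hypothetical illustration: raw counter storage for each combination of
# granularity (function entry vs. block) and kind (coverage vs. counting).
# The function and probe counts used below are made-up inputs.

COVERAGE_BYTES = 1  # one-byte boolean per probe (coverage mode)
COUNTER_BYTES = 8   # 8-byte counter per probe (counting mode)


def counter_section_bytes(num_functions, num_block_probes, per_block, coverage):
    """Return the bytes needed to hold all raw counters for one configuration."""
    # Function-entry instrumentation needs exactly one probe per function.
    probes = num_block_probes if per_block else num_functions
    width = COVERAGE_BYTES if coverage else COUNTER_BYTES
    return probes * width


def summarize(num_functions, num_block_probes):
    """Map each (granularity, kind) pair to its counter storage in bytes."""
    sizes = {}
    for per_block in (False, True):
        for coverage in (True, False):
            key = ("block" if per_block else "function",
                   "coverage" if coverage else "count")
            sizes[key] = counter_section_bytes(
                num_functions, num_block_probes, per_block, coverage)
    return sizes


if __name__ == "__main__":
    # Example inputs only: 1000 functions with 6000 block probes in total.
    for key, size in summarize(1000, 6000).items():
        print(key, size)
```
</pre>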
<h2>Extractable Metadata<o:p></o:p></h2>
<p class="MsoNormal">With today’s IRPGO, the instrumentation runtime dumps out a <code>
<span style="font-size:10.0pt">profraw</span></code> profile at the end of training. The runtime creates a header and appends data from the
<code><span style="font-size:10.0pt">__llvm_prf_data</span></code>, <code><span style="font-size:10.0pt">__llvm_prf_cnts</span></code>, and
<code><span style="font-size:10.0pt">__llvm_prf_names</span></code> sections to create a
<code><span style="font-size:10.0pt">profraw</span></code> profile. The <code><span style="font-size:10.0pt">__llvm_prf_data</span></code> section contains references to each function’s profile data (in
<code><span style="font-size:10.0pt">__llvm_prf_cnts</span></code>) and name (in <code>
<span style="font-size:10.0pt">__llvm_prf_names</span></code>), so these sections are needed to correlate profile data with the functions being instrumented.<br>
<br>
Some kind of metadata to correlate counts back to IR (specifically CFG blocks) is unavoidable. One way to reduce binary size is to make such metadata extractable so it doesn’t have to be shipped to mobile devices. We could make
<code><span style="font-size:10.0pt">__llvm_prf_data</span></code> and <code><span style="font-size:10.0pt">__llvm_prf_names</span></code> extractable, but the cost would be non-trivial and it would be a breaking change. On the other hand, debug info is extractable
from the binary and already does a very good job of maintaining the mapping between addresses and source locations / symbols. Sample PGO depends entirely on debug info for profile correlation. So we picked debug info as the alternative for extractable metadata.<br>
<br>
In our proposed instrumentation, we create a special global struct, e.g., <code><span style="font-size:10.0pt">__profc__Z3foov</span></code>, to hold counters for a particular function. The
<code><span style="font-size:10.0pt">__llvm_prf_cnts</span></code> data section holds all of these structs and serves as placeholder for raw profile counters. In our final instrumented binary, we only have probe instructions and raw profile data without any
instrumentation metadata, i.e., there are no <code><span style="font-size:10.0pt">__llvm_prf_names</span></code> or
<code><span style="font-size:10.0pt">__llvm_prf_data</span></code> sections but we still have a
<code><span style="font-size:10.0pt">__llvm_prf_cnts</span></code> section. At runtime, we dump the
<code><span style="font-size:10.0pt">__llvm_prf_cnts</span></code> section to a file without any processing after profiling. To differentiate from IRPGO, the output from runtime is called
<code><span style="font-size:10.0pt">proflite</span></code> and we can add another
<code><span style="font-size:10.0pt">VARIANT_MASK_</span></code> flag to the <code>
<span style="font-size:10.0pt">Version</span></code> field of the profile header. At llvm-profdata post-processing time, we use debug info to correlate our raw profile data as follows. First we identify an instrumented function and look for its special global
struct that holds counters (<code><span style="font-size:10.0pt">__profc__Z3foov</span></code>) in the debug info. The debug info can tell us the address of that symbol in the binary and we can compute its offset from the
<code><span style="font-size:10.0pt">__llvm_prf_cnts</span></code> section. Then we can use that offset to read the function entry and block counters from the
<code><span style="font-size:10.0pt">proflite</span></code> file. Finally we populate
<code><span style="font-size:10.0pt">profdata</span></code> output for each function following the existing format.
<br>
<br>
Value profiling is not supported with extractable metadata for now, though we believe it can be added following a similar scheme.
<br>
<br>
To improve debug info quality for profile correlation, <code><span style="font-size:10.0pt">-fdebug-info-for-profiling</span></code> from AutoFDO can be used. Additionally, we could also use
<a href="https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s/m/hBrJaOWVAwAJ">pseudo-probes from CSSPGO</a> as alternative metadata, which is also fully extractable.
<br>
<br>
We propose a new flag <code><span style="font-size:10.0pt">-fprofile-generate-correlate=[profdata|debug-info|pseudo-probe]</span></code> to choose which metadata to use for profile correlation. We can correlate with today’s IRPGO metadata, which is kept in
its own sections (<code><span style="font-size:10.0pt">__llvm_prf_data</span>
</code> and <code><span style="font-size:10.0pt">__llvm_prf_names</span></code>), with debug info, or with pseudo-probes.<o:p></o:p></p>
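<p class="MsoNormal">The correlation step described above (find the counter struct’s address, compute its offset from the counter section, read counters at that offset) can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype’s code: the <code>symbols</code> table stands in for whatever debug info lookup is used, counters are assumed to be little-endian 8-byte values, and <code>read_counters</code> and <code>correlate</code> are hypothetical names.</p>
<pre>
```python
COUNTER_BYTES = 8  # counting mode uses 8-byte counters


def read_counters(proflite, section_start, symbol_address, num_counters):
    """Read one function's counters from a raw dump of the counter section.

    proflite: bytes dumped from the counter section at runtime.
    section_start: address of the counter section in the binary.
    symbol_address: address of the function's counter struct, as reported
        by debug info.
    num_counters: number of counters in the struct (assumed to be known).
    """
    offset = symbol_address - section_start
    end = offset + num_counters * COUNTER_BYTES
    if offset < 0 or end > len(proflite):
        raise ValueError("counter struct lies outside the dumped section")
    # Little-endian unsigned counters are an assumption of this sketch.
    return [
        int.from_bytes(proflite[pos:pos + COUNTER_BYTES], "little")
        for pos in range(offset, end, COUNTER_BYTES)
    ]


def correlate(proflite, section_start, symbols):
    """Map each counter-struct name to its counters.

    symbols maps a struct name to (address, num_counters); it is a
    hypothetical stand-in for the information extracted from debug info.
    """
    return {
        name: read_counters(proflite, section_start, address, count)
        for name, (address, count) in symbols.items()
    }
```
</pre>
<p class="MsoNormal">Because the runtime dumps the section without any processing, all of the lookup work in this sketch happens offline, on the machine that still has the debug info.</p>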
<h2>Coarse-grained Instrumentation<o:p></o:p></h2>
<p class="MsoNormal">In addition to reducing metadata size (<code><span style="font-size:10.0pt">__llvm_prf_names</span></code> and
<code><span style="font-size:10.0pt">__llvm_prf_data</span></code>), we can also reduce
<span style="font-family:&quot;Courier New&quot;">.text</span> size and <code><span style="font-size:10.0pt">__llvm_prf_cnts</span></code> size. We do this by 1) only instrumenting function entries instead of each block and 2) lowering precision by tracking single-byte
coverage data rather than 8-byte counters. This is a trade-off between profile quality and binary size.<br>
<br>
Function profile vs block profile and counting mode vs coverage mode can all be selected independently using our proposed flag
<code><span style="font-size:10.0pt">-fprofile-generate-mode=[func-cov|block-cov|func-cnt|block-cnt]</span></code>, and they work with both extractable metadata and IRPGO’s correlation method.
<code><span style="font-size:10.0pt">func-cov</span></code> and <code><span style="font-size:10.0pt">block-cov</span></code> use single byte booleans for coverage data while
<code><span style="font-size:10.0pt">func-cnt</span></code> and <code><span style="font-size:10.0pt">block-cnt</span></code> use 8 byte counters.
<code><span style="font-size:10.0pt">block-cnt</span></code> corresponds to today’s IRPGO and is the default.<br>
<br>
When using a profile generated from modes other than <code><span style="font-size:10.0pt">block-cnt</span></code>, additional profile inference is needed before the counts can be consumed by optimizations. Such inference is done during profile loading and so
it’s transparent to optimizations.<o:p></o:p></p>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l2 level1 lfo3">
For block coverage mode, we will use coverage info to seed block count inference, and leverage static branch probability at the same time to produce a CFG profile that honors zero count blocks and converts live block coverage data into synthetic counts.<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l2 level1 lfo3">
For function count mode, we will derive a CFG profile entirely from static branch probability, then scale the CFG profile based on function entry count.
<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l2 level1 lfo3">
Function coverage mode is handled similarly to function count mode. For covered/live functions, we will derive a CFG profile entirely from static branch probability first, then scale that CFG profile by a constant.<o:p></o:p></li></ul>
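<p class="MsoNormal">As a rough illustration of the function count and function coverage cases above, the sketch below pushes an entry count through a CFG using per-edge probabilities. It is deliberately simplified: it assumes an acyclic CFG (loops would need proper block frequency estimation), the edge probabilities are inputs rather than computed by a static predictor, and <code>infer_block_counts</code>, <code>infer_from_coverage</code> and <code>SYNTHETIC_ENTRY_COUNT</code> are hypothetical names chosen for this example.</p>
<pre>
```python
SYNTHETIC_ENTRY_COUNT = 1000  # arbitrary constant for covered functions (assumption)


def infer_block_counts(cfg, entry, entry_count):
    """Propagate an entry count through an acyclic CFG.

    cfg maps every block to a list of (successor, probability) pairs; the
    probabilities of a block's outgoing edges are expected to sum to 1.
    """
    # Count incoming edges so a block is visited only after its predecessors.
    pending = {block: 0 for block in cfg}
    for edges in cfg.values():
        for succ, _ in edges:
            pending[succ] += 1

    counts = {block: 0.0 for block in cfg}
    counts[entry] = float(entry_count)
    ready = [block for block, n in pending.items() if n == 0]
    while ready:
        block = ready.pop()
        for succ, prob in cfg[block]:
            # Each successor receives its share of the predecessor's count.
            counts[succ] += counts[block] * prob
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)
    return counts


def infer_from_coverage(cfg, entry, covered):
    """Function coverage mode: scale by a constant if the function was hit."""
    return infer_block_counts(cfg, entry, SYNTHETIC_ENTRY_COUNT if covered else 0)


if __name__ == "__main__":
    # A diamond-shaped CFG with made-up branch probabilities.
    diamond = {
        "entry": [("then", 0.75), ("else", 0.25)],
        "then": [("exit", 1.0)],
        "else": [("exit", 1.0)],
        "exit": [],
    }
    print(infer_block_counts(diamond, "entry", 100))
```
</pre>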
<p class="MsoNormal">Experiments showed that even with coarse-grained function entry profiles, mobile applications can still benefit from PGO, and it is the smaller binary that makes PGO usable on mobile.
<o:p></o:p></p>
<h2>Workflow<o:p></o:p></h2>
<p class="MsoNormal" style="margin-bottom:12.0pt">Since these are extensions that share the same underlying PGO framework, the workflow for lightweight PGO is very similar to existing IRPGO.
<o:p></o:p></p>
<div>
<p class="MsoNormal">The diagram below shows today’s PGO workflow (in red) alongside the workflow for lightweight instrumentation (in green). We first create an instrumentation build that produces a raw profile at runtime. Then we use the
<span style="font-family:&quot;Courier New&quot;">llvm-profdata</span> tool to convert that raw profile into a profile that the compiler can consume in the PGO build. The main difference for lightweight instrumentation is that we create the instrumentation build with debug
info and use that debug info to create the final profile.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal"><img border="0" width="558" height="332" style="width:5.8125in;height:3.4583in" id="_x0000_i1025" src="cid:image001.png@01D7C421.A5C38CA0" alt="image.png"><o:p></o:p></p>
<h2>Prototype &amp; Results<o:p></o:p></h2>
<div>
<p class="MsoNormal">We have a <a href="https://github.com/ellishg/llvm-project/commits/instr-correlate-debug-info">
proof of concept</a> using DWARF as the extractable metadata and single-byte function coverage instrumentation. We measured code size by building Clang with and without instrumentation using -Oz and no value profiling. Our lightweight instrumented Clang binary
is only 4 MB (3.48%) larger than a non-instrumented binary. By comparison, a Clang binary built with today’s PGO instrumentation is 54 MB (46.96%) larger. If we used debug info to correlate normal instrumentation (without value profiling) instead of just
function coverage, we would expect an overhead of 43.2 MB (37.5%). We don’t have performance data for Clang experiments using the prototype since not all components are implemented. However, an earlier alternative implementation (similar to MIP)
delivered a good performance boost for mobile applications.<o:p></o:p></p>
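<p class="MsoNormal">As a quick sanity check on the figures above, each reported pair of absolute and relative overhead implies a size for the non-instrumented binary, and the three should agree. The sketch below derives that baseline. The roughly 115 MB result is computed here from the reported numbers rather than stated in the measurements, and <code>baseline_mb</code> is a hypothetical helper name.</p>
<pre>
```python
def baseline_mb(delta_mb, delta_percent):
    """Size of the non-instrumented binary implied by one (delta, percent) pair."""
    return delta_mb / (delta_percent / 100.0)


# (size increase in MB, size increase in percent), as reported above.
reported = {
    "lightweight: function coverage with debug info": (4.0, 3.48),
    "today's PGO instrumentation": (54.0, 46.96),
    "normal instrumentation correlated via debug info": (43.2, 37.5),
}

# Implied baseline for each configuration; all three land near 115 MB.
baselines = {name: baseline_mb(mb, pct) for name, (mb, pct) in reported.items()}

if __name__ == "__main__":
    for name, size in baselines.items():
        print(name, round(size, 1))
```
</pre>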
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><img border="0" width="579" height="102" style="width:6.0277in;height:1.0625in" id="_x0000_i1026" src="cid:image002.jpg@01D7C421.A5C38CA0" alt="table-large.jpg"><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</div>
</body>
</html>