<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 21, 2016 at 7:19 PM, Toshio Suganuma via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><p>Hello,<br><br>I have a question regarding PGO instrumented BBs (I use IR-level instrumentation).<br><br>It seems that instrumented BBs do not match between the two compilations for profile-gen and profile-use for some cases. Here is an example from SPECcpu 2006 lbm (a simple case consisting of just two modules).<br>In the first compilation, we have 5 instrumentation points for the main function as follows:<br><br><font face="Consolas">$ opt -pgo-instr-gen -instrprof _all_combined.bc -o _all_combined_inst.bc -debug-only=pgo-instrumentation</font><br><font face="Consolas">Dump Function main Hash: 61483163021 after CFGMST</font><br><font face="Consolas"> Number of Basic Blocks: 10</font><br><font face="Consolas"> BB: FakeNode Index=0</font><br><font face="Consolas"> BB: if.then Index=5</font><br><font face="Consolas"> BB: for.body Index=4</font><br><font face="Consolas"> BB: <a href="http://for.body.lr.ph" target="_blank">for.body.lr.ph</a> Index=3</font><br><font face="Consolas"> BB: entry Index=1</font><br><font face="Consolas"> BB: for.inc Index=8</font><br><font face="Consolas"> BB: if.then5 Index=7</font><br><font face="Consolas"> BB: if.end Index=6</font><br><font face="Consolas"> BB: for.end Index=2</font><br><font face="Consolas"> BB: for.end.loopexit Index=9</font><br><font face="Consolas"> Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)</font><br><font face="Consolas"> Edge 0: 8-->4 c W=247031</font><br><font face="Consolas"> Edge 1: 6-->8 c W=159375</font><br><font face="Consolas"> Edge 2: 4-->6 *c W=127500</font><br><font face="Consolas"> Edge 3: 1-->2 c W=4500</font><br><font 
face="Consolas"> Edge 4: 4-->5 W=127</font><br><font face="Consolas"> Edge 5: 5-->6 * W=127</font><br><font face="Consolas"> Edge 6: 6-->7 W=95</font><br><font face="Consolas"> Edge 7: 7-->8 * W=95</font><br><font face="Consolas"> Edge 8: 0-->1 W=12</font><br><font face="Consolas"> Edge 9: 2-->0 * W=12</font><br><font face="Consolas"> Edge 10: 3-->4 W=8</font><br><font face="Consolas"> Edge 11: 9-->2 W=8</font><br><font face="Consolas"> Edge 12: 1-->3 W=7</font><br><font face="Consolas"> Edge 13: 8-->9 * W=7</font><br><font face="Consolas">Split critical edge: 4 --> 6</font><br><font face="Consolas"> Adding Instrumentation in BB Name=for.body.if.end_crit_edge</font><br><font face="Consolas"> Adding Instrumentation in BB Name=if.then</font><br><font face="Consolas"> Adding Instrumentation in BB Name=if.then5</font><br><font face="Consolas"> Adding Instrumentation in BB Name=for.end</font><br><font face="Consolas"> Adding Instrumentation in BB Name=for.end.loopexit</font><br><br>After a training run, we get profile data for the main function as follows, but these count values are put into incorrect BBs in the second compilation.<br>Block counts: [0, 300, 4, 1, 1]<br><br><font face="Consolas">$ opt -analyze -pgo-instr-use _all_combined.bc -debug-only=pgo-instrumentation</font><br><font face="Consolas">Dump Function main Hash: 61483163021 after CFGMST</font><br><font face="Consolas"> Number of Basic Blocks: 10</font><br><font face="Consolas"> BB: FakeNode Index=0</font><br><font face="Consolas"> BB: <a href="http://for.body.lr.ph" target="_blank">for.body.lr.ph</a> Index=3</font><br><font face="Consolas"> BB: if.end Index=6</font><br><font face="Consolas"> BB: entry Index=1</font><br><font face="Consolas"> BB: if.then Index=5</font><br><font face="Consolas"> BB: for.body Index=4</font><br><font face="Consolas"> BB: for.end.loopexit Index=9</font><br><font face="Consolas"> BB: for.inc Index=8</font><br><font face="Consolas"> BB: if.then5 Index=7</font><br><font 
face="Consolas"> BB: for.end Index=2</font><br><font face="Consolas"> Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)</font><br><font face="Consolas"> Edge 0: 8-->4 c W=247031</font><br><font face="Consolas"> Edge 1: 6-->8 c W=159375</font><br><font face="Consolas"> Edge 2: 4-->6 *c W=127500</font><br><font face="Consolas"> Edge 3: 1-->2 c W=127058</font><br><font face="Consolas"> Edge 4: 0-->1 W=135</font><br><font face="Consolas"> Edge 5: 2-->0 * W=135</font><br><font face="Consolas"> Edge 6: 4-->5 W=127</font><br><font face="Consolas"> Edge 7: 5-->6 * W=127</font><br><font face="Consolas"> Edge 8: 6-->7 W=95</font><br><font face="Consolas"> Edge 9: 7-->8 * W=95</font><br><font face="Consolas"> Edge 10: 3-->4 W=8</font><br><font face="Consolas"> Edge 11: 9-->2 W=8</font><br><font face="Consolas"> Edge 12: 1-->3 W=7</font><br><font face="Consolas"> Edge 13: 8-->9 * W=7</font><br><font face="Consolas">5 counts</font><br><font face="Consolas"> 0: 0</font><br><font face="Consolas"> 1: 300</font><br><font face="Consolas"> 2: 4</font><br><font face="Consolas"> 3: 1</font><br><font face="Consolas"> 4: 1</font><br><font face="Consolas">SUM = 306</font><br><font face="Consolas">Split critical edge: 4 --> 6</font><br><font face="Consolas"> Setting BB Name=for.body.if.end_crit_edge with CountValue=0</font><br><font face="Consolas"> Setting BB Name=for.end with CountValue=300</font><br><font face="Consolas"> Setting BB Name=if.then with CountValue=4</font><br><font face="Consolas"> Setting BB Name=if.then5 with CountValue=1</font><br><font face="Consolas"> Setting BB Name=for.end.loopexit with CountValue=1</font><br><br>The CountValue 300 should go to the BB=if.then (Index 5), not for.end (Index 2). 
Because of this incorrect assignment, the entry count of the main function is set to 300 instead of 1 (after populating the count values).<br>The reason for this problem is that the CFGMST edges are ordered differently because their weight values differ (edges 0 --> 1 and 2 --> 0 get W=12 in the first compilation, while they get W=135 in the second compilation). The weight values are computed from block frequency info and branch probability info, but somehow these produce different values between the two compilations.<br></p></div></blockquote><div><br></div><div>Producing different BFI for otherwise identical compilations is a bug we should fix (it can cause other problems too). Can you file a bug about it? </div><div><br></div><div>Thanks,</div><div><br></div><div>David</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><p><br>How can we assume that CFGMST is constructed in the same way between the two compilations so that we can always set profile results into the correct basic blocks?<br><br>Thank you,<br>--Toshio<br>
</p></div>
<br></blockquote></div><br></div></div>
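<div><p>The counter mismatch described in the quoted message can be reproduced outside of LLVM with a small sketch. It assumes that CFGMST behaves like a Kruskal-style pass over edges stably sorted by descending weight, and that every edge left out of the spanning tree receives the next counter index. That is a simplification made for illustration, not the LLVM implementation. The edge lists and the counts are copied from the two debug dumps; the function name <code>instrumented_edges</code> and the variable names are hypothetical.</p></div>
<pre>
```python
# Edges as (src, dest, weight), copied from the two debug dumps in the thread.
GEN_EDGES = [
    (8, 4, 247031), (6, 8, 159375), (4, 6, 127500), (1, 2, 4500),
    (4, 5, 127), (5, 6, 127), (6, 7, 95), (7, 8, 95),
    (0, 1, 12), (2, 0, 12), (3, 4, 8), (9, 2, 8), (1, 3, 7), (8, 9, 7),
]
USE_EDGES = [
    (8, 4, 247031), (6, 8, 159375), (4, 6, 127500), (1, 2, 127058),
    (0, 1, 135), (2, 0, 135), (4, 5, 127), (5, 6, 127),
    (6, 7, 95), (7, 8, 95), (3, 4, 8), (9, 2, 8), (1, 3, 7), (8, 9, 7),
]


def instrumented_edges(edges, num_nodes=10):
    """Return the edges left out of the spanning tree, in visiting order.

    Assumption: a Kruskal-style pass over edges stably sorted by
    descending weight. This is a simplified model, not LLVM's code.
    """
    parent = list(range(num_nodes))

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    instrumented = []
    for src, dest, _weight in sorted(edges, key=lambda e: -e[2]):
        root_src, root_dest = find(src), find(dest)
        if root_src == root_dest:
            # Both ends are already connected: the edge is not in the
            # tree, so it gets the next counter index.
            instrumented.append((src, dest))
        else:
            parent[root_src] = root_dest
    return instrumented


# Counter values from the training run, indexed by counter number.
COUNTS = [0, 300, 4, 1, 1]

gen_order = instrumented_edges(GEN_EDGES)
use_order = instrumented_edges(USE_EDGES)
gen_map = dict(zip(gen_order, COUNTS))
use_map = dict(zip(use_order, COUNTS))

for edge in gen_order:
    print(edge, "recorded:", gen_map[edge], "read back as:", use_map[edge])
```
</pre>
<div><p>Under these assumptions the same five edges are selected in both runs, but in a different order: counter 1 (value 300) belongs to edge (5, 6) in the profile-gen ordering and is read back for edge (2, 0) in the profile-use ordering, which matches the misplaced count reported in the thread.</p></div>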