[llvm-dev] Instrumented BB in PGO

Toshio Suganuma via llvm-dev llvm-dev at lists.llvm.org
Mon Mar 21 19:19:41 PDT 2016



Hello,

I have a question regarding PGO instrumented BBs (I use IR-level
instrumentation).

It seems that, in some cases, the instrumented BBs do not match between the
profile-gen and profile-use compilations. Here is an example from
SPEC CPU2006 lbm (a simple case consisting of just two modules).
In the first compilation, we get 5 instrumentation points for the main
function, as follows:

$ opt -pgo-instr-gen -instrprof _all_combined.bc -o _all_combined_inst.bc \
    -debug-only=pgo-instrumentation
Dump Function main Hash: 61483163021    after CFGMST
  Number of Basic Blocks: 10
  BB: FakeNode  Index=0
  BB: if.then  Index=5
  BB: for.body  Index=4
  BB: for.body.lr.ph  Index=3
  BB: entry  Index=1
  BB: for.inc  Index=8
  BB: if.then5  Index=7
  BB: if.end  Index=6
  BB: for.end  Index=2
  BB: for.end.loopexit  Index=9
  Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)
  Edge 0: 8-->4  c  W=247031
  Edge 1: 6-->8  c  W=159375
  Edge 2: 4-->6 *c  W=127500
  Edge 3: 1-->2  c  W=4500
  Edge 4: 4-->5     W=127
  Edge 5: 5-->6 *   W=127
  Edge 6: 6-->7     W=95
  Edge 7: 7-->8 *   W=95
  Edge 8: 0-->1     W=12
  Edge 9: 2-->0 *   W=12
  Edge 10: 3-->4     W=8
  Edge 11: 9-->2     W=8
  Edge 12: 1-->3     W=7
  Edge 13: 8-->9 *   W=7
Split critical edge: 4 --> 6
  Adding Instrumentation in BB Name=for.body.if.end_crit_edge
  Adding Instrumentation in BB Name=if.then
  Adding Instrumentation in BB Name=if.then5
  Adding Instrumentation in BB Name=for.end
  Adding Instrumentation in BB Name=for.end.loopexit

After a training run, we get the following profile data for the main
function, but these count values are assigned to the wrong BBs in the second
compilation:
Block counts: [0, 300, 4, 1, 1]

$ opt -analyze -pgo-instr-use _all_combined.bc \
    -debug-only=pgo-instrumentation
Dump Function main Hash: 61483163021    after CFGMST
  Number of Basic Blocks: 10
  BB: FakeNode  Index=0
  BB: for.body.lr.ph  Index=3
  BB: if.end  Index=6
  BB: entry  Index=1
  BB: if.then  Index=5
  BB: for.body  Index=4
  BB: for.end.loopexit  Index=9
  BB: for.inc  Index=8
  BB: if.then5  Index=7
  BB: for.end  Index=2
  Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)
  Edge 0: 8-->4  c  W=247031
  Edge 1: 6-->8  c  W=159375
  Edge 2: 4-->6 *c  W=127500
  Edge 3: 1-->2  c  W=127058
  Edge 4: 0-->1     W=135
  Edge 5: 2-->0 *   W=135
  Edge 6: 4-->5     W=127
  Edge 7: 5-->6 *   W=127
  Edge 8: 6-->7     W=95
  Edge 9: 7-->8 *   W=95
  Edge 10: 3-->4     W=8
  Edge 11: 9-->2     W=8
  Edge 12: 1-->3     W=7
  Edge 13: 8-->9 *   W=7
5 counts
  0: 0
  1: 300
  2: 4
  3: 1
  4: 1
SUM =  306
Split critical edge: 4 --> 6
  Setting BB Name=for.body.if.end_crit_edge with CountValue=0
  Setting BB Name=for.end with CountValue=300
  Setting BB Name=if.then with CountValue=4
  Setting BB Name=if.then5 with CountValue=1
  Setting BB Name=for.end.loopexit with CountValue=1

The CountValue 300 should go to BB if.then (Index 5), not to for.end
(Index 2). Because of this incorrect assignment, the entry count of the
main function is set to 300 instead of 1 (after populating the count
values).
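
To make the mismatch explicit, here is a minimal sketch in Python (not LLVM
code) that pairs the counts with the BBs by position, once in the order the
counters were added in the first compilation and once in the order they were
read back in the second. The BB names and counts are copied from the logs
above; the variable names are only for illustration.

```python
# Order in which counters were added by -pgo-instr-gen (first log).
gen_order = ["for.body.if.end_crit_edge", "if.then", "if.then5",
             "for.end", "for.end.loopexit"]
# Order in which counts were set by -pgo-instr-use (second log).
use_order = ["for.body.if.end_crit_edge", "for.end", "if.then",
             "if.then5", "for.end.loopexit"]
# Block counts from the training run.
counts = [0, 300, 4, 1, 1]

# Counts are matched to BBs purely by position in each ordering.
intended = dict(zip(gen_order, counts))
applied = dict(zip(use_order, counts))

# BBs that received a different count than the one recorded for them,
# as (intended, applied) pairs.
mismatched = {bb: (intended[bb], applied[bb])
              for bb in gen_order if intended[bb] != applied[bb]}

for bb, (want, got) in mismatched.items():
    print(f"{bb}: intended {want}, applied {got}")
```

Three of the five BBs (if.then, if.then5 and for.end) end up with a count
that was recorded for a different BB.
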
The reason for this problem is that the CFGMST edges are ordered differently
because their weight values differ (edges 0 --> 1 and 2 --> 0 get W=12 in
the first compilation, but W=135 in the second). The weight values are
computed from block frequency info and branch probability info, but somehow
these produce different values in the two compilations.
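
The effect of the weight change can be reproduced with a small model. The
Python sketch below is a simplified assumption about how the instrumented
edges are selected (a Kruskal-style maximum spanning tree over edges sorted
by descending weight, where every edge left out of the tree gets a counter);
it is not the actual CFGMST code and ignores any special handling that the
real implementation has. The edge list and weights are taken from the two
dumps above, and the function name is hypothetical.

```python
def instrumented_edges(edges):
    """Return the non-tree edges, in descending-weight order.

    edges: list of (src, dst, weight) tuples. Ties keep their input order
    because Python's sort is stable.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    result = []
    for src, dst, _weight in sorted(edges, key=lambda e: -e[2]):
        a, b = find(src), find(dst)
        if a == b:
            result.append((src, dst))  # would close a cycle: instrument it
        else:
            parent[a] = b              # joins two components: tree edge
    return result

# Edges and weights from the profile-gen dump.
gen = [(8, 4, 247031), (6, 8, 159375), (4, 6, 127500), (1, 2, 4500),
       (4, 5, 127), (5, 6, 127), (6, 7, 95), (7, 8, 95), (0, 1, 12),
       (2, 0, 12), (3, 4, 8), (9, 2, 8), (1, 3, 7), (8, 9, 7)]

# The profile-use dump differs only in these three weights.
overrides = {(1, 2): 127058, (0, 1): 135, (2, 0): 135}
use = [(s, d, overrides.get((s, d), w)) for s, d, w in gen]

gen_instr = instrumented_edges(gen)
use_instr = instrumented_edges(use)
print("profile-gen:", gen_instr)
print("profile-use:", use_instr)
```

Under this model the same five edges are selected in both cases, but edge
2 --> 0 moves from fourth to second position in the profile-use ordering,
which matches the order of the "Setting BB" lines in the second log.
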

How can we ensure that the CFGMST is constructed in the same way in both
compilations, so that profile counts are always assigned to the correct
basic blocks?

Thank you,
--Toshio