[llvm-dev] Instrumented BB in PGO
Xinliang David Li via llvm-dev
llvm-dev at lists.llvm.org
Mon Mar 21 22:27:48 PDT 2016
Thank you. I have assigned the bug to xur at .
David
On Mon, Mar 21, 2016 at 10:24 PM, Toshio Suganuma <SUGANUMA at jp.ibm.com>
wrote:
> Hi David,
>
> Thank you.
> I just submitted a bug report 27024 (PGO instrumentation profile data is
> not reflected in correct basic blocks).
>
> Thank you,
> --Toshio
>
> From: Xinliang David Li <xinliangli at gmail.com>
> To: Toshio Suganuma/Japan/IBM at IBMJP
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, Rong Xu <xur at google.com>
> Date: 2016/03/22 12:04
> Subject: Re: [llvm-dev] Instrumented BB in PGO
> ------------------------------
>
> On Mon, Mar 21, 2016 at 7:19 PM, Toshio Suganuma via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Hello,
>
> I have a question regarding PGO instrumented BBs (I use IR-level
> instrumentation).
>
> It seems that, in some cases, the instrumented BBs do not match between
> the two compilations for profile-gen and profile-use. Here is an example
> from SPECcpu 2006 lbm (a simple case consisting of just two modules).
> In the first compilation, we have 5 instrumentation points for the
> main function as follows:
>
> $ opt -pgo-instr-gen -instrprof _all_combined.bc -o
> _all_combined_inst.bc -debug-only=pgo-instrumentation
> Dump Function main Hash: 61483163021 after CFGMST
> Number of Basic Blocks: 10
> BB: FakeNode Index=0
> BB: if.then Index=5
> BB: for.body Index=4
> BB: for.body.lr.ph Index=3
> BB: entry Index=1
> BB: for.inc Index=8
> BB: if.then5 Index=7
> BB: if.end Index=6
> BB: for.end Index=2
> BB: for.end.loopexit Index=9
> Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)
> Edge 0: 8-->4 c W=247031
> Edge 1: 6-->8 c W=159375
> Edge 2: 4-->6 *c W=127500
> Edge 3: 1-->2 c W=4500
> Edge 4: 4-->5 W=127
> Edge 5: 5-->6 * W=127
> Edge 6: 6-->7 W=95
> Edge 7: 7-->8 * W=95
> Edge 8: 0-->1 W=12
> Edge 9: 2-->0 * W=12
> Edge 10: 3-->4 W=8
> Edge 11: 9-->2 W=8
> Edge 12: 1-->3 W=7
> Edge 13: 8-->9 * W=7
> Split critical edge: 4 --> 6
> Adding Instrumentation in BB Name=for.body.if.end_crit_edge
> Adding Instrumentation in BB Name=if.then
> Adding Instrumentation in BB Name=if.then5
> Adding Instrumentation in BB Name=for.end
> Adding Instrumentation in BB Name=for.end.loopexit
>
> After a training run, we get profile data for the main function as
> follows, but these count values are put into incorrect BBs in the second
> compilation.
> Block counts: [0, 300, 4, 1, 1]
>
> $ opt -analyze -pgo-instr-use _all_combined.bc
> -debug-only=pgo-instrumentation
> Dump Function main Hash: 61483163021 after CFGMST
> Number of Basic Blocks: 10
> BB: FakeNode Index=0
> BB: for.body.lr.ph Index=3
> BB: if.end Index=6
> BB: entry Index=1
> BB: if.then Index=5
> BB: for.body Index=4
> BB: for.end.loopexit Index=9
> BB: for.inc Index=8
> BB: if.then5 Index=7
> BB: for.end Index=2
> Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)
> Edge 0: 8-->4 c W=247031
> Edge 1: 6-->8 c W=159375
> Edge 2: 4-->6 *c W=127500
> Edge 3: 1-->2 c W=127058
> Edge 4: 0-->1 W=135
> Edge 5: 2-->0 * W=135
> Edge 6: 4-->5 W=127
> Edge 7: 5-->6 * W=127
> Edge 8: 6-->7 W=95
> Edge 9: 7-->8 * W=95
> Edge 10: 3-->4 W=8
> Edge 11: 9-->2 W=8
> Edge 12: 1-->3 W=7
> Edge 13: 8-->9 * W=7
> 5 counts
> 0: 0
> 1: 300
> 2: 4
> 3: 1
> 4: 1
> SUM = 306
> Split critical edge: 4 --> 6
> Setting BB Name=for.body.if.end_crit_edge with CountValue=0
> Setting BB Name=for.end with CountValue=300
> Setting BB Name=if.then with CountValue=4
> Setting BB Name=if.then5 with CountValue=1
> Setting BB Name=for.end.loopexit with CountValue=1
>
> The CountValue 300 should go to BB if.then (Index 5), not for.end
> (Index 2). Because of this incorrect assignment, the entry count of the
> main function is set to 300 instead of 1 (after populating the count
> values).
> The reason for this problem is that CFGMST edges are ordered in a
> different way due to different weight values (edges 0 --> 1 and 2 --> 0 get
> W=12 in the first compilation, while they get W=135 in the second
> compilation). The weight values are computed based on block frequency info
> and branch probability info, but somehow they produce different values
> between the two compilations.
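
The reordering described above can be replayed from the two dumps. The following is a minimal Python sketch, not LLVM's CFGMST code: it assumes a plain Kruskal-style maximum spanning tree with a stable sort by descending weight, and it assumes counters are matched to instrumented edges purely by position. The names GEN_EDGES, USE_WEIGHTS, and instrumented_edges are made up for this illustration; the edges, weights, and counter values are copied from the dumps.

```python
# Edges (src index, dst index, weight) for main(), from the profile-gen dump.
GEN_EDGES = [
    (8, 4, 247031), (6, 8, 159375), (4, 6, 127500), (1, 2, 4500),
    (4, 5, 127), (5, 6, 127), (6, 7, 95), (7, 8, 95),
    (0, 1, 12), (2, 0, 12), (3, 4, 8), (9, 2, 8), (1, 3, 7), (8, 9, 7),
]

# The only weights that differ in the profile-use dump.
USE_WEIGHTS = {(1, 2): 127058, (0, 1): 135, (2, 0): 135}
USE_EDGES = [(s, d, USE_WEIGHTS.get((s, d), w)) for s, d, w in GEN_EDGES]


def instrumented_edges(weighted_edges):
    """Return the edges left out of a maximum spanning tree, in visit order.

    Assumption: edges are visited in descending weight order (stable sort)
    and an edge is instrumented when it would close a cycle.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    off_tree = []
    for src, dst, _weight in sorted(weighted_edges, key=lambda e: -e[2]):
        root_src, root_dst = find(src), find(dst)
        if root_src == root_dst:
            off_tree.append((src, dst))  # closes a cycle: needs a counter
        else:
            parent[root_src] = root_dst  # joins two components: on the tree
    return off_tree


COUNTERS = [0, 300, 4, 1, 1]  # "Block counts" from the training run

gen_order = instrumented_edges(GEN_EDGES)
use_order = instrumented_edges(USE_EDGES)

# Counter i is written for gen_order[i] but read back as use_order[i].
written = dict(zip(gen_order, COUNTERS))
read_back = dict(zip(use_order, COUNTERS))

print("profile-gen order:", gen_order)
print("profile-use order:", use_order)
print("5->6 (if.then):", written[(5, 6)], "->", read_back[(5, 6)])
print("2->0 (for.end):", written[(2, 0)], "->", read_back[(2, 0)])
```

Under these assumptions the instrumented set is the same in both runs, but edge 2-->0 moves from fourth to second position, so the value 300 recorded for 5-->6 (if.then) is read back as the count for 2-->0 (for.end).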
>
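
The entry count of 300 versus 1 mentioned above can be checked with a small flow-conservation sketch. This is only an illustration under stated assumptions, not the pass's actual propagation code: it assumes every node (including FakeNode, Index 0) has equal incoming and outgoing counts, and it repeatedly solves any node that has exactly one unknown edge. The function name propagate and the two dictionaries are made up for this example; the edge list and counter values come from the dumps.

```python
# All 14 CFG edges of main() as (src index, dst index), from the dumps.
EDGES = [
    (8, 4), (6, 8), (4, 6), (1, 2), (4, 5), (5, 6), (6, 7),
    (7, 8), (0, 1), (2, 0), (3, 4), (9, 2), (1, 3), (8, 9),
]


def propagate(edges, known):
    """Fill in uninstrumented edge counts using sum(in) == sum(out)."""
    counts = dict(known)
    nodes = {n for edge in edges for n in edge}
    progress = True
    while progress and len(counts) < len(edges):
        progress = False
        for node in nodes:
            incoming = [e for e in edges if e[1] == node]
            outgoing = [e for e in edges if e[0] == node]
            unknown = [e for e in incoming + outgoing if e not in counts]
            if len(unknown) != 1:
                continue  # already solved, or not solvable yet
            in_sum = sum(counts[e] for e in incoming if e in counts)
            out_sum = sum(counts[e] for e in outgoing if e in counts)
            edge = unknown[0]
            if edge in incoming:
                counts[edge] = out_sum - in_sum
            else:
                counts[edge] = in_sum - out_sum
            progress = True
    return counts


# Counter-to-edge mapping as intended by the profile-gen compilation.
intended = {(4, 6): 0, (5, 6): 300, (7, 8): 4, (2, 0): 1, (8, 9): 1}
# Mapping actually applied by the profile-use compilation.
applied = {(4, 6): 0, (2, 0): 300, (5, 6): 4, (7, 8): 1, (8, 9): 1}

# Edge 0-->1 (FakeNode to entry) carries the function entry count.
print("entry count, intended mapping:", propagate(EDGES, intended)[(0, 1)])
print("entry count, applied mapping: ", propagate(EDGES, applied)[(0, 1)])
```

With the intended mapping the edge 0-->1 resolves to 1, and with the applied mapping it resolves to 300, which agrees with the entry counts reported in the message.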
>
>
> Different BFI results for otherwise identical compilations are a bug we
> should fix (it can cause other problems too). Can you file a bug about it?
>
> thanks,
>
> David
>
>
>
> How can we assume that CFGMST is constructed in the same way between
> the two compilations so that we can always set profile results into correct
> basic blocks?
>
> Thank you,
> --Toshio
>