[llvm-dev] GSoC Proposal : Path Profiling Support

Snehasish Kumar via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 22 21:28:12 PDT 2016


Hi David,

> Hi Snehasish, thanks for writing up the proposal.
>
> As it stands today, path profiling still has a serious scalability issue that
> prevents it from being usable by any optimization passes that may benefit
> from it.

I agree; it would be interesting to see how we can reduce the overheads
to bring them within acceptable limits.

> It will be interesting to see how the
> sampling-based approach matches up against the instrumentation-based
> method in detecting hot paths.

I actually performed some experiments regarding this very issue. I
created an edge profile from the collected path profile and constructed
superblocks for innermost loops in 28 applications. I found:

a) In 6 applications, superblocks (i.e., hot paths constructed from edge
profiles) are *not present* as executed paths in the collected profile.
For example, in 401.bzip2, 70 of 141 inner-loop hot paths derived from
edge profiles do not correspond to actual executed paths.

b) In 6 applications (3 of which overlap with the previous 6), some hot
paths constructed from edge profiles are not the highest-ranked path.
For example, this occurs in 4 out of 32 innermost loops in 458.sjeng.

At a high level, this happens because of overlapped paths. I can go into
more detail if there is interest.
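
To illustrate the overlapped-paths effect, here is a minimal Python
sketch. It is not the code used for the experiments above. The CFG, the
block names, the counts, and the helper names edge_profile and
greedy_hot_path are all made up for illustration, and the greedy
"follow the hottest successor" walk is an assumed stand-in for superblock
formation, not necessarily the heuristic a real pass would use.

```python
from collections import defaultdict

def edge_profile(path_profile):
    """Project a path profile onto edges by summing each path's count
    onto every consecutive (src, dst) block pair along that path."""
    edges = defaultdict(int)
    for path, count in path_profile.items():
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += count
    return dict(edges)

def greedy_hot_path(edges, entry):
    """Build a trace by repeatedly following the hottest outgoing edge
    (hypothetical heuristic; ties are broken by block name)."""
    succs = defaultdict(list)
    for (src, dst), count in edges.items():
        succs[src].append((count, dst))
    path, seen = [entry], {entry}
    while succs[path[-1]]:
        _, nxt = max(succs[path[-1]])
        if nxt in seen:  # guard against cycles in the edge graph
            break
        path.append(nxt)
        seen.add(nxt)
    return tuple(path)

# Hypothetical path profile over three consecutive diamonds:
# A -> {B,C} -> D -> {E,F} -> G -> {H,I} -> J
paths = {
    ("A", "B", "D", "E", "G", "I", "J"): 40,
    ("A", "B", "D", "F", "G", "H", "J"): 40,
    ("A", "C", "D", "E", "G", "H", "J"): 40,
}

edges = edge_profile(paths)
hot = greedy_hot_path(edges, "A")
print(hot)           # the path implied by the edge profile
print(hot in paths)  # whether that path was ever executed
```

In this made-up profile every edge on the greedy trace has count 80, yet
the trace A-B-D-E-G-H-J is never executed: the three executed paths only
agree with it pairwise (B and E, B and H, E and H), never on all three
branch decisions at once.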

>
> Independent of the method used in generating path profile data, your
> proposed work on the path profile info representation and query APIs can be
> shared.
>
> thanks,
>
> David
>
> On Mon, Mar 21, 2016 at 3:07 PM, Snehasish Kumar <ska124 at sfu.ca> wrote:
>>
>> Hi
>>
>> I am pinging to find out whether there is any interest in mentoring
>> this proposal for GSoC this year. I've submitted a draft via the GSoC
>> website.
>>
>> David, Vedant: it would be great if I could get some advice on refining
>> the goals and particulars of the implementation.
>> The version we use internally is not performance-oriented and will
>> require refactoring.
>> Here is a link to the draft document [1].
>>
>> Thanks,
>> Snehasish
>>
>>
>> [1] https://docs.google.com/document/d/18i9FvD7FSqX6tNEXb83gzc0EC_STeS3bWOVf167sFWk/edit?usp=sharing
>>
>>
>> On Wed, Mar 16, 2016 at 2:03 PM, Snehasish Kumar <ska124 at sfu.ca> wrote:
>> > Hi David,
>> >
>> >> Are the data below all collected when only one function is picked for
>> >> instrumentation?
>> >
>> > Yes, here is a list of the benchmarks and selected functions.
>> >
>> >
>> > +-----------------+----------------------------------------------------------------------------------------------+
>> > | benchmark       | function                                                                                     |
>> > +-----------------+----------------------------------------------------------------------------------------------+
>> > | blks            | _Z19BlkSchlsEqEuroNoDivfffffif                                                               |
>> > | bodytrack       | _ZN17ImageMeasurements11InsideErrorERK17ProjectedCylinderRK11BinaryImageRiS6_                |
>> > | bzip2           | BZ2_compressBlock                                                                            |
>> > | ferret          | image_segment                                                                                |
>> > | fluidanimate    | _Z13ComputeForcesv                                                                           |
>> > | freqmine        | _Z32FPArray_conditional_pattern_baseIhEiP7FP_treeiiT_                                        |
>> > | gcc             | bitmap_operation                                                                             |
>> > | hmmer           | P7Viterbi                                                                                    |
>> > | lbm             | LBM_performStreamCollide                                                                     |
>> > | mcf             | price_out_impl                                                                               |
>> > | mcf2000         | price_out_impl                                                                               |
>> > | namd            | _ZN20ComputeNonbondedUtil26calc_pair_energy_fullelectEP9nonbonded                            |
>> > | povray          | _ZN3povL24All_Sphere_IntersectionsEPNS_13Object_StructEPNS_10Ray_StructEPNS_13istack_structE |
>> > | sjeng           | gen                                                                                          |
>> > | soplex          | _ZN6soplex9CLUFactor16vSolveUrightNoNZEPdS1_Piid                                             |
>> > | sphinx          | vector_gautbl_eval_logs3                                                                     |
>> > | streamcluster   | _Z5pgainlP6PointsdPliP17pthread_barrier_t                                                    |
>> > | swaptions       | _Z21HJM_Swaption_BlockingPddddddiidS_PS_llii                                                 |
>> > | h264ref         | dct_luma_16x16                                                                               |
>> > +-----------------+----------------------------------------------------------------------------------------------+
>> >
>> >> Do you have data when such manual selection is not done?
>> >
>> > At the moment, I do not.
>> >
>> >>
>> >> thanks,
>> >>
>> >> David
>> >>
>> >>
>> >>>
>> >>> numpaths = Number of possible paths
>> >>> epp+compile = Time taken to compute encoding, insert instrumentation,
>> >>>               and compile to executable
>> >>> compile = Time taken to compile to executable
>> >>> execpaths = Number of paths dynamically executed
>> >>> epp-exec-time = Execution time with instrumentation
>> >>> exec-time = Normal execution time
>> >>> epp-bin-size = Size of instrumented binary in bytes
>> >>> bin-size = Size of binary
>> >>> ** size of shared library in bytes = 598042
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> +---------------+----------+-------------+-----------+-----------+---------------+-----------+--------------+----------+
>> >>> | benchmark     | numpaths | epp+compile | compile   | execpaths | epp-exec-time | exec-time | epp-bin-size | bin-size |
>> >>> +---------------+----------+-------------+-----------+-----------+---------------+-----------+--------------+----------+
>> >>> | blks          | 2        | 0m1.036s    | 0m1.008s  | 2         | 0m3.643s      | 0m3.205s  | 155931       | 155459   |
>> >>> | bodytrack     | 29       | 0m4.907s    | 0m4.881s  | 5         | 0m14.786s     | 0m1.943s  | 2125256      | 2124224  |
>> >>> | bzip2         | 60       | 0m1.274s    | 0m1.268s  | 3         | 0m9.441s      | 0m9.624s  | 259125       | 258477   |
>> >>> | ferret        | 360921   | 0m26.208s   | 0m26.102s | 40        | 0m10.342s     | 0m6.224s  | 8342571      | 8338588  |
>> >>> | fluidanimate  | 384117   | 0m0.895s    | 0m0.869s  | 88        | 0m56.631s     | 0m1.294s  | 202702       | 197878   |
>> >>> | freqmine      | 45       | 0m1.220s    | 0m1.214s  | 18        | 0m22.150s     | 0m5.515s  | 278615       | 277656   |
>> >>> | gcc           | 6026     | 0m31.941s   | 0m31.327s | 125       | 1m30.139s     | 0m36.601s | 6991413      | 6991245  |
>> >>> | hmmer         | 1882     | 0m3.193s    | 0m3.232s  | 65        | 0m58.911s     | 0m2.474s  | 744510       | 742806   |
>> >>> | mcf           | 230      | 0m0.838s    | 0m0.830s  | 10        | 0m11.097s     | 0m3.074s  | 162680       | 161736   |
>> >>> | mcf2000       | 1155     | 0m0.859s    | 0m0.853s  | 26        | 0m24.169s     | 0m4.625s  | 166092       | 165213   |
>> >>> | povray        | 17       | 0m8.543s    | 0m8.552s  | 4         | 9m24.562s     | 5m39.295s | 2388152      | 2387960  |
>> >>> | sjeng         | 158740   | 0m1.648s    | 0m1.637s  | 280       | 0m20.786s     | 0m5.229s  | 368841       | 368009   |
>> >>> | soplex        | 30       | 0m4.849s    | 0m4.848s  | 24        | 7m28.151s     | 4m10.813s | 1244775      | 1242063  |
>> >>> | sphinx        | 26       | 0m2.212s    | 0m2.198s  | 5         | 1m36.291s     | 0m13.811s | 543534       | 543358   |
>> >>> | streamcluster | 21121728 | 0m0.947s    | 0m0.908s  | 33        | 0m50.212s     | 0m5.986s  | 191981       | 185438   |
>> >>> | swaptions     | 20655    | 0m0.965s    | 0m0.950s  | 13        | 0m0.263s      | 0m0.178s  | 193841       | 184274   |
>> >>> | h264ref       | 24130    | 0m4.278s    | 0m4.272s  | 76        | 3m26.701s     | 3m4.461s  | 816660       | 812396   |
>> >>> | lbm           | 8        | 0m0.824s    | 0m0.815s  | 5         | 6m29.685s     | 1m39.180s | 150871       | 150327   |
>> >>> | namd          | 59598954 | 0m4.124s    | 0m4.139s  | 43        | 18m36.447s    | 6m50.288s | 925863       | 925271   |
>> >>> +---------------+----------+-------------+-----------+-----------+---------------+-----------+--------------+----------+
>> >>>
>> >>>
>> >>>
>> >>> > > Open Issues :
>> >>> > > + Update PathProfileInfo on CFG transformations ?
>> >>>
>> >>> > Could you clarify what this means?
>> >>>
>> >>> Changing the control flow graph of a routine may invalidate collected
>> >>> path profiles. For example, splitting a block with an unconditional
>> >>> branch does not change the profile, but introducing a conditional
>> >>> branch invalidates the profile. The issue I would like to address is
>> >>> which transformations we should allow as safe, and how we should
>> >>> update the internal path profile data structures if we allow this at
>> >>> all.
>> >>>
>> >>> > > + Verify with PGOEdge info ?
>> >>>
>> >>> > Ditto.
>> >>>
>> >>> Verification with PGOEdge info implies that the edge frequencies
>> >>> derived from path profiles and via instrprof should be equal.
>> >>>
>> >>> > > + Handle setjmp, longjmp, early program termination, noreturn calls
>> >>>
>> >>> > How do you handle indirect calls?
>> >>>
>> >>> No special handling of indirect calls is needed, as path profiles are
>> >>> intra-procedural and control returns to the same basic block after
>> >>> the call in the general case. For the above-mentioned cases, control
>> >>> may not return.
>> >>>
>> >>>
>> >>> Regards,
>> >>> Snehasish
>> >>> _______________________________________________
>> >>> LLVM Developers mailing list
>> >>> llvm-dev at lists.llvm.org
>> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >>
>> >>
>> >>
>
>
>

