<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 22, 2016 at 9:28 PM, Snehasish Kumar <span dir="ltr"><<a href="mailto:ska124@sfu.ca" target="_blank">ska124@sfu.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi David,<br>
<span class=""><br>
> Hi Snehasish, thanks for writing up the proposal.<br>
><br>
> As it stands today, path profiling still has serious scalability issues that<br>
> prevents it from being usable by any optimization passes that may benefit<br>
> from it.<br>
<br>
</span>I agree; it would be interesting to see how we can reduce the overheads<br>
to bring it within acceptable limits.<br></blockquote><div><br></div><div><br></div><div>One idea to reduce the overhead of instrumentation is to use edge profiling to guide path profile instrumentation. With edge profiling, you can prune perhaps >90% of the paths and focus on profiling potential hot paths.  The downside of this approach is that it requires additional steps to enable path profiling -- but at least it can be put into a state that is usable.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span class=""><br>
> It will be interesting to see how the<br>
> sampling-based approach matches up against the instrumentation-based method in<br>
> detecting hot paths.<br>
<br>
</span>I actually performed some experiments regarding this very issue. I created<br>
an edge profile from the collected path profile and constructed superblocks<br>
for innermost loops in 28 applications. I found:<br>
<br>
a) in 6 applications, some superblocks (i.e., hot paths constructed from edge<br>
profiles) are *not present* as executed paths in the collected profile. For<br>
example, in 401.bzip2, 70 of 141 inner-loop hot paths derived from edge<br>
profiles do not correspond to actual executed paths.<br>
<br>
b) in 6 applications (3 of which overlap with the previous 6), some hot paths<br>
constructed from edge profiles are not the highest-ranked path. For example,<br>
this occurs in 4 out of 32 innermost loops in 458.sjeng.<br>
<br>
The high-level reason this happens is overlapping paths. I can go into more<br>
detail if there is interest.<br>
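<br>
As a minimal sketch of how overlapping paths can produce case (a), consider the Python below. The CFG, block names, and counts are invented for illustration (they are not taken from any benchmark), and following the hottest outgoing edge is an assumed, simplified stand-in for superblock formation.<br>
<pre>
```python
from collections import Counter

# Hypothetical path profile for one loop body made of three back-to-back
# diamonds. Block names and counts are invented for illustration.
path_profile = {
    ("A", "L1", "B", "L2", "C", "R3", "D"): 10,
    ("A", "L1", "B", "R2", "C", "L3", "D"): 10,
    ("A", "R1", "B", "L2", "C", "L3", "D"): 10,
}


def edge_profile_from_paths(paths):
    """Sum each path's count onto every edge that the path traverses."""
    edges = Counter()
    for blocks, count in paths.items():
        for src, dst in zip(blocks, blocks[1:]):
            edges[(src, dst)] += count
    return edges


def greedy_hot_path(edges, entry, exit_block):
    """Follow the hottest outgoing edge from entry until exit_block."""
    path = [entry]
    while path[-1] != exit_block:
        candidates = {e: c for e, c in edges.items() if e[0] == path[-1]}
        hottest = max(candidates, key=candidates.get)
        path.append(hottest[1])
    return tuple(path)


edges = edge_profile_from_paths(path_profile)
hot = greedy_hot_path(edges, "A", "D")
print(hot)                  # the path that the edge profile suggests
print(hot in path_profile)  # whether that path was ever executed
```
</pre>
Here every "L" edge is hotter than its "R" sibling (20 vs. 10), so the greedy walk returns A, L1, B, L2, C, L3, D, yet no executed path takes all three "L" sides, and the membership check prints False.<br>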
<div class="HOEnZb"><div class="h5"><br></div></div></blockquote><div><br></div><div>For edge profiles, this is expected, since not all paths are realizable, but your data is still interesting. For instance, the bzip2 data shows the great potential of using path profile information. The sample-based method I mentioned is not about edge profiling/sampling, though, but about sampling execution paths (with the length of the paths limited by the capacity of the branch record buffer).  </div><div><br></div><div>David</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">
><br>
> Independent of the method used in generating path profile data, your<br>
> proposed work on the path profile info representation and query APIs can be<br>
> shared.<br>
><br>
> thanks,<br>
><br>
> David<br>
><br>
> On Mon, Mar 21, 2016 at 3:07 PM, Snehasish Kumar <<a href="mailto:ska124@sfu.ca">ska124@sfu.ca</a>> wrote:<br>
>><br>
>> Hi<br>
>><br>
>> I am pinging to find out if there is any interest in mentoring this<br>
>> proposal for GSoC this year. I've submitted a draft via the GSoC<br>
>> website.<br>
>><br>
>> David, Vedant, it would be great if I could get some advice on refining<br>
>> the goals and particulars of the implementation.<br>
>> The version we use internally is not performance oriented and will<br>
>> require refactoring.<br>
>> Here is a link to the draft document [1].<br>
>><br>
>> Thanks,<br>
>> Snehasish<br>
>><br>
>><br>
>> [1]<br>
>> <a href="https://docs.google.com/document/d/18i9FvD7FSqX6tNEXb83gzc0EC_STeS3bWOVf167sFWk/edit?usp=sharing" rel="noreferrer" target="_blank">https://docs.google.com/document/d/18i9FvD7FSqX6tNEXb83gzc0EC_STeS3bWOVf167sFWk/edit?usp=sharing</a><br>
>><br>
>><br>
>> On Wed, Mar 16, 2016 at 2:03 PM, Snehasish Kumar <<a href="mailto:ska124@sfu.ca">ska124@sfu.ca</a>> wrote:<br>
>> > Hi David,<br>
>> ><br>
>> >> Are the data below all collected when only one function is picked for<br>
>> >> instrumentation?<br>
>> ><br>
>> > Yes, here is a list of the benchmarks and selected functions.<br>
>> ><br>
>> ><br>
>> > +-----------------+----------------------------------------------------------------------------------------------+<br>
>> > | benchmark       | selected function                                                                            |<br>
>> > +-----------------+----------------------------------------------------------------------------------------------+<br>
>> > | blks            | _Z19BlkSchlsEqEuroNoDivfffffif                                                               |<br>
>> > | bodytrack       | _ZN17ImageMeasurements11InsideErrorERK17ProjectedCylinderRK11BinaryImageRiS6_                |<br>
>> > | bzip2           | BZ2_compressBlock                                                                            |<br>
>> > | ferret          | image_segment                                                                                |<br>
>> > | fluidanimate    | _Z13ComputeForcesv                                                                           |<br>
>> > | freqmine        | _Z32FPArray_conditional_pattern_baseIhEiP7FP_treeiiT_                                        |<br>
>> > | gcc             | bitmap_operation                                                                             |<br>
>> > | hmmer           | P7Viterbi                                                                                    |<br>
>> > | lbm             | LBM_performStreamCollide                                                                     |<br>
>> > | mcf             | price_out_impl                                                                               |<br>
>> > | mcf2000         | price_out_impl                                                                               |<br>
>> > | namd            | _ZN20ComputeNonbondedUtil26calc_pair_energy_fullelectEP9nonbonded                            |<br>
>> > | povray          | _ZN3povL24All_Sphere_IntersectionsEPNS_13Object_StructEPNS_10Ray_StructEPNS_13istack_structE |<br>
>> > | sjeng           | gen                                                                                          |<br>
>> > | soplex          | _ZN6soplex9CLUFactor16vSolveUrightNoNZEPdS1_Piid                                             |<br>
>> > | sphinx          | vector_gautbl_eval_logs3                                                                     |<br>
>> > | streamcluster   | _Z5pgainlP6PointsdPliP17pthread_barrier_t                                                    |<br>
>> > | swaptions       | _Z21HJM_Swaption_BlockingPddddddiidS_PS_llii                                                 |<br>
>> > | h264ref         | dct_luma_16x16                                                                               |<br>
>> > +-----------------+----------------------------------------------------------------------------------------------+<br>
>> ><br>
>> >> Do you have data when such manual selection is not done?<br>
>> ><br>
>> > At the moment, I do not.<br>
>> ><br>
>> >><br>
>> >> thanks,<br>
>> >><br>
>> >> David<br>
>> >><br>
>> >><br>
>> >>><br>
>> >>> numpaths = Number of possible paths<br>
>> >>> epp+compile = Time taken to compute encoding, insert instrumentation and compile to executable<br>
>> >>> compile = Time taken to compile to executable<br>
>> >>> execpaths = Number of paths dynamically executed<br>
>> >>> epp-exec-time = Execution time with instrumentation<br>
>> >>> exec-time = Normal execution time<br>
>> >>> epp-bin-size = Size of instrumented binary in bytes<br>
>> >>> bin-size = Size of binary<br>
>> >>> ** size of shared library in bytes = 598042<br>
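>> >>><br>
>> >>> To illustrate why numpaths can dwarf execpaths, here is a small Python sketch.<br>
>> >>> It assumes numpaths counts acyclic entry-to-exit paths (as in Ball-Larus style<br>
>> >>> path numbering); the CFG builder and block names are hypothetical and are not<br>
>> >>> taken from the tool that produced the table.<br>
<pre>
```python
def count_paths(successors, entry):
    """Count entry-to-exit paths in an acyclic CFG.

    A block with no successors is treated as an exit and contributes one path.
    """
    memo = {}

    def visit(block):
        if block not in memo:
            succs = successors.get(block, [])
            memo[block] = sum(visit(s) for s in succs) if succs else 1
        return memo[block]

    return visit(entry)


def diamond_chain(k):
    """Build a hypothetical CFG of k back-to-back two-way branches."""
    succ = {}
    for i in range(k):
        succ[f"m{i}"] = [f"l{i}", f"r{i}"]
        succ[f"l{i}"] = [f"m{i + 1}"]
        succ[f"r{i}"] = [f"m{i + 1}"]
    return succ


print(count_paths(diamond_chain(3), "m0"))
print(count_paths(diamond_chain(20), "m0"))
```
</pre>
>> >>> A chain of k two-way branches yields 2**k static paths, so the two calls print<br>
>> >>> 8 and 1048576, even though a given run may exercise only a handful of them.<br>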
>> >>><br>
>> >>><br>
>> >>><br>
>> >>><br>
>> >>> +---------------+----------+-------------+-----------+-----------+---------------+-----------+--------------+----------+<br>
>> >>> | benchmark     | numpaths | epp+compile | compile   | execpaths | epp-exec-time | exec-time | epp-bin-size | bin-size |<br>
>> >>> +---------------+----------+-------------+-----------+-----------+---------------+-----------+--------------+----------+<br>
>> >>> | blks          | 2        | 0m1.036s    | 0m1.008s  | 2         | 0m3.643s      | 0m3.205s  | 155931       | 155459   |<br>
>> >>> | bodytrack     | 29       | 0m4.907s    | 0m4.881s  | 5         | 0m14.786s     | 0m1.943s  | 2125256      | 2124224  |<br>
>> >>> | bzip2         | 60       | 0m1.274s    | 0m1.268s  | 3         | 0m9.441s      | 0m9.624s  | 259125       | 258477   |<br>
>> >>> | ferret        | 360921   | 0m26.208s   | 0m26.102s | 40        | 0m10.342s     | 0m6.224s  | 8342571      | 8338588  |<br>
>> >>> | fluidanimate  | 384117   | 0m0.895s    | 0m0.869s  | 88        | 0m56.631s     | 0m1.294s  | 202702       | 197878   |<br>
>> >>> | freqmine      | 45       | 0m1.220s    | 0m1.214s  | 18        | 0m22.150s     | 0m5.515s  | 278615       | 277656   |<br>
>> >>> | gcc           | 6026     | 0m31.941s   | 0m31.327s | 125       | 1m30.139s     | 0m36.601s | 6991413      | 6991245  |<br>
>> >>> | hmmer         | 1882     | 0m3.193s    | 0m3.232s  | 65        | 0m58.911s     | 0m2.474s  | 744510       | 742806   |<br>
>> >>> | mcf           | 230      | 0m0.838s    | 0m0.830s  | 10        | 0m11.097s     | 0m3.074s  | 162680       | 161736   |<br>
>> >>> | mcf2000       | 1155     | 0m0.859s    | 0m0.853s  | 26        | 0m24.169s     | 0m4.625s  | 166092       | 165213   |<br>
>> >>> | povray        | 17       | 0m8.543s    | 0m8.552s  | 4         | 9m24.562s     | 5m39.295s | 2388152      | 2387960  |<br>
>> >>> | sjeng         | 158740   | 0m1.648s    | 0m1.637s  | 280       | 0m20.786s     | 0m5.229s  | 368841       | 368009   |<br>
>> >>> | soplex        | 30       | 0m4.849s    | 0m4.848s  | 24        | 7m28.151s     | 4m10.813s | 1244775      | 1242063  |<br>
>> >>> | sphinx        | 26       | 0m2.212s    | 0m2.198s  | 5         | 1m36.291s     | 0m13.811s | 543534       | 543358   |<br>
>> >>> | streamcluster | 21121728 | 0m0.947s    | 0m0.908s  | 33        | 0m50.212s     | 0m5.986s  | 191981       | 185438   |<br>
>> >>> | swaptions     | 20655    | 0m0.965s    | 0m0.950s  | 13        | 0m0.263s      | 0m0.178s  | 193841       | 184274   |<br>
>> >>> | h264ref       | 24130    | 0m4.278s    | 0m4.272s  | 76        | 3m26.701s     | 3m4.461s  | 816660       | 812396   |<br>
>> >>> | lbm           | 8        | 0m0.824s    | 0m0.815s  | 5         | 6m29.685s     | 1m39.180s | 150871       | 150327   |<br>
>> >>> | namd          | 59598954 | 0m4.124s    | 0m4.139s  | 43        | 18m36.447s    | 6m50.288s | 925863       | 925271   |<br>
>> >>> +---------------+----------+-------------+-----------+-----------+---------------+-----------+--------------+----------+<br>
>> >>><br>
>> >>><br>
>> >>><br>
>> >>> > > Open Issues :<br>
>> >>> > > + Update PathProfileInfo on CFG transformations ?<br>
>> >>><br>
>> >>> > Could you clarify what this means?<br>
>> >>><br>
>> >>> Changing the control flow graph of a routine may invalidate collected path<br>
>> >>> profiles. For example, splitting a block with an unconditional branch does<br>
>> >>> not change the profile, but introducing a conditional branch invalidates the<br>
>> >>> profile. The issue I would like to address is which transformations we<br>
>> >>> should allow as safe, and how we should update the internal path profile<br>
>> >>> data structures if we allow this at all.<br>
>> >>><br>
>> >>> > > + Verify with PGOEdge info ?<br>
>> >>><br>
>> >>> > Ditto.<br>
>> >>><br>
>> >>> Verification with PGOEdge info implies that the edge frequencies derived<br>
>> >>> from path profiles and via instrprof should be equal.<br>
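>> >>><br>
>> >>> A minimal sketch of such a check in Python follows. The two dictionaries are<br>
>> >>> hypothetical stand-ins for data that would come from the path profile and from<br>
>> >>> instrprof; no LLVM API is used, and the handling of loop back edges that a<br>
>> >>> real path profile needs is ignored here.<br>
<pre>
```python
from collections import Counter


def edge_counts_from_paths(path_counts):
    """Derive edge frequencies by summing counts of paths over each edge."""
    edges = Counter()
    for blocks, count in path_counts.items():
        for src, dst in zip(blocks, blocks[1:]):
            edges[(src, dst)] += count
    return edges


def mismatched_edges(path_counts, edge_counts):
    """Map each disagreeing edge to (path-derived count, edge-profile count)."""
    derived = edge_counts_from_paths(path_counts)
    all_edges = set(derived) | set(edge_counts)
    return {
        edge: (derived.get(edge, 0), edge_counts.get(edge, 0))
        for edge in sorted(all_edges)
        if derived.get(edge, 0) != edge_counts.get(edge, 0)
    }


# Invented example data: a single if/else region executed ten times.
paths = {
    ("entry", "then", "exit"): 7,
    ("entry", "else", "exit"): 3,
}
edge_profile = {
    ("entry", "then"): 7,
    ("entry", "else"): 3,
    ("then", "exit"): 7,
    ("else", "exit"): 3,
}

print(mismatched_edges(paths, edge_profile))  # empty dict when consistent

stale = dict(edge_profile)
stale[("else", "exit")] = 4  # simulate an inconsistent edge count
print(mismatched_edges(paths, stale))
```
</pre>
>> >>> The first call reports no mismatches; the second reports the single edge<br>
>> >>> (else, exit) with the pair (3, 4), which is the kind of discrepancy the<br>
>> >>> verification step is meant to surface.<br>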
>> >>><br>
>> >>> > > + Handle setjmp, longjmp, early program termination, noreturn calls<br>
>> >>><br>
>> >>> > How do you handle indirect calls?<br>
>> >>><br>
>> >>> No special handling of indirect calls is needed, as path profiles are<br>
>> >>> intra-procedural and control returns to the same basic block after the<br>
>> >>> call in the general case. For the cases mentioned above, control may not<br>
>> >>> return.<br>
>> >>><br>
>> >>><br>
>> >>> Regards,<br>
>> >>> Snehasish<br>
>> >>> _______________________________________________<br>
>> >>> LLVM Developers mailing list<br>
>> >>> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>
>> >>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
>> >><br>
>> >><br>
>> >><br>
>> >><br>
><br>
><br>
><br>
><br>
</div></div></blockquote></div><br></div></div>