[llvm-dev] Status update on the hot/cold splitting pass

Tue Feb 5 15:56:13 PST 2019

On Tue, Feb 5, 2019, 3:46 PM Vedant Kumar <vedant_kumar at apple.com> wrote:

> Hi Teresa,
>
> On Feb 5, 2019, at 2:38 PM, Teresa Johnson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
>
> On Mon, Jan 28, 2019 at 11:03 AM Aditya K via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> > The splitting pass currently doesn’t move cold symbols into a separate
>> section. Is that affecting your results?
>> Maybe partly. The main reason is that, in the absence of good profile
>> info, we aren't finding many cold blocks.
>>
>
> We noticed that the split cold functions end up in the regular
> .text section instead of .text.unlikely. Section assignment happens much
> later than splitting and is based on profile counts, so profile data must
> not be reaching the split functions in some way. Do you know offhand
> whether they get function_entry_count prof metadata?
>
>
> At the moment, entry counts are not propagated to the split functions.
> This should explain the behavior you see.
>

Ok, it should be straightforward to add that, will take a look.

>
>
> The other thing we noticed is that the .text.unlikely section also
> shrinks significantly, so it seems like some of the already-cold blocks
> are getting split. Has anyone noticed that?
>
>
> No, but we’ve focused on explicitly marking select commonly-used APIs as
> cold. The splitting pass skips functions where
> PSI->isFunctionEntryCold() holds. Maybe a stronger check is necessary?
>

Yeah I'm not sure. The cold section assignment uses a slightly different
PSI interface, isFunctionColdInCallGraph, but that shouldn't be very
different. I'll need to take a closer look later and get back.
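
For reference, a rough sketch of how an entry-only check and a stricter
whole-function check could disagree. This is a toy model, not LLVM's code:
ToyFunction, kColdThreshold, isEntryCold and isColdInCallGraph are
hypothetical stand-ins, and it assumes the call-graph variant additionally
requires every block in the function to be cold.

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for a function with profile data (hypothetical, not llvm::Function).
struct ToyFunction {
  bool hasEntryCount;                 // false when no entry count is attached
  uint64_t entryCount;                // only meaningful if hasEntryCount is true
  std::vector<uint64_t> blockCounts;  // per-block execution counts
};

// Hypothetical cold threshold; a real one would come from the profile summary.
constexpr uint64_t kColdThreshold = 10;

bool isColdCount(uint64_t count) { return count <= kColdThreshold; }

// Models an entry-only check: looks at the entry count and nothing else.
bool isEntryCold(const ToyFunction &f) {
  return f.hasEntryCount && isColdCount(f.entryCount);
}

// Models a stricter check (assumption): the entry count must be present and
// cold, and every block count must be cold as well.
bool isColdInCallGraph(const ToyFunction &f) {
  if (!f.hasEntryCount || !isColdCount(f.entryCount))
    return false;
  for (uint64_t count : f.blockCounts)
    if (!isColdCount(count))
      return false;
  return true;
}
```

In this model a function with a cold entry but one hot block passes the
first check and fails the second, and a function with no entry count at
all fails both, which would be consistent with split functions landing in
.text when no count is attached.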

Thanks,
Teresa

>
> vedant
>
>
> Teresa
>
>
>> -Aditya
>>
>> ------------------------------
>> *From:* vsk at apple.com <vsk at apple.com> on behalf of Vedant Kumar <
>> vedant_kumar at apple.com>
>> *Sent:* Monday, January 28, 2019 1:00 PM
>> *To:* Aditya K
>> *Cc:* llvm-dev at lists.llvm.org; Sebastian Pop
>> *Subject:* Re: [llvm-dev] Status update on the hot/cold splitting pass
>>
>> The splitting pass currently doesn’t move cold symbols into a separate
>> section. Is that affecting your results?
>>
>> On Darwin, we plan on using a symbol attribute to provide an ordering
>> hint to the linker (see r352227, N_COLD_FUNC).
>>
>> vedant
>>
>> On Jan 28, 2019, at 10:51 AM, Aditya K via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Very happy to see good results. On our side, we are still struggling with
>> getting a good profile to get aggressive hot-cold splitting. Static profile
>> isn't helping much for our use cases. I'll be curious to know if someone
>> got good improvements only with static profile analysis.
>>
>>
>> -Aditya
>>
>> ------------------------------
>> *From:* vsk at apple.com <vsk at apple.com> on behalf of Vedant Kumar <
>> vedant_kumar at apple.com>
>> *Sent:* Friday, January 25, 2019 6:29 PM
>> *To:* llvm-dev at lists.llvm.org
>> *Cc:* Aditya Kumar; Sebastian Pop; Teresa Johnson; jun.l at samsung.com;
>> Duncan Smith; Gerolf Hoflehner
>> *Subject:* Status update on the hot/cold splitting pass
>>
>> Hello,
>>
>> I’d like to give a status update to the community about the
>> recently-added hot/cold splitting pass. I'll provide some motivation for
>> the pass, describe its implementation, summarize recent/ongoing work, and
>> share early results.
>>
>> # Motivation
>>
>> We (at Apple) have found that memory pressure from resident pages of code
>> is significant on embedded devices. In particular, this pressure spikes
>> during app launches. We’ve been looking into ways to reduce memory
>> pressure. Hot/cold splitting is one part of a solution.
>>
>> # What does hot/cold splitting do?
>>
>> The hot/cold splitting pass identifies cold basic blocks and moves them
>> into separate functions. The linker must order newly-created cold functions
>> away from the rest of the program (say, into a cold section). The idea here
>> is to have these cold pages faulted in relatively infrequently (if at all),
>> and to improve the memory locality of code outside of the cold area.
>>
>> The pass considers profile data, traps, uses of the `cold` attribute,
>> and exception-handling code to identify cold blocks. If the pass identifies
>> a cold region that's profitable to extract, it uses LLVM's CodeExtractor
>> utility to split the region out of its original function. Newly-created
>> cold functions are marked `minsize` (-Oz). The splitting process may occur
>> multiple times per function.
>>
>> The choice to perform splitting at the IR level gave us a lot of
>> flexibility. It allowed us to quickly target different architectures and
>> evaluate new phase orderings. It also made it easier to split out highly
>> complex subgraphs of CFGs (with both live-ins and live-outs). One
>> disadvantage is that we cannot easily split out EH pads (llvm.org/PR39545).
>> However, our experiments show that doing so only increases the total amount
>> of split code by 2% across the entire iOS shared cache.
>>
>> # Recent/ongoing work
>>
>> Aditya and Sebastian contributed the hot/cold splitting pass in September
>> 2018 (r341669). Since then, work on the pass has continued steadily. It
>> gained the ability to extract larger cold regions (r345209), compile-time
>> improvements (r351892, r351894), and a more effective cost model (r352228).
>> With some experimentation, we found that scheduling splitting before
>> inlining gives better code size results without regressing memory locality
>> (r352080). Along the way, CodeExtractor got better at handling debug info
>> (r344545, r346255), and a few other issues in this utility were fixed
>> (r348205, r350420).
>>
>> At this point, we're able to build & run our software stack with hot/cold
>> splitting enabled. We’d like to introduce a CC1 option to safely toggle
>> splitting on/off (https://reviews.llvm.org/D57265). That would make it
>> easier to experiment with and/or deploy the pass.
>>
>> # Early results
>>
>> On internal memory benchmarks, we consistently saw that code page faults
>> were more concentrated with splitting enabled. With splitting, the set of
>> the most-frequently-accessed 95% (99%) of code pages was 10% (resp. 3.6%)
>> smaller. To collect this data, we used ktrace together with a facility in
>> the xnu VM that forces pages to be faulted periodically. We settled on this approach
>> because the alternatives (e.g. directly sampling RSS of various processes)
>> gave unstable results, even when measures were taken to stabilize a device
>> (e.g. disabling dynamic frequency switching, SMP, and various other
>> features).
>>
>> On arm64, the performance impact of enabling splitting in the LLVM test
>> suite appears to be in the noise. We think this is because split code
>> amounts to just 0.1% of all the code in the test suite. Across the iOS
>> shared cache we see that 0.9% of code is split, with higher percentages in
>> key frameworks (e.g. 7% in libdispatch). For three internal benchmarks, we
>> see geomean score improvements of 1.58%, 0.56%, and 0.27% respectively. We
>> think these results are promising. I’d like to encourage others to evaluate
>> the pass and share results.
>>
>> Thanks!
>>
>> vedant
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |