[llvm-dev] Status update on the hot/cold splitting pass

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Tue Feb 5 16:50:34 PST 2019


On Tue, Feb 5, 2019 at 3:56 PM Teresa Johnson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
>
> On Tue, Feb 5, 2019, 3:46 PM Vedant Kumar <vedant_kumar at apple.com> wrote:
>
>> Hi Teresa,
>>
>> On Feb 5, 2019, at 2:38 PM, Teresa Johnson via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>
>>
>>
>> On Mon, Jan 28, 2019 at 11:03 AM Aditya K via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> > The splitting pass currently doesn’t move cold symbols into a separate
>>> > section. Is that affecting your results?
>>> Maybe partly. The main reason is that, in the absence of good profile
>>> info, we aren't finding many cold blocks.
>>>
>>
>> We noticed that the split cold functions are ending up in the regular
>> .text section instead of .text.unlikely. Since that is done much later than
>> splitting and is based on profile counts, it must be the case that profile
>> data is not being propagated to the split functions in some way - do you
>> know offhand if they are getting function_entry_count prof metadata?
>>
>>
>> At the moment, entry counts are not propagated to the split functions.
>> This should explain the behavior you see.
>>
>
> Ok, it should be straightforward to add that, will take a look.
>
>>
>>
>> The other thing we noticed is that the .text.unlikely section is also
>> shrinking significantly, so it seems like some of the already cold blocks
>> are getting split - has anyone noticed that?
>>
>>
>> No, but we’ve focused on marking up select commonly-used APIs cold
>> explicitly. The splitting pass skips functions where
>> PSI->isFunctionEntryCold() holds — maybe a stronger check is necessary?
>>
>
> Yeah I'm not sure. The cold section assignment uses a slightly different
> PSI interface, isFunctionColdInCallGraph, but that shouldn't be very
> different. I'll need to take a closer look later and get back.
>
>
The latter checks internal counts, which is more precise. A cold function
entry count does not mean the function body is cold.
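
To make the distinction concrete, here is a toy sketch. It is not the PSI
implementation: the struct, the threshold, and all function names below are
invented for illustration, and the real interfaces may weigh other data as
well. It only shows why an entry-only check can call a function cold while a
check that also looks at internal counts does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a function's profile; not an LLVM data structure.
struct ToyFunctionProfile {
  uint64_t entryCount;               // how often the function was entered
  std::vector<uint64_t> blockCounts; // execution count of each basic block
};

// Hypothetical threshold: counts at or below this are treated as cold.
constexpr uint64_t kColdCountThreshold = 10;

inline bool isColdCount(uint64_t count) {
  return count <= kColdCountThreshold;
}

// Entry-only check: looks at the entry count and nothing else.
inline bool entryIsCold(const ToyFunctionProfile &f) {
  return isColdCount(f.entryCount);
}

// Stricter check: the entry count and every block count must be cold.
inline bool wholeFunctionIsCold(const ToyFunctionProfile &f) {
  return entryIsCold(f) &&
         std::all_of(f.blockCounts.begin(), f.blockCounts.end(), isColdCount);
}

// A function entered only twice whose loop body ran 5000 times in total.
inline ToyFunctionProfile exampleRarelyEnteredLoop() {
  return ToyFunctionProfile{2, {2, 5000, 2}};
}
```

For exampleRarelyEnteredLoop(), entryIsCold() returns true but
wholeFunctionIsCold() returns false, because one block count is well above
the threshold.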

David




> Thanks,
> Teresa
>
>>
>> vedant
>>
>>
>> Teresa
>>
>>
>>> -Aditya
>>>
>>> ------------------------------
>>> *From:* vsk at apple.com <vsk at apple.com> on behalf of Vedant Kumar <
>>> vedant_kumar at apple.com>
>>> *Sent:* Monday, January 28, 2019 1:00 PM
>>> *To:* Aditya K
>>> *Cc:* llvm-dev at lists.llvm.org; Sebastian Pop
>>> *Subject:* Re: [llvm-dev] Status update on the hot/cold splitting pass
>>>
>>> The splitting pass currently doesn’t move cold symbols into a separate
>>> section. Is that affecting your results?
>>>
>>> On Darwin, we plan on using a symbol attribute to provide an ordering
>>> hint to the linker (see r352227, N_COLD_FUNC).
>>>
>>> vedant
>>>
>>> On Jan 28, 2019, at 10:51 AM, Aditya K via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>> Very happy to see good results. On our side, we are still struggling
>>> with getting a good profile to get aggressive hot-cold splitting. Static
>>> profile isn't helping much for our use cases. I'll be curious to know if
>>> someone got good improvements only with static profile analysis.
>>>
>>>
>>> -Aditya
>>>
>>> ------------------------------
>>> *From:* vsk at apple.com <vsk at apple.com> on behalf of Vedant Kumar <
>>> vedant_kumar at apple.com>
>>> *Sent:* Friday, January 25, 2019 6:29 PM
>>> *To:* llvm-dev at lists.llvm.org
>>> *Cc:* Aditya Kumar; Sebastian Pop; Teresa Johnson; jun.l at samsung.com;
>>> Duncan Smith; Gerolf Hoflehner
>>> *Subject:* Status update on the hot/cold splitting pass
>>>
>>> Hello,
>>>
>>> I’d like to give a status update to the community about the
>>> recently-added hot/cold splitting pass. I'll provide some motivation for
>>> the pass, describe its implementation, summarize recent/ongoing work, and
>>> share early results.
>>>
>>> # Motivation
>>>
>>> We (at Apple) have found that memory pressure from resident pages of
>>> code is significant on embedded devices. In particular, this pressure
>>> spikes during app launches. We’ve been looking into ways to reduce memory
>>> pressure. Hot/cold splitting is one part of a solution.
>>>
>>> # What does hot/cold splitting do?
>>>
>>> The hot/cold splitting pass identifies cold basic blocks and moves them
>>> into separate functions. The linker must order newly-created cold functions
>>> away from the rest of the program (say, into a cold section). The idea here
>>> is to have these cold pages faulted in relatively infrequently (if at all),
>>> and to improve the memory locality of code outside of the cold area.
>>>
>>> The pass considers profile data, traps, uses of the `cold` attribute,
>>> and exception-handling code to identify cold blocks. If the pass identifies
>>> a cold region that's profitable to extract, it uses LLVM's CodeExtractor
>>> utility to split the region out of its original function. Newly-created
>>> cold functions are marked `minsize` (-Oz). The splitting process may occur
>>> multiple times per function.
>>>
>>> The choice to perform splitting at the IR level gave us a lot of
>>> flexibility. It allowed us to quickly target different architectures and
>>> evaluate new phase orderings. It also made it easier to split out highly
>>> complex subgraphs of CFGs (with both live-ins and live-outs). One
>>> disadvantage is that we cannot easily split out EH pads (
>>> llvm.org/PR39545). However, our experiments show that splitting them out
>>> would only increase the total amount of split code by 2% across the entire
>>> iOS shared cache.
>>>
>>> # Recent/ongoing work
>>>
>>> Aditya and Sebastian contributed the hot/cold splitting pass in
>>> September 2018 (r341669). Since then, work on the pass has continued
>>> steadily. It gained the ability to extract larger cold regions (r345209),
>>> compile-time improvements (r351892, r351894), and a more effective cost
>>> model (r352228). With some experimentation, we found that scheduling
>>> splitting before inlining gives better code size results without regressing
>>> memory locality (r352080). Along the way, CodeExtractor got better at
>>> handling debug info (r344545, r346255), and a few other issues in this
>>> utility were fixed (r348205, r350420).
>>>
>>> At this point, we're able to build & run our software stack with
>>> hot/cold splitting enabled. We’d like to introduce a CC1 option to safely
>>> toggle splitting on/off (https://reviews.llvm.org/D57265). That would
>>> help experiment with and/or deploy the pass.
>>>
>>> # Early results
>>>
>>> On internal memory benchmarks, we consistently saw that code page faults
>>> were more concentrated with splitting enabled. With splitting, the set of
>>> the most-frequently-accessed 95% (99%) of code pages was 10% (resp. 3.6%)
>>> smaller. We used a facility in the xnu VM to force pages to be faulted
>>> periodically, and ktrace, to collect this data. We settled on this approach
>>> because the alternatives (e.g. directly sampling RSS of various processes)
>>> gave unstable results, even when measures were taken to stabilize a device
>>> (e.g. disabling dynamic frequency switching, SMP, and various other
>>> features).
>>>
>>> On arm64, the performance impact of enabling splitting in the LLVM test
>>> suite appears to be in the noise. We think this is because split code
>>> amounts to just 0.1% of all the code in the test suite. Across the iOS
>>> shared cache we see that 0.9% of code is split, with higher percentages in
>>> key frameworks (e.g. 7% in libdispatch). For three internal benchmarks, we
>>> see geomean score improvements of 1.58%, 0.56%, and 0.27% respectively. We
>>> think these results are promising. I’d like to encourage others to evaluate
>>> the pass and share results.
>>>
>>> Thanks!
>>>
>>> vedant
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>
>>