[llvm-dev] [RFC] Machine Function Splitter - Split out cold blocks from machine functions using profile data

Fri Aug 14 17:13:40 PDT 2020

> On Aug 14, 2020, at 4:25 PM, Snehasish Kumar <snehasishk at google.com> wrote:
> 
> 
> 
> On Fri, Aug 14, 2020 at 2:22 PM Vedant Kumar <vedant_kumar at apple.com> wrote:
> Hi Snehasish,
> 
> Thanks for sharing this great write-up.
> 
> In past experiments, I found that splitting at the IR level (vs. at the codegen level) has a few drawbacks. You've discussed these, but just to recap:
> 
> - IR-level splitting is necessarily more conservative. This is to avoid inadvertently increasing code size in the original function due to the cost of materializing inputs/outputs to the cold callee (you refer to this as 'residue'). (As Aditya pointed out, some of the work re: splitting penalty hasn't been upstreamed yet -- I'll try to dust this off.)
> - It cannot split out @llvm.eh.typeid.for intrinsics, a common feature in exception handling code (llvm.org/PR39545).
> - It must be run very late in the pipeline. Splitting either before inlining, or shortly after, appears to regress performance across SPEC variants (with and without PGO; see D58258).
> 
> It's exciting to see this work, as it can side-step the first two issues. I'm not sure whether this is on your radar, but we do see some benefit from having the machine outliner run after IR-level splitting. I have not done a study of how this compares with partial inlining + late splitting; that could be interesting future work.
> 
> That's an interesting idea. Can you share more details about the benefit: is this a performance improvement you observe, or better code size? Also, what benchmarks was this observed on?

I have not done a rigorous analysis of this effect. It's something I observed while debugging an internal app. The situation was:

- The app was built with -Os and with hot/cold splitting (HCS) enabled, causing many substantially similar __assert_fail paths to be split out.
- The (arm64) machine outliner found shared code sequences in the split functions, as they were marked `minsize`, and could merge them.

If it's possible to use the machine outliner at a block-level granularity (not just on functions marked `minsize`), it may be possible to get the same effect with late (post-IR) splitting.
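
To make the effect described above a bit more concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the function names and instruction strings are invented for illustration, the byte accounting is a crude estimate, and the search is a brute-force scan rather than the suffix-tree approach the real machine outliner uses on machine instructions.

```python
from collections import defaultdict

INSN_BYTES = 4  # AArch64 instructions are fixed-width (4 bytes each)

# Hypothetical split-out cold paths; the instruction strings are made up.
cold_paths = {
    "parse.cold.1": ["set_line 10", "load_file_name", "load_expr_text", "call __assert_fail"],
    "lex.cold.1":   ["set_line 27", "load_file_name", "load_expr_text", "call __assert_fail"],
    "emit.cold.1":  ["set_line 99", "load_file_name", "load_expr_text", "call __assert_fail"],
}

def shared_runs(functions, length):
    """Return {run: [function names]} for each contiguous run of `length`
    instructions that appears in at least two functions."""
    seen = defaultdict(set)
    for name, insns in functions.items():
        for i in range(len(insns) - length + 1):
            seen[tuple(insns[i:i + length])].add(name)
    return {run: sorted(names) for run, names in seen.items() if len(names) >= 2}

def best_outlining_candidate(functions):
    """Return (run, users) for the longest shared run of two or more
    instructions, preferring the run with the most users; None if there is none."""
    longest = max((len(insns) for insns in functions.values()), default=0)
    for length in range(longest, 1, -1):
        runs = shared_runs(functions, length)
        if runs:
            return max(runs.items(), key=lambda kv: len(kv[1]))
    return None

def estimated_savings(run, users):
    """Crude estimate (an assumption, not the outliner's cost model): each user
    replaces the run with one call; the run is emitted once, plus a return."""
    before = len(users) * len(run) * INSN_BYTES
    after = len(users) * INSN_BYTES + (len(run) + 1) * INSN_BYTES
    return before - after

if __name__ == "__main__":
    run, users = best_outlining_candidate(cold_paths)
    print(run, users, estimated_savings(run, users))
```

In this model the three cold paths differ only in their first instruction, so the remaining three-instruction tail is the candidate that would be merged. Whether a real build sees a comparable win depends on the target, the flags, and the outliner's own cost model.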

> 
> I haven't kept up to date on the work done to support basic block sections in debug info. I'll note that it was a challenge to fix up debug info after IR-level splitting (ultimately this was handled in D72795). I'm not sure whether any of that is transferable, but feel free to cc me on reviews if you think it is.
> We do have support for debug info in basic block sections (added in https://reviews.llvm.org/D78851). As we test more applications we are fixing up issues (e.g. https://reviews.llvm.org/D85085). Thanks for the offer and we will ping you if appropriate.
> 
> vedant
