[llvm-dev] [EXTERNAL] Re: exponential code explosion during inlining

Wed Jun 9 13:41:54 PDT 2021

On Wed, Jun 9, 2021 at 1:35 PM Hongtao Yu <hoy at fb.com> wrote:

> There were some offline discussions about implementing the global cap
> based on an interprocedural loop-graph (ILG) based working set analysis.
> David, I’m wondering if you could share the progress of the ILG work.
> Thanks.
>

ILG based cap will be used for performance tuning and it won't be global
but per code-reuse region. To handle pathological code growth, a
different/simpler capping mechanism is probably better.

> <davidxl at google.com>
>
>
>
> *From: *Joseph Tremoulet <jotrem at microsoft.com>
> *Date: *Wednesday, June 9, 2021 at 1:00 PM
> *To: *Hongtao Yu <hoy at fb.com>, Xinliang David Li <xinliangli at gmail.com>
> *Cc: *llvm-dev <llvm-dev at lists.llvm.org>, Wenlei He <wenlei at fb.com>
> *Subject: *RE: [EXTERNAL] Re: [llvm-dev] exponential code explosion
> during inlining
>
> Yes, a global cap would certainly be a big enough hammer to prevent the
> code explosion I’m seeing.  I didn’t follow the review closely enough to
> get a sense, is that likely to land?
>
>
>
> *From:* Hongtao Yu <hoy at fb.com>
> *Sent:* Wednesday, June 9, 2021 1:14 PM
> *To:* Joseph Tremoulet <jotrem at microsoft.com>; Xinliang David Li <
> xinliangli at gmail.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; Wenlei He <wenlei at fb.com>
> *Subject:* Re: [EXTERNAL] Re: [llvm-dev] exponential code explosion
> during inlining
>
>
>
> The existing implementation has a per-callsite a size cap. For a callsite,
> if the inlining could cause a potential size growth to exceed the given cap
> (3000 for hot callsites, 45 for cold callsites, 375 for warm callsites,
> IIRC), the inlining will not be done. In https://reviews.llvm.org/D98481
> <https://nam06.safelinks.protection.outlook.com/?url=https://reviews.llvm.org/D98481&data=04%7C01%7Cjotrem@microsoft.com%7C1aa271500a8c46e23e8c08d92b69f579%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637588556366647435%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C1000&sdata=+xCY5aNbb+ZEYbwTNE1Ms5OZLIa6pnN/y3hJFeFfr3g=&reserved=0>,
> we were thinking about enforcing a global size cap per each inliner. Give
> the current callsite considered to be inlined, if the size growth makes the
> inliner size over a given cap, the current inlining and all subsequent
> inlining will be skipped. Could this approach solve your problem?
>
>
>
> Thanks,
>
> Hongtao
>
>
>
> *From: *Joseph Tremoulet <jotrem at microsoft.com>
> *Date: *Wednesday, June 9, 2021 at 9:42 AM
> *To: *Xinliang David Li <xinliangli at gmail.com>
> *Cc: *llvm-dev <llvm-dev at lists.llvm.org>, Hongtao Yu <hoy at fb.com>, Wenlei
> He <wenlei at fb.com>
> *Subject: *RE: [EXTERNAL] Re: [llvm-dev] exponential code explosion
> during inlining
>
> Related, yes.  I see the conversation there leading toward having a cap on
> inlinee size (because, at leach level, there is a single huge function,
> which will be inlined twice into the single function in the next level).  I
> was mistakenly under the impression that our existing threshold effectively
> imposed a size cap (and therefore effectively imposed a fan-out cap), so
> one avenue I was considering was limiting the depth of transitive inlines
> through calls copied from inlinees in already-converged SCCs.  But adding
> the depth cap could make cases such as I’m hitting look a lot like the case
> that https://reviews.llvm.org/D98481
> <https://nam06.safelinks.protection.outlook.com/?url=https://reviews.llvm.org/D98481&data=04%7C01%7Cjotrem@microsoft.com%7C1aa271500a8c46e23e8c08d92b69f579%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637588556366657378%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C1000&sdata=hWIPCD3k6UdpeyQWIHh/vtgcWIoHucfWNpaSIk9eHYk=&reserved=0>
> describes.  So I think that neither the size cap nor the depth cap alone
> would be sufficient…
>
>
>
> *From:* Xinliang David Li <xinliangli at gmail.com>
> *Sent:* Tuesday, June 8, 2021 8:03 PM
> *To:* Joseph Tremoulet <jotrem at microsoft.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; Hongtao Yu <hoy at fb.com>; Wenlei
> He <wenlei at fb.com>
> *Subject:* [EXTERNAL] Re: [llvm-dev] exponential code explosion during
> inlining
>
>
>
> May be related to https://reviews.llvm.org/D98481
> <https://nam06.safelinks.protection.outlook.com/?url=https://reviews.llvm.org/D98481&data=04%7C01%7Cjotrem@microsoft.com%7C1aa271500a8c46e23e8c08d92b69f579%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637588556366657378%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C1000&sdata=hWIPCD3k6UdpeyQWIHh/vtgcWIoHucfWNpaSIk9eHYk=&reserved=0>
>
>
>
> David
>
>
>
> On Tue, Jun 8, 2021 at 8:43 AM Joseph Tremoulet via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
>
>
> I filed a bug (PR50485 [1]) a couple weeks ago for some pathological
> behavior we’ve hit in the inliner, but there hasn’t been any reply on the
> bug so I figured I’d broaden the audience and ask about the issue here.
>
>
>
> The problem is partially a phase-ordering issue – we run cleanup
> optimizations after inlining an SCC that reduce the size of functions in
> the SCC, so a given call foo -> bar that looks unprofitable due to bar’s
> size while processing the SCC containing foo and bar may suddenly look
> profitable due to bar’s reduced size when later considering a copy of that
> same callsite that gets created by inlining foo into some function in a
> later SCC.
>
>
>
> This interacts badly with the approach that the inliner relies on local
> heuristics to eventually converge (rather than limiting itself with some
> global budget).  I’ll copy the comment explaining that approach here:
>
>
>
>   // We use a single common worklist for calls across the entire SCC. We
>
>   // process these in-order and append new calls introduced during
> inlining to
>
>   // the end.
>
>   //
>
>   // Note that this particular order of processing is actually critical to
>
>   // avoid very bad behaviors. Consider *highly connected* call graphs
> where
>
>   // each function contains a small amount of code and a couple of calls to
>
>   // other functions. Because the LLVM inliner is fundamentally a bottom-up
>
>   // inliner, it can handle gracefully the fact that these all appear to be
>
>   // reasonable inlining candidates as it will flatten things until they
> become
>
>   // too big to inline, and then move on and flatten another batch.
>
>   //
>
>   // However, when processing call edges *within* an SCC we cannot rely on
> this
>
>   // bottom-up behavior. As a consequence, with heavily connected *SCCs* of
>
>   // functions we can end up incrementally inlining N calls into each of
>
>   // N functions because each incremental inlining decision looks good and
> we
>
>   // don't have a topological ordering to prevent explosions.
>
>   //
>
>   // To compensate for this, we don't process transitive edges made
> immediate
>
>   // by inlining until we've done one pass of inlining across the entire
> SCC.
>
>   // Large, highly connected SCCs still lead to some amount of code bloat
> in
>
>   // this model, but it is uniformly spread across all the functions in
> the SCC
>
>   // and eventually they all become too large to inline, rather than
>
>   // incrementally maknig a single function grow in a super linear fashion.
>
>
>
> The problem in a nutshell is that “eventually they all become too large to
> inline” is true while inlining is happening in their SCC, but then the
> cleanup makes them small again and so the inliner goes nuts chasing all the
> now-profitable paths through the highly connected SCC when considering them
> as transitive inlines into a subsequent SCC.
>
>
>
> I’d love some thoughts on how we might best go about addressing this.  I
> could imagine trying to address it as a phase ordering issue by running
> cleanup at the start of inlining an SCC – in the cases where we’ve hit this
> the cleanup hasn’t actually depended on inlining to expose the
> opportunities, it just happened to first get cleaned up immediately post
> inlining.  I could also imagine trying to address it by limiting transitive
> inlines at callsites created by inlining functions from already-converged
> SCCs, which we could either do wholesale (if we’re expecting them to be too
> big to inline at this point, that shouldn’t be a big loss, right?) or just
> by capping their depth, say, to cut off exponential explosion.
>
>
>
>
>
> Thanks,
>
> -Joseph
>
>
>
> [1] - 50485 – Exponential code explosion during inlining (llvm.org)
> <https://nam06.safelinks.protection.outlook.com/?url=https://bugs.llvm.org/show_bug.cgi?id=50485&data=04%7C01%7Cjotrem@microsoft.com%7C1aa271500a8c46e23e8c08d92b69f579%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637588556366667321%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C1000&sdata=CA1s3Y1qj0sM3T7DgdCIOiM2yV1bg8fE1XnEbr/W2AM=&reserved=0>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> <https://nam06.safelinks.protection.outlook.com/?url=https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev&data=04%7C01%7Cjotrem@microsoft.com%7C1aa271500a8c46e23e8c08d92b69f579%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637588556366677276%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C1000&sdata=l++ALFP37cEG27HSOY/p4oZDFe5jlYHxGYSgeaUaqLc=&reserved=0>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210609/dfb3601e/attachment-0001.html>