[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Mon Jul 31 10:13:28 PDT 2017

Hi Evgeny,

Absolutely not at all. I think it's exciting that everyone wants to use the
outlining support :)

-eric

On Mon, Jul 31, 2017 at 10:12 AM Evgeny Astigeevich <
Evgeny.Astigeevich at arm.com> wrote:

> Hi Eric,
>
>
> Thank you for the feedback. I must apologise if I have caused any concern
> or offence.
>
>
> -Evgeny
> ------------------------------
> *From:* Eric Christopher <echristo at gmail.com>
> *Sent:* Monday, July 31, 2017 5:57:01 PM
> *To:* Evgeny Astigeevich; River Riddle; Chris Bieneman
> *Cc:* llvm-dev; nd
>
> *Subject:* Re: [llvm-dev] [RFC] Add IR level interprocedural outliner for
> code size.
> Hi Evgeny,
>
> On Mon, Jul 31, 2017 at 8:47 AM Evgeny Astigeevich via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Chris,
>>
>>
>>
>> > One particular disagreement that I think very much needs to be
>> revisited in this thread was Jessica's proposal of a pipeline of:
>>
>> > 1. IR outline
>>
>> > 2. Inline
>>
>> > 3. MIR outline
>>
>>
>>
>> IMHO, there is no need to fix the Outliner's place in the pipeline at the
>> moment. I hope people representing different architectures will try
>> different configurations and that the best one will be chosen. I’d like to
>> try the following pipeline configuration:
>>
>>
>>
>
> This is largely irrelevant to the discussion at hand. The original
> thoughts were about one or the other and Jessica has rightly pointed out
> (which you seem to agree with) that there's room for both in the pipeline.
>
> Thanks.
>
> -eric
>
>
>> 1. Inline
>>
>> 2. IR optimizations
>>
>> 3. IR outline
>>
>> 4. MIR optimizations
>>
>> 5. MIR outline
>>
>>
>>
>> I think this configuration allows as many IR optimizations as possible to
>> be applied, especially those which reduce code size, before commonly used
>> code is extracted into functions. I am also interested in some kind of Oz
>> LTO with the IR Outliner enabled.
>>
>>
>>
>> Evgeny Astigeevich
>>
>> Senior Compiler Engineer
>>
>> Compilation Tools
>>
>> ARM
>>
>>
>>
>>
>>
>> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *River
>> Riddle via llvm-dev
>> *Sent:* Saturday, July 29, 2017 6:33 AM
>> *To:* Chris Bieneman
>> *Cc:* llvm-dev
>>
>>
>> *Subject:* Re: [llvm-dev] [RFC] Add IR level interprocedural outliner
>> for code size.
>>
>>
>>
>> Hi Chris,
>>
>>
>>
>> It's okay to put me on the spot, because posting the patches was meant
>> to help further the discussion, which had stalled previously.
>>
>>
>>
>> On Fri, Jul 28, 2017 at 9:58 PM, Chris Bieneman <beanz at apple.com> wrote:
>>
>> Apologies for delayed joining of this discussion, but I had a few notes
>> from this thread that I really wanted to chime in about.
>>
>>
>>
>> River,
>>
>>
>>
>> I don't mean to put you on the spot, but I do want to start on a semantic
>> issue. In several places in the thread you used the words "we" and "our" to
>> imply that you're not alone in writing this (which is totally fine), but
>> your initial thread presented this as entirely your own work. So, when you
>> said things like "we feel there's an advantage to being at the IR level",
>> can you please clarify who is "we"?
>>
>>
>>
>> With regard to the words "we" and "our", I am referring to myself. My
>> writing style tends to shift between the usage of those words. I'll avoid
>> any kind of confusion in the future.
>>
>>
>>
>>
>>
>> Given that there are a number of disagreements and opinions floating
>> around I think it benefits us all to speak clearly about who is taking what
>> stances.
>>
>>
>>
>> One particular disagreement that I think very much needs to be revisited
>> in this thread was Jessica's proposal of a pipeline of:
>>
>>    1. IR outline
>>    2. Inline
>>    3. MIR outline
>>
>> In your response to that proposal you dismissed it out of hand with
>> "feelings" but not data. Given that the proposal came from Jessica (a
>> community member with significant relevant experience in outlining), and it
>> was also recognized as interesting by Eric Christopher (a long-time member
>> of the community with wide reaching expertise), I think dismissing it may
>> have been a little premature.
>>
>>
>>
>> I dismissed the idea of an outliner at the machine level being able to
>> catch bad inlining decisions. Given the loss of information between the two
>> I felt it was a little optimistic to rely on a very late pass being able to
>> reverse those decisions, especially coupled with the fact that the current
>> machine outliner requires exact equivalence. I don't disagree with the
>> proposal of an "outline, inline, outline" pipeline as an example, but the
>> idea of being able to catch inlining decisions given the circumstances seemed
>> optimistic to me. From there I went ahead and implemented a generic
>> interface for outlining that can be shared between IR/Machine level so that
>> such a pipeline could be more feasible.
>>
>>
>>
>>
>> I also want to visit a few procedural notes.
>>
>>
>>
>> Mehdi commented on the thread that it wouldn't be fair to ask for a
>> comparative study because the MIR outliner didn't have one. While I don't
>> think anyone is asking for a comparative study, I want to point out that I
>> think it is completely fair. If a new contributor approached the community
>> with a new SROA pass and wanted to land it in-tree it would be appropriate
>> to ask for a comparative analysis against the existing pass. How is this
>> different?
>>
>>
>>
>> The real question comes from what exactly you want to define as a
>> "comparative analysis". When posting the patch I included additional
>> performance data (found here: goo.gl/5k6wsP) that includes benchmarking
>> and comparisons between the outliner that I am proposing and the machine
>> outliner on a wide variety of benchmarks. The proposed outliner performs
>> quite favorably in comparison. As for feature comparison, the proposed
>> outliner has many features currently missing from the machine outliner:
>>
>>  - parameterization
>>
>>  - outputs
>>
>>  - relaxed equivalence (machine outliner requires exact)
>>
>>  - usage of profile data
>>
>>  - support for opt remarks
>>
>>
>>
>> The machine outliner currently only supports X86 and AArch64, whereas the
>> IR outliner can/should support all targets immediately, without requiring
>> ABI restrictions (-mno-red-zone is required for the machine outliner).
>>
>> At the IR level we have much more opportunity to find congruent
>> instructions than at the machine level, given the possible sources of
>> variation at that level: RA, instruction selection, instruction
>> scheduling, etc.
>>
>>
>>
>>
>>
>> I am more than willing to do a comparative analysis but I'm not quite
>> sure what the expectation for one is.
>>
>>
>>
>>
>>
>> Adding a new IR outliner is a different situation from when the MIR one
>> was added. When the MIR outliner was introduced there was no in-tree
>> analog. When someone comes to the community with something that has no
>> existing in-tree analog it isn't fair to necessarily ask them to implement
>> it multiple different ways to prove their solution is the best. However, as
>> a community, we do still exercise the right to reject contributions we
>> disagree with, and we frequently request changes to the implementation (as
>> is shown every time someone tries to add SPIR-V support).
>>
>>
>>
>> I perfectly agree :)
>>
>>
>>
>>
>>
>> In the LLVM community we have a long history of approaching large
>> contributions (especially ones from new contributors) with scrutiny and
>> discussion. It would be a disservice to the project to forget that.
>>
>>
>>
>> River, as a last note. I see that you've started uploading patches to
>> Phabricator, and I know you're relatively new to the community. When
>> uploading patches it helps to include appropriate reviewers so that the
>> right people see the patches as they come in. To that end can you please
>> include Jessica as a reviewer? Given her relevant domain experience I think
>> her feedback on the patches will be very valuable.
>>
>>
>>
>> I accidentally posted without any reviewers at first; I've been going
>> back through and adding people I missed.
>>
>>
>>
>>
>>
>> Thank you,
>>
>> -Chris
>>
>> I appreciate the feedback and welcome all critical discussion about the
>> right way to move forward.
>>
>> Thanks,
>>
>>  River Riddle
>>
>>
>>
>>
>>
>> On Jul 26, 2017, at 1:52 PM, River Riddle via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>
>>
>> Hey Sanjoy,
>>
>>
>>
>> On Wed, Jul 26, 2017 at 1:41 PM, Sanjoy Das via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> On Wed, Jul 26, 2017 at 12:54 PM, Sean Silva <chisophugis at gmail.com>
>> wrote:
>> > The way I interpret Quentin's statement is something like:
>> >
>> > - Inlining turns an interprocedural problem into an intraprocedural problem
>> > - Outlining turns an intraprocedural problem into an interprocedural problem
>> >
>> > Insofar as our intraprocedural analyses and transformations are strictly
>> > more powerful than interprocedural, then there is a precise sense in which
>> > inlining exposes optimization opportunities while outlining does not.
>>
>> While I think our intra-proc optimizations are *generally* more
>> powerful, I don't think they are *always* more powerful.  For
>> instance, LICM (today) won't hoist full regions but it can hoist
>> single function calls.  If we can extract out a region into a
>> readnone+nounwind function call then LICM will hoist it to the
>> preheader if the safety checks pass.
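
A minimal sketch of the LICM point above, written in C rather than IR. It
assumes the GCC/Clang `const` function attribute as a source-level stand-in
for readnone+nounwind; `outlined_region`, `sum_scaled`, and
`sum_scaled_hoisted` are hypothetical names, not code from the patch. The
second loop shows, by hand, the form LICM could reach once the region is a
single call:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outlined region. The const attribute promises that the
   result depends only on the arguments and that no memory is touched. */
static __attribute__((const)) int outlined_region(int a, int b) {
    int t = a * 31 + b;
    t ^= t >> 3;
    return t * 7 + 1;
}

/* Before hoisting: the call is loop-invariant but sits inside the loop. */
long sum_scaled(const int *v, size_t n, int a, int b) {
    assert(n == 0 || v != NULL);
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += (long)v[i] * outlined_region(a, b);
    return total;
}

/* After hoisting (written by hand): one call in the preheader. Calling it
   even when n == 0 is safe because the function cannot trap or write. */
long sum_scaled_hoisted(const int *v, size_t n, int a, int b) {
    assert(n == 0 || v != NULL);
    long total = 0;
    int k = outlined_region(a, b);
    for (size_t i = 0; i < n; ++i)
        total += (long)v[i] * k;
    return total;
}
```

Both functions return the same value for any input; the difference is only
in how many times the region is evaluated.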
>>
>> > Actually, for his internship last summer River wrote a profile-guided
>> > outliner / partial inliner (it didn't try to do deduplication; so it was
>> > more like PartialInliner.cpp). IIRC he found that LLVM's interprocedural
>> > analyses were so bad that there were pretty adverse effects from many of
>> > the outlining decisions. E.g. if you outline from the left side of a
>> > diamond, that side basically becomes a black box to most LLVM analyses and
>> > forces downstream dataflow meet points to give an overly conservative
>> > result, even though our standard intraprocedural analyses would have
>> > happily dug through the left side of the diamond if the code had not been
>> > outlined.
>> >
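
A small illustration of the diamond case, as a C sketch with hypothetical
names. `noinline` (a GCC/Clang attribute) stands in for the outlining
decision. The sketch only shows what is visible inside each function; it
makes no claim about which LLVM passes would or would not recover the
constant across the call.

```c
/* Both arms are visible: a purely intraprocedural analysis can see that
   x is 4 on either path, so the meet point knows x == 4. */
int diamond_inline(int c) {
    int x;
    if (c)
        x = 2 + 2;      /* left side of the diamond */
    else
        x = 4;          /* right side of the diamond */
    return x * 2;
}

/* Hypothetical outlined left side; noinline keeps it a separate function. */
static __attribute__((noinline)) int left_side(void) {
    return 2 + 2;
}

/* Looking only at this function body, the left arm now yields an unknown
   value, so the meet of {unknown, 4} is unknown. */
int diamond_outlined(int c) {
    int x;
    if (c)
        x = left_side();
    else
        x = 4;
    return x * 2;
}
```

The two functions compute the same result; what changes is how much an
analysis confined to one function body can conclude at the join.
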
>> > Also, River's patch (the one in this thread) does parameterized outlining.
>> > For example, two sequences containing stores can be outlined even if the
>> > corresponding stores have different pointers. The pointer to be stored to
>> > is passed as a parameter to the outlined function. In that sense, the
>> > outlined function's behavior becomes a conservative approximation of both
>> > which in principle loses precision.
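
Parameterized outlining of stores could look roughly like the following C
sketch. All names (`fill_before`, `outlined_fill`, `fill_after`) are
hypothetical and the shape of the outlined function is an assumption for
illustration, not the output of the patch:

```c
/* Before: two sequences that are identical except for the pointer they
   store through. */
void fill_before(int *p, int *q, int seed) {
    p[0] = seed;        /* sequence 1 stores through p */
    p[1] = seed + 1;
    q[0] = seed;        /* sequence 2 stores through q */
    q[1] = seed + 1;
}

/* The differing pointer becomes a parameter of the outlined function.
   Inside this body nothing is known about which object dst refers to. */
static void outlined_fill(int *dst, int seed) {
    dst[0] = seed;
    dst[1] = seed + 1;
}

/* After: each sequence is replaced by a call. */
void fill_after(int *p, int *q, int seed) {
    outlined_fill(p, seed);
    outlined_fill(q, seed);
}
```

The loss of precision mentioned above shows up in `outlined_fill`: facts
that held for `p` or `q` individually at the original sites are no longer
available once both are funnelled through `dst`.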
>>
>> Can we outline only once we've already done all of these optimizations
>> that outlining would block?
>>
>>
>>
>>   The outliner is able to run at any point in the interprocedural
>> pipeline. There are currently two locations: early outlining (pre-inliner)
>> and late outlining (practically the last pass to run). It is configured to
>> run either Early+Late, or just Late.
>>
>>
>>
>>
>> > I like your EarlyCSE example and it is interesting that combined with
>> > functionattrs it can make a "cheap" pass get a transformation that an
>> > "expensive" pass would otherwise be needed for. Are there any cases where
>> > we only have the "cheap" pass and thus the outlining would be essential
>> > for our optimization pipeline to get the optimization right?
>> >
>> > The case that comes to mind for me is cases where we have some cutoff of
>> > search depth. Reducing a sequence to a single call (+ functionattr
>> > inference) can essentially summarize the sequence and effectively increase
>> > search depth, which might give more results. That seems like a bit of a
>> > weak example though.
>>
>> I don't know if River's patch outlines entire control flow regions at
>> a time, but if it does then we could use cheap basic block scanning
>> analyses for things that would normally require CFG-level analysis.
>>
>>
>>
>>   The patch currently supports outlining only from within a single
>> block. I had a working prototype for Region-based outlining, but I kept it
>> out of this patch for simplicity. So it's entirely possible to add that
>> kind of functionality, because I've already tried it.
>>
>> Thanks,
>>
>>   River Riddle
>>
>>
>>
>>
>> -- Sanjoy
>>
>>
>> >
>> > -- Sean Silva
>> >
>> > On Wed, Jul 26, 2017 at 12:07 PM, Sanjoy Das via llvm-dev
>> > <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Wed, Jul 26, 2017 at 10:10 AM, Quentin Colombet via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >> > No, I mean in terms of enabling other optimizations in the pipeline like
>> >> > vectorizer. Outliner does not expose any of that.
>> >>
>> >> I have not made a lot of effort to understand the full discussion here (so
>> >> what I say below may be off-base), but I think there are some cases where
>> >> outlining (especially working with function-attrs) can make optimization
>> >> easier.
>> >>
>> >> It can help transforms that duplicate code (like loop unrolling and
>> >> inlining) be more profitable -- I'm thinking of cases where
>> >> unrolling/inlining would have to duplicate a lot of code, but after
>> >> outlining would require duplicating only a few call instructions.
>> >>
>> >>
>> >> It can help EarlyCSE do things that require GVN today:
>> >>
>> >> void foo(int a, int b) {
>> >>   ... complex computation that computes func(a)
>> >>   ... complex computation that computes func(a)
>> >> }
>> >>
>> >> outlining=>
>> >>
>> >> int func(int t) { ... }
>> >>
>> >> void foo(int a, int b) {
>> >>   int x = func(a);
>> >>   int y = func(a);
>> >> }
>> >>
>> >> functionattrs=>
>> >>
>> >> int func(int t) readnone { ... }
>> >>
>> >> void foo(int a, int b) {
>> >>   int x = func(a);
>> >>   int y = func(a);
>> >> }
>> >>
>> >> earlycse=>
>> >>
>> >> int func(int t) readnone { ... }
>> >>
>> >> void foo(int a, int b) {
>> >>   int x = func(a);
>> >>   int y = x;
>> >> }
>> >>
>> >> GVN will catch this, but EarlyCSE is (at least supposed to be!) cheaper.
>> >>
>> >>
>> >> Once we have an analysis that can prove that certain functions can't trap,
>> >> outlining can allow LICM etc. to speculate entire outlined regions out of
>> >> loops.
>> >>
>> >>
>> >> Generally, I think outlining exposes information that certain regions of
>> >> the program are doing identical things.  We should expect to get some
>> >> mileage out of this information.
>> >>
>> >> -- Sanjoy
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >
>> >
>>
>