[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Eric Christopher via llvm-dev llvm-dev at lists.llvm.org
Mon Jul 31 09:55:33 PDT 2017


Hi River,


>> Given that there are a number of disagreements and opinions floating
>> around I think it benefits us all to speak clearly about who is taking what
>> stances.
>>
>> One particular disagreement that I think very much needs to be revisited
>> in this thread was Jessica's proposal of a pipeline of:
>>
>>    1. IR outline
>>    2. Inline
>>    3. MIR outline
>>
>> In your response to that proposal you dismissed it out of hand with
>> "feelings" but not data. Given that the proposal came from Jessica (a
>> community member with significant relevant experience in outlining), and it
>> was also recognized as interesting by Eric Christopher (a long-time member
>> of the community with wide reaching expertise), I think dismissing it may
>> have been a little premature.
>>
>
> I dismissed the idea of an outliner at the machine level being able to
> catch bad inlining decisions. Given the loss of information between the two
> levels, I felt it was a little optimistic to rely on a very late pass being
> able to reverse those decisions, especially given that the current machine
> outliner requires exact equivalence. I don't disagree with the proposed
> example pipeline (outline, inline, outline), but the idea of being able to
> catch bad inlining decisions under those circumstances seemed optimistic to
> me. From there I went ahead and implemented a generic interface for
> outlining that can be shared between the IR and Machine levels, so that
> such a pipeline could be more feasible.
>

Honestly, given that the owner of the outlining code was suggesting this
path, I don't think you should make this decision unilaterally without a
concrete reason.


>
>
>>
>> I also want to visit a few procedural notes.
>>
>> Mehdi commented on the thread that it wouldn't be fair to ask for a
>> comparative study because the MIR outliner didn't have one. While I don't
>> think anyone is asking for a comparative study, I want to point out that I
>> think it is completely fair. If a new contributor approached the community
>> with a new SROA pass and wanted to land it in-tree it would be appropriate
>> to ask for a comparative analysis against the existing pass. How is this
>> different?
>>
>
> The real question is what exactly you want to define as a
> "comparative analysis". When posting the patch I included additional
> performance data (found here: goo.gl/5k6wsP) that includes benchmarking
> and comparisons between the outliner that I am proposing and the machine
> outliner on a wide variety of benchmarks. The proposed outliner performs
> quite favorably in comparison. As for feature comparison, the proposed
> outliner has many features currently missing from the machine outliner:
>  - parameterization
>  - outputs
>  - relaxed equivalence (the machine outliner requires exact equivalence)
>  - usage of profile data
>  - support for opt remarks
>
>  The machine outliner currently only supports X86 and AArch64; the IR
> outliner can/should support all targets immediately, without the requirement
> of ABI restrictions (-mno-red-zone is required for the machine outliner).
>  At the IR level we have much more opportunity to find congruent
> instructions than at the machine level, given the possible variation at that
> level: register allocation, instruction selection, instruction scheduling,
> etc.
>

These are all theoretical advantages, and quite compelling; however, numbers
are important and I think we should see them.


> In the LLVM community we have a long history of approaching large
>> contributions (especially ones from new contributors) with scrutiny and
>> discussion. It would be a disservice to the project to forget that.
>>
>> River, as a last note. I see that you've started uploading patches to
>> Phabricator, and I know you're relatively new to the community. When
>> uploading patches it helps to include appropriate reviewers so that the
>> right people see the patches as they come in. To that end can you please
>> include Jessica as a reviewer? Given her relevant domain experience I think
>> her feedback on the patches will be very valuable.
>>
>
> I accidentally posted without any reviewers at first; I've been going back
> through and adding the people I missed.
>
>
Last I checked you had still not added Jessica. For design and future
decisions she should be added and considered one of the prime reviewers of
this effort.

-eric


>
>> Thank you,
>> -Chris
>>
> I appreciate the feedback and welcome all critical discussion about the
> right way to move forward.
> Thanks,
>  River Riddle
>
>
>>
>> On Jul 26, 2017, at 1:52 PM, River Riddle via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Hey Sanjoy,
>>
>> On Wed, Jul 26, 2017 at 1:41 PM, Sanjoy Das via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi,
>>>
>>> On Wed, Jul 26, 2017 at 12:54 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>> > The way I interpret Quentin's statement is something like:
>>> >
>>> > - Inlining turns an interprocedural problem into an intraprocedural problem
>>> > - Outlining turns an intraprocedural problem into an interprocedural problem
>>> >
>>> > Insofar as our intraprocedural analyses and transformations are strictly
>>> > more powerful than our interprocedural ones, there is a precise sense in
>>> > which inlining exposes optimization opportunities while outlining does not.
>>>
>>> While I think our intra-proc optimizations are *generally* more
>>> powerful, I don't think they are *always* more powerful.  For
>>> instance, LICM (today) won't hoist full regions but it can hoist
>>> single function calls.  If we can extract out a region into a
>>> readnone+nounwind function call then LICM will hoist it to the
>>> preheader if the safety checks pass.
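>>>
>>> As a rough sketch in C (the attribute spellings below are approximate
>>> Clang-level equivalents of readnone/nounwind, and the names are made
>>> up for illustration):
>>>
>>>   /* Hypothetical outlined region: reads no memory, cannot unwind. */
>>>   __attribute__((const, nothrow)) int region(int a, int b) {
>>>     return (a * a) + (b * b) + 42;
>>>   }
>>>
>>>   void loop_user(int a, int b, int *out, int n) {
>>>     for (int i = 0; i < n; ++i)
>>>       /* The call is loop-invariant, so LICM can hoist it to the
>>>          preheader; the original multi-instruction region would not
>>>          have been hoisted as a single unit. */
>>>       out[i] = region(a, b);
>>>   }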
>>>
>>> > Actually, for his internship last summer River wrote a profile-guided
>>> > outliner / partial inliner (it didn't try to do deduplication; so it was
>>> > more like PartialInliner.cpp). IIRC he found that LLVM's interprocedural
>>> > analyses were so bad that there were pretty adverse effects from many of
>>> > the outlining decisions. E.g. if you outline from the left side of a
>>> > diamond, that side basically becomes a black box to most LLVM analyses
>>> > and forces downstream dataflow meet points to give an overly conservative
>>> > result, even though our standard intraprocedural analyses would have
>>> > happily dug through the left side of the diamond if the code had not been
>>> > outlined.
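>>> >
>>> > As a made-up C illustration of that effect (names invented here):
>>> >
>>> >   int meet_example(int c, int x) {
>>> >     int v;
>>> >     if (c)
>>> >       v = x & 15;   /* left side of the diamond: v is in [0, 15] */
>>> >     else
>>> >       v = 7;
>>> >     /* With the left side visible, an intraprocedural analysis can
>>> >        prove this always returns 1.  If that side is outlined into
>>> >        an opaque call that produces v, the meet point has to assume
>>> >        v could be anything and the fact is lost. */
>>> >     return v < 16;
>>> >   }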
>>> >
>>> > Also, River's patch (the one in this thread) does parameterized outlining.
>>> > For example, two sequences containing stores can be outlined even if the
>>> > corresponding stores have different pointers. The pointer to be stored
>>> > through is passed as a parameter to the outlined function. In that sense,
>>> > the outlined function's behavior becomes a conservative approximation of
>>> > both, which in principle loses precision.
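>>> >
>>> > A rough C sketch of what that parameterization might look like (the
>>> > names are illustrative, not taken from the patch):
>>> >
>>> >   int compute(int x) { return x * 3; }
>>> >
>>> >   /* Before: two sequences identical except for the pointer stored to. */
>>> >   void before(int *a, int *b) {
>>> >     *a = compute(1);
>>> >     *a += 2;
>>> >     *b = compute(1);
>>> >     *b += 2;
>>> >   }
>>> >
>>> >   /* After: one outlined body; the differing pointer is a parameter. */
>>> >   static void outlined(int *p) {
>>> >     *p = compute(1);
>>> >     *p += 2;
>>> >   }
>>> >
>>> >   void after(int *a, int *b) {
>>> >     outlined(a);
>>> >     outlined(b);
>>> >   }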
>>>
>>> Can we outline only once we've already done all of these optimizations
>>> that outlining would block?
>>>
>>
>>   The outliner is able to run at any point in the interprocedural
>> pipeline. There are currently two locations: early outlining (pre-inliner)
>> and late outlining (practically the last pass to run). It is configured to
>> run either Early+Late, or just Late.
>>
>>
>>> > I like your EarlyCSE example, and it is interesting that, combined with
>>> > functionattrs, it can let a "cheap" pass get a transformation for which an
>>> > "expensive" pass would otherwise be needed. Are there any cases where we
>>> > only have the "cheap" pass, and thus the outlining would be essential for
>>> > our optimization pipeline to get the optimization right?
>>> >
>>> > The case that comes to mind for me is cases where we have some cutoff of
>>> > search depth. Reducing a sequence to a single call (+ functionattr
>>> > inference) can essentially summarize the sequence and effectively increase
>>> > search depth, which might give more results. That seems like a bit of a
>>> > weak example though.
>>>
>>> I don't know if River's patch outlines entire control flow regions at
>>> a time, but if it does then we could use cheap basic block scanning
>>> analyses for things that would normally require CFG-level analysis.
>>>
>>
>>   The current patch just supports outlining from within a single block.
>> Although I had a working prototype for region-based outlining, I kept it
>> out of this patch for simplicity. So it's entirely possible to add that
>> kind of functionality; I've already tried it.
>> Thanks,
>>   River Riddle
>>
>>
>>>
>>> -- Sanjoy
>>>
>>> >
>>> > -- Sean Silva
>>> >
>>> > On Wed, Jul 26, 2017 at 12:07 PM, Sanjoy Das via llvm-dev
>>> > <llvm-dev at lists.llvm.org> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> On Wed, Jul 26, 2017 at 10:10 AM, Quentin Colombet via llvm-dev
>>> >> <llvm-dev at lists.llvm.org> wrote:
>>> >> > No, I mean in terms of enabling other optimizations in the pipeline,
>>> >> > like the vectorizer. The outliner does not expose any of that.
>>> >>
>>> >> I have not made a lot of effort to understand the full discussion here
>>> >> (so what I say below may be off-base), but I think there are some cases
>>> >> where outlining (especially working with function-attrs) can make
>>> >> optimization easier.
>>> >>
>>> >> It can help transforms that duplicate code (like loop unrolling and
>>> >> inlining) be more profitable -- I'm thinking of cases where
>>> >> unrolling/inlining would have to duplicate a lot of code, but after
>>> >> outlining would require duplicating only a few call instructions.
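>>> >>
>>> >> Very roughly, in C (names invented for illustration):
>>> >>
>>> >>   void big_step(int *p, int i);   /* outlined: many instructions */
>>> >>
>>> >>   void walk(int *p, int n) {
>>> >>     for (int i = 0; i < n; ++i)
>>> >>       /* Unrolling by 4 now duplicates four call instructions
>>> >>          instead of four copies of the whole region, which changes
>>> >>          the cost/benefit of unrolling or inlining walk(). */
>>> >>       big_step(p, i);
>>> >>   }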
>>> >>
>>> >>
>>> >> It can help EarlyCSE do things that require GVN today:
>>> >>
>>> >> void foo() {
>>> >>   ... complex computation that computes func()
>>> >>   ... complex computation that computes func()
>>> >> }
>>> >>
>>> >> outlining=>
>>> >>
>>> >> int func() { ... }
>>> >>
>>> >> void foo() {
>>> >>   int x = func();
>>> >>   int y = func();
>>> >> }
>>> >>
>>> >> functionattrs=>
>>> >>
>>> >> int func() readonly { ... }
>>> >>
>>> >> void foo() {
>>> >>   int x = func();
>>> >>   int y = func();
>>> >> }
>>> >>
>>> >> earlycse=>
>>> >>
>>> >> int func() readonly { ... }
>>> >>
>>> >> void foo() {
>>> >>   int x = func();
>>> >>   int y = x;
>>> >> }
>>> >>
>>> >> GVN will catch this, but EarlyCSE is (at least supposed to be!) cheaper.
>>> >>
>>> >>
>>> >> Once we have an analysis that can prove that certain functions can't
>>> >> trap, outlining can allow LICM etc. to speculate entire outlined
>>> >> regions out of loops.
>>> >>
>>> >>
>>> >> Generally, I think outlining exposes information that certain regions
>>> >> of the program are doing identical things.  We should expect to get
>>> >> some mileage out of this information.
>>> >>
>>> >> -- Sanjoy