[llvm-dev] RFC: Shrink wrapping vs SplitCSR

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Thu May 4 12:19:25 PDT 2017


Just FYI:
You can use Herald (https://reviews.llvm.org/herald/) to configure rules
that auto-add you as a subscriber/reviewer to various changes.
Pretty much anything you can query, you can write a rule for :)


On Thu, May 4, 2017 at 11:45 AM, Kit Barton via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Francis Visoiu Mistrih <fvisoiumistrih at apple.com> writes:
>
>
> I'm resending this as I forgot to CC llvm-dev on my reply last night....
>
>
> Hi Francis,
>
> Thanks for the detailed reply!
> I took a quick look at the patch, and I'll try to take a closer look at
> it tomorrow.
>
> Could you please subscribe me to future patches? Either I or someone
> from my team can help with testing and/or integration on PPC if you'd
> like. This is something we've been struggling with for a while, and
> we're eager to make some progress on it :)
>
> Kit
>
> > Hi Kit,
> >
> >> On May 2, 2017, at 7:54 PM, Kit Barton via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >>
> >>
> >> Hi all,
> >>
> >> We've seen several examples recently of performance opportunities on
> >> POWER if we can improve the location of save/restore code for
> >> callee-saved registers. Both Nemanja and I have discussed this with
> >> several people, and it seems that there are two possibilities for
> >> improving this:
> >>
> >>  1. Extend shrink wrapping to make the analysis of callee-saved
> >>     registers more precise.
> >>  2. Focus on enabling and (possibly) improving SplitCSR.
> >>
> >> I would like opinions from people on the preferred way to proceed.
> >>
> >> I am leaning toward improving shrink wrapping, at least as a short-term
> >> solution. However, I fully admit that this is because I am familiar with
> >> the shrink wrapping code and completely naive about SplitCSR and what
> >> work would be necessary to get this working well.
> >>
> >> My proposal would be to implement the flow sensitive analysis described
> >> by Fred Chow (PLDI '88) and make the necessary extensions in shrink
> >> wrapping to handle multiple save/restore points. At that point we can do
> >> an evaluation to understand the improvements it provides and the impact
> >> on compile time. Once we have these results, we can look at the best way
> >> to enable it (e.g., behind an option, target opt-in, only at higher
> >> optimization levels, etc.).
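> >>
> >> (For reference, if I remember Chow’s paper correctly, the placement
> >> conditions for each callee-saved register boil down to roughly the
> >> following, where APP(B) means the register appears in block B:
> >>
> >>   ANTIC_in(B)  = APP(B) ∪ ANTIC_out(B)
> >>   ANTIC_out(B) = ⋂ { ANTIC_in(S)  : S ∈ succ(B) }
> >>   AVAIL_in(B)  = ⋂ { AVAIL_out(P) : P ∈ pred(B) }
> >>   AVAIL_out(B) = APP(B) ∪ AVAIL_in(B)
> >>
> >>   SAVE(B)    = ANTIC_in(B) − AVAIL_in(B)
> >>   RESTORE(B) = AVAIL_out(B) − ANTIC_out(B)
> >>
> >> i.e., save on entry to blocks where the register is anticipated but not
> >> yet available, and restore on exit from blocks where it is available but
> >> no longer anticipated.)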
> >
> > Back in 2009, there was an implementation of Fred Chow’s algorithm; it was
> > removed in r193749 because it was unused and untested.
> >
> > I have been working on a new implementation of Fred Chow’s algorithm for
> > a while now.
> >
> > It seems that this algorithm has been avoided in the past because its
> > compile-time cost was not considered worth the performance improvement.
> >
> > The main source of that cost is loops in the CFG. In my implementation, we
> > decided that we never want to save / restore inside a loop, so we treat
> > each loop as a single block. We use scc_iterator instead of
> > MachineLoopInfo so that irreducible loops, which the current
> > shrink-wrapping implementation does not handle, are covered as well. That
> > way, we can compute the anticipation / availability attributes in linear
> > time.
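> >
> > As a rough illustration of the traversal (a sketch only; the function and
> > variable names here are mine, not the ones from the patch):
> >
> >   #include "llvm/ADT/SCCIterator.h"
> >   #include "llvm/CodeGen/MachineBasicBlock.h"
> >   #include "llvm/CodeGen/MachineFunction.h"
> >   #include <vector>
> >
> >   // Walk the CFG one SCC at a time, in reverse topological order. Any
> >   // SCC containing a cycle is a loop -- reducible or not -- and is
> >   // treated as a single node, so no save / restore point is ever placed
> >   // inside it.
> >   void walkSCCs(llvm::MachineFunction &MF) {
> >     for (auto I = llvm::scc_begin(&MF); !I.isAtEnd(); ++I) {
> >       const std::vector<llvm::MachineBasicBlock *> &SCC = *I;
> >       if (I.hasCycle()) {
> >         // Collapsed loop region: compute the APP / ANTIC / AVAIL
> >         // attributes for the whole region at once.
> >       } else {
> >         // Single straight-line block: processed normally.
> >       }
> >     }
> >   }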
> >
> > In terms of correctness, there is one test on X86 that still fails, and
> > two others on AArch64, because of the way compact unwinding encodes
> > register saves.
> >
> > In terms of compile-time, there are no regressions on CTMark.
> >
> > In terms of code-size, we get a 0.8% increase on X86_64 and AArch64,
> > mostly because we can no longer use push / pop (on x86-64, for example,
> > push %rbx is a 1-byte instruction, while mov %rbx, 8(%rsp) is 5 bytes).
> >
> > For now, we have only worked on shrink-wrapping the CSRs; the stack setup
> > stays in the entry / return blocks, which can give worse results than the
> > current shrink-wrapping pass in some cases. I am currently working on
> > fixing this.
> >
> > In terms of execution-time, not many improvements showed up, due to the
> > stack setup and to the transformation of push/pop into mov $reg, (mem) /
> > mov (mem), $reg, both of which can be partially solved.
> >
> > In terms of what the algorithm can do, and how it can outperform the
> > current one, I gathered stats on where we save / restore, weighted by
> > block frequency (with PGO), and they show a theoretical 8% improvement.
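> >
> > Purely to illustrate the metric (a hypothetical helper, not the code used
> > to gather those numbers):
> >
> >   #include "llvm/ADT/DenseMap.h"
> >   #include "llvm/CodeGen/MachineBasicBlock.h"
> >   #include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
> >   #include <cstdint>
> >
> >   // Weight every save / restore placement by the (PGO-derived) frequency
> >   // of the block it lands in; a lower total means a better placement.
> >   uint64_t placementCost(
> >       const llvm::MachineBlockFrequencyInfo &MBFI,
> >       const llvm::DenseMap<const llvm::MachineBasicBlock *, unsigned>
> >           &SpillsPerBlock) {
> >     uint64_t Cost = 0;
> >     for (const auto &Entry : SpillsPerBlock)
> >       Cost += MBFI.getBlockFreq(Entry.first).getFrequency() * Entry.second;
> >     return Cost;
> >   }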
> >
> > I put an early review up a while ago (https://reviews.llvm.org/D30808) and
> > will update it soon. As you can see, all the PrologEpilogInserter and
> > (X86|AArch64)TargetFrameLowering code looks horribly hacky, because so many
> > things assume that the stack setup and the callee-saved saves will stick
> > together. Fixing all those assumptions is going to be the trickiest part of
> > shrink-wrapping, not the algorithm itself.
> >
> > There are some cases where the current shrink-wrapping is better, such as
> > the one in Fig. 3 of Chow’s paper. That can probably be solved with
> > something similar to what is described in “Post Register Allocation Spill
> > Code Optimization” by Christopher Lupo and Kent D. Wilken, which uses the
> > program structure tree (PST) to optimize save / restore placement along
> > with a cost model.
> >
> > Cheers,
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>