[llvm-dev] [RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization
Kyungwoo Lee via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 24 14:04:01 PDT 2020
Thanks for your interest and comment.
Size optimization reduces page faults and start-up time for a large
application, and enabling this feature showed the same benefit.
Even though I didn't see a large regression or complaint in CPU-bound cases,
which are not typical of mobile workloads, I wanted to be cautious about
enabling it by default.
However, as with the default outlining case, I don't mind enabling this under
-Oz (for minimizing code size) with an opt-out option.
On Tue, Mar 24, 2020 at 12:01 PM Vedant Kumar <vedant_kumar at apple.com> wrote:
> This looks really interesting. In the slides, it’s mentioned that the
> combination of tuning the MachineOutliner for ThinLTO and of optimizing
> function prolog/epilogs improved measured run-time performance.
> What kind of performance impact do you see from simply homogenizing
> prolog/epilogs? (If, say across LNT/aarch64/-Oz the performance impact is
> not large, it may make sense to have homogenization enabled by default.)
> On Mar 23, 2020, at 11:32 PM, Kyungwoo Lee via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> I'd like to upstream work we have developed over time that the community
> would benefit from.
> This is part of an effort toward minimizing code size, presented in here
> In particular, this RFC is about optimizing prolog and epilog for size.
> *Homogeneous Prolog and Epilog for Size Optimization, D76570*
> Prologs and epilogs that handle callee-saved registers tend to be irregular,
> with different immediate offsets, so they are rarely outlined (by the
> machine outliner) when optimizing for size. Since D18619, combining stack
> operations has increased this irregularity further.
> This patch tries to emit homogeneous stores and loads with the same offset
> for the prolog and epilog, respectively. We have observed that this
> homogeneous prolog and epilog significantly increased the chance of
> outlining, resulting in a code size reduction. However, a great many
> outlining opportunities still remained because the current outliner has to
> conservatively handle instructions that use the return-address register, x30.
> Therefore, this patch also forms a custom-outlined helper function on demand
> for the prolog and epilog when lowering the frame code.
> - Injects HOM_Prolog and HOM_Epilog pseudo instructions in the Prolog and
> Epilog Injection Pass
> - Lowers and optimizes them in the AArch64LowerHomogeneousPrologEpilog Pass
> - Outlined helpers are created on demand. Identical helpers are merged by
> the linker.
> - An opt-in flag is introduced to enable this feature. Another threshold
> flag is also introduced to control the aggressiveness of outlining to suit
> the application's needs.
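To make the outlining argument above concrete, here is a minimal sketch of
a toy model. It is not output from the patch: the instruction strings, the
fixed 16-byte pre-decrement, and the function names are all assumptions
chosen for illustration. It only shows that frame code whose immediates
depend on the size of the locals yields many distinct sequences, while
frame code with one fixed offset per register pair collapses to one
sequence per register set.

```python
# Toy model only: the instruction strings are illustrative assumptions,
# not the code that the patch or LLVM actually emits.

def irregular_prolog(pairs, locals_size):
    """Saves with the stack adjustment folded in, so every immediate
    offset depends on the size of this function's locals."""
    frame = locals_size + 16 * len(pairs)
    insts = [f"sub sp, sp, #{frame}"]
    for i, (a, b) in enumerate(pairs):
        insts.append(f"stp {a}, {b}, [sp, #{frame - 16 * (i + 1)}]")
    return tuple(insts)

def homogeneous_prolog(pairs, locals_size):
    """Saves that always use the same offset. The locals would be
    allocated by a separate instruction, so locals_size does not
    affect the save sequence returned here."""
    return tuple(f"stp {a}, {b}, [sp, #-16]!" for a, b in pairs)

TWO_PAIRS = (("x29", "x30"), ("x20", "x19"))
THREE_PAIRS = (("x29", "x30"), ("x22", "x21"), ("x20", "x19"))

# Hypothetical functions: (callee-saved pairs, bytes of locals).
FUNCTIONS = [
    (TWO_PAIRS, 0),
    (TWO_PAIRS, 32),
    (TWO_PAIRS, 64),
    (THREE_PAIRS, 16),
    (THREE_PAIRS, 48),
]

# Identical sequences are what an outliner (or the linker) can share.
distinct_irregular = len({irregular_prolog(p, s) for p, s in FUNCTIONS})
distinct_homogeneous = len({homogeneous_prolog(p, s) for p, s in FUNCTIONS})

print(distinct_irregular, distinct_homogeneous)
```

In this model the five functions produce five different irregular
sequences but only two homogeneous ones, one per set of saved registers.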
> This reduced code size by 4% on average for LLVM-TestSuite/CTMark
> targeting arm64/-Oz. In a large mobile application, the size benefit was
> even larger, and page faults were reduced as well.
> *Design Alternatives:*
> 1. Expand helpers eagerly by permuting all cases in an earlier module
> pass. Even though this is rather simple and less invasive, it creates many
> redundant helpers that need to be elided by the linker.
> 2. Turn Prolog-Epilog-Injection into a module pass. This requires plumbing
> the module through the pass and the architecture-specific frame lowering.
> I am not sure how other architectures would interact with this module pass.
> 3. Provide all helpers in a runtime library/compiler-rt. There are many
> combinations of helpers, and this approach is certainly not flexible.
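As a rough illustration of why alternatives 1 and 3 above produce so many
helpers compared with creating them on demand, here is a small counting
sketch. The register pairing, the key format, and the assumption that any
subset of pairs may be saved are simplifications made up for this example.
They are not taken from the patch.

```python
from itertools import combinations

# Assumption for illustration: five callee-saved GPR pairs, any subset of
# which a function might save. The frame-record and FP register pairs are
# ignored to keep the numbers small.
GPR_PAIRS = [("x19", "x20"), ("x21", "x22"), ("x23", "x24"),
             ("x25", "x26"), ("x27", "x28")]

def eager_helper_keys(pairs):
    """One prolog and one epilog helper for every non-empty subset of
    pairs, whether or not any function in the program needs it."""
    keys = set()
    for n in range(1, len(pairs) + 1):
        for combo in combinations(pairs, n):
            keys.add(("prolog", combo))
            keys.add(("epilog", combo))
    return keys

def on_demand_helper_keys(functions):
    """Helpers only for the register sets that functions actually use.
    The set models the linker merging identical helpers."""
    keys = set()
    for saved in functions:
        keys.add(("prolog", tuple(saved)))
        keys.add(("epilog", tuple(saved)))
    return keys

# Hypothetical program: four functions, two distinct register sets.
FUNCTIONS = [
    GPR_PAIRS[:1],
    GPR_PAIRS[:1],
    GPR_PAIRS[:2],
    GPR_PAIRS[:2],
]

eager = eager_helper_keys(GPR_PAIRS)
on_demand = on_demand_helper_keys(FUNCTIONS)

print(len(eager), len(on_demand))
```

Under these assumptions the eager scheme creates 62 helpers while the
on-demand scheme creates 4, and the gap widens once more register classes
are included.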