[llvm-dev] [RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization

Kyungwoo Lee via llvm-dev llvm-dev at lists.llvm.org
Mon Mar 23 23:32:01 PDT 2020


Hello,

I'd like to upstream work we have developed over time, which the community
should benefit from.
This is part of an effort toward minimizing code size, presented here
<https://llvm.org/devmtg/2020-02-23/slides/Kyungwoo-GlobalMachineOutlinerForThinLTO.pdf>.
In particular, this RFC is about optimizing the prolog and epilog for size.

*Homogeneous Prolog and Epilog for Size Optimization, D76570
<https://reviews.llvm.org/D76570>:*

Prologs and epilogs that handle callee-saved registers tend to be irregular,
with different immediate offsets, so they are rarely outlined (by the machine
outliner) when optimizing for size. Since D18619, combining stack operations
has increased this irregularity further.
This patch tries to emit homogeneous stores and loads with the same offset
for the prolog and epilog, respectively. We have observed that homogeneous
prologs and epilogs significantly increase the chance of outlining,
resulting in a code size reduction. However, a great many outlining
opportunities still remained, because the current outliner has to
conservatively handle instructions that involve the return address
register, x30.
Therefore, this patch also forms a custom-outlined helper function on demand
for the prolog and epilog when lowering the frame code.
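
To illustrate why uniform offsets help the outliner, here is a toy model in
Python. It is only a sketch under stated assumptions: the instruction strings,
the register pairs, and the frame layouts are made up for illustration and are
not output of the patch. It counts how many distinct store instructions appear
across a handful of functions when the offsets depend on the frame size versus
when every store uses the same pre-indexed offset.

```python
# Toy model only: the instruction strings are illustrative, not compiler output.
PAIRS = [("x29", "x30"), ("x20", "x19"), ("x22", "x21"), ("x24", "x23")]


def irregular_prolog(num_pairs, local_bytes):
    """One combined SP adjustment, then stores whose offsets depend on the frame."""
    frame = 16 * num_pairs + local_bytes
    insts = [f"sub sp, sp, #{frame}"]
    for i, (a, b) in enumerate(PAIRS[:num_pairs]):
        insts.append(f"stp {a}, {b}, [sp, #{frame - 16 * (i + 1)}]")
    return insts


def homogeneous_prolog(num_pairs, local_bytes):
    """Every store uses the same pre-indexed offset; locals are allocated separately."""
    insts = [f"stp {a}, {b}, [sp, #-16]!" for a, b in PAIRS[:num_pairs]]
    if local_bytes:
        insts.append(f"sub sp, sp, #{local_bytes}")
    return insts


def distinct_stores(prologs):
    """Collect the set of distinct store instructions across all prologs."""
    return {inst for prolog in prologs for inst in prolog if inst.startswith("stp")}


# Hypothetical functions: (number of saved register pairs, bytes of locals).
FUNCTIONS = [(2, 0), (2, 32), (3, 16), (3, 48), (4, 0)]
irregular = distinct_stores(irregular_prolog(n, l) for n, l in FUNCTIONS)
homogeneous = distinct_stores(homogeneous_prolog(n, l) for n, l in FUNCTIONS)
print(len(irregular), len(homogeneous))
```

In this toy setup the five functions use 9 distinct stores in the irregular
form and 4 in the homogeneous form, so identical instruction sequences recur
far more often and become candidates for outlining.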

- Inject HOM_Prolog and HOM_Epilog pseudo instructions in the Prolog and
Epilog Injection pass.
- Lower and optimize them in the AArch64LowerHomogeneousPrologEpilog pass.
- Outlined helpers are created on demand. Identical helpers are merged by
the linker.
- An opt-in flag is introduced to enable this feature. A threshold flag is
also introduced to control the aggressiveness of outlining according to the
application's needs.
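
The on-demand creation and linker merging described above can be sketched
roughly as follows, again in Python as a toy model. Everything here is an
assumption made for illustration: the helper naming scheme, the `Module` class,
the `lower_frame` function, and the meaning of `threshold` (a minimum number of
saved registers before a helper is used) are hypothetical and do not reflect
the actual flags or data structures in the patch.

```python
def helper_name(kind, regs):
    # Hypothetical naming scheme: identical register lists give identical names.
    return f"HELPER_{kind}_" + "".join(regs)


class Module:
    """Toy stand-in for a compilation unit that owns its outlined helpers."""

    def __init__(self):
        self.helpers = {}  # name -> helper body, created lazily

    def get_or_create_helper(self, kind, regs):
        name = helper_name(kind, regs)
        if name not in self.helpers:
            self.helpers[name] = (kind, tuple(regs))
        return name

    def lower_frame(self, kind, regs, threshold):
        # Assumed policy: outline only when enough registers are saved.
        if len(regs) < threshold:
            return ("inline", tuple(regs))
        return ("call", self.get_or_create_helper(kind, regs))


def link(modules):
    """Model the linker keeping one copy of each identically named helper."""
    merged = {}
    for module in modules:
        for name, body in module.helpers.items():
            merged.setdefault(name, body)
    return merged


a, b = Module(), Module()
regs = ["x19", "x20", "x29", "x30"]
a.lower_frame("PROLOG", regs, threshold=4)
a.lower_frame("PROLOG", regs, threshold=4)  # reuses the existing helper
b.lower_frame("PROLOG", regs, threshold=4)
b.lower_frame("PROLOG", ["x29", "x30"], threshold=4)  # below threshold: inline
print(len(a.helpers), len(b.helpers), len(link([a, b])))
```

Each module ends up with one helper, and the modeled link step keeps a single
copy, which is the effect the on-demand creation plus linker merging is meant
to achieve.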

This reduced code size by 4% on average for LLVM-TestSuite/CTMark
targeting arm64 at -Oz. In a large mobile application, the size benefit was
even larger, and page faults were reduced as well.

*Design Alternatives:*

1. Expand helpers eagerly by permuting all cases in an earlier module pass.
Even though this is rather simple and less invasive, it creates many
redundant helpers which need to be elided by the linker.
2. Turn Prolog-Epilog-Injection into a module pass. This requires plumbing the
module through the pass and the architecture-specific frame lowering. It is
unclear how other architectures would interact with this module pass.
3. Provide all helpers in the runtime/compiler-rt. There are many
combinations of helpers, and this approach is not flexible.

Regards,
Kyungwoo