[llvm-dev] Code size BoF Minutes

Tue Oct 6 11:54:03 PDT 2020

Hi all,

Thanks to everyone who participated in the code size birds-of-a-feather today! Here are the minutes: https://docs.google.com/document/d/1kDsbFDtkWLceR-Y63ez04CBfvhbDN8RxWV3xXwaNUC0/edit?usp=sharing

The minutes are also copied below for convenience.

LLVM 2020
Code Size BoF Minutes

See also: Aditya Kumar’s talk on “Code Size Compiler Optimizations and Techniques”

Minutes

  *   What use cases for code size are important to the LLVM community?

     *   Deeply embedded devices: IoT, RISC-V

        *   Memory is a large part of the cost of the chip

     *   Gaming consoles

     *   WebAssembly, for faster page loads

     *   Mobile apps

     *   Cache-limited programs on desktops, supercomputers

     *   Real-time systems (both size & speed are important)

        *   Find the code that isn’t on the critical path and optimize it for size

        *   Or find thresholds for heuristics that are good enough for both

     *   Hyperscale systems with lots of processors but only a little memory for each one

        *   EMU architecture<https://www.emutechnology.com/>

        *   Epiphany/Adapteva<https://www.adapteva.com/introduction/> architecture

  *   Why doesn’t -Oz do much better than -O2?

     *   There are optimizations that make dramatic speed improvements, but not as many that make dramatic size improvements

     *   It doesn’t disable things like loop unrolling

     *   It doesn’t disable inlining

     *   It doesn’t enable loop rerolling

     *   -Oz affects heuristics (e.g. for inlining), but they can still make the wrong decision

     *   TargetTransformInfo has options for code size, but they often aren’t implemented

     *   If -Oz uses a poorly tuned pass sequence, we can autotune it with OpenTuner etc.

        *   Can get ~5% extra size reduction this way.

     *   LTO is effective for code size; it was recently fixed to support -Oz

     *   Do we need different levels like -Os1, -Os2, -Os3?

  *   We have profilers for code speed; can we have something similar for code size?

     *   There is something like this for WebAssembly: Twiggy🌱<https://github.com/rustwasm/twiggy>

     *   Remarks<https://llvm.org/docs/Remarks.html> help understand what optimizations are being missed

  *   What code size benchmarks can we use to measure patches?

     *   Embench.org<https://embench.org/>

     *   CSiBE<http://szeged.github.io/csibe/>

  *   Technique: outlining

     *   How is outlining affected by the size of the instruction set?

        *   Machine outliner might be affected, but IR-level outlining shouldn’t be affected much

     *   How do machine outliner and IR-level outliner compare?

  *   Technique: function merging

     *   Research paper: “Function Merging by Sequence Alignment”

     *   Look into string alignment techniques from biology

  *   Technique: deleting unused code

     *   What’s the best resource?

  *   Technique: compressors like gzip

     *   UPX, LZEXE<https://bellard.org/lzexe.html> save disk space

     *   To save RAM, you need a compression algorithm that supports random access, like something based on Huffman codes

        *   Has been implemented in hardware (IBM CodePack)

  *   Technique: using different optimizations for hot & cold code

     *   For best results, people currently have to move their cold code to a separate file so they can use different flags, which is painful

  *   Could we do code size optimizations in the linker?

     *   Needed to take advantage of RISC-V’s LUI instruction

        *   Requires copy propagation and dead code elimination

  *   Instead of inlining functions into each caller, can we keep one copy of the function and specialize it for all its call sites in the same translation unit?

     *   Attributor already shows improvements when you have recursion and non-trivial but dead code (the level stuff in Olden/bisort/bitonic)
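The autotuning idea from the -Oz discussion amounts to searching over pass sequences. A minimal sketch using plain random search, which is far simpler than what OpenTuner does. `measure_size` is a hypothetical callback that would compile the benchmark with the given sequence (for example through `opt -passes=...` and the backend) and return the resulting size. The pass names are placeholders.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

def random_search(passes: Sequence[str],
                  measure_size: Callable[[List[str]], int],
                  length: int, trials: int,
                  seed: int = 0) -> Tuple[Optional[List[str]], Optional[int]]:
    """Try `trials` random sequences of `length` passes each and return the
    sequence with the smallest measured size, together with that size.

    `measure_size` is a hypothetical, caller-supplied callback that compiles
    with the given pass sequence and returns the code size in bytes.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    best_seq, best_size = None, None
    for _ in range(trials):
        seq = [rng.choice(passes) for _ in range(length)]
        size = measure_size(seq)
        if best_size is None or size < best_size:
            best_seq, best_size = seq, size
    return best_seq, best_size
```

A practical tuner would also measure the stock -Oz pipeline as a baseline and only keep sequences that beat it.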
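Until a Twiggy-like tool exists for native code, a rough size profile can be built from a symbol table. A minimal sketch, assuming the text format printed by `nm --print-size` (address, size in hex, type, name), that ranks code symbols by size. Unlike Twiggy it does not attribute size to callers or dominators; it only answers "which functions are biggest".

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def size_profile(nm_output: str, top: int = 10) -> List[Tuple[str, int, float]]:
    """Rank code symbols by size from `nm --print-size` text output.

    Defined symbols print as: <address> <size> <type> <name>, with address
    and size in hexadecimal. Lines without a size (e.g. undefined symbols)
    and non-code symbols (types other than T/t) are skipped. Symbols with
    the same name, such as static functions from different files, are summed.
    Returns (name, size_in_bytes, fraction_of_total_code) tuples.
    """
    sizes: Dict[str, int] = defaultdict(int)
    for line in nm_output.splitlines():
        fields = line.split(maxsplit=3)  # keeps names containing spaces intact
        if len(fields) < 4:
            continue
        _addr, size_hex, sym_type, name = fields
        if sym_type not in ("T", "t"):  # text (code) section, global or local
            continue
        sizes[name] += int(size_hex, 16)
    total = sum(sizes.values())
    ranked = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return [(name, size, size / total if total else 0.0) for name, size in ranked]
```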
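The point about saving RAM rather than disk hinges on random access: code must be decodable starting from the block that is actually needed. A minimal sketch of that layout, which swaps in DEFLATE through Python's `zlib` (LZ77 plus Huffman coding) instead of a pure Huffman scheme like CodePack. Each fixed-size block is compressed independently and an index records where each block starts. The 64-byte block size is an arbitrary assumption; smaller blocks make access cheaper but compress worse, and the index itself costs space.

```python
import zlib
from typing import List, Tuple

BLOCK_SIZE = 64  # hypothetical block size in bytes; real schemes tune this

def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Compress `data` in independent fixed-size blocks.

    Returns the concatenated compressed bytes and an index of
    (offset, length) pairs, one per block, so that any block can be
    decompressed without touching the others.
    """
    out = bytearray()
    index: List[Tuple[int, int]] = []
    for start in range(0, len(data), block_size):
        chunk = zlib.compress(data[start:start + block_size])
        index.append((len(out), len(chunk)))
        out += chunk
    return bytes(out), index

def read_byte(blob: bytes, index: List[Tuple[int, int]], pos: int,
              block_size: int = BLOCK_SIZE) -> int:
    """Return the byte at uncompressed offset `pos`, decompressing only
    the block that contains it."""
    offset, length = index[pos // block_size]
    block = zlib.decompress(blob[offset:offset + length])
    return block[pos % block_size]
```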
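The minutes note that people move cold code into separate files to give it different flags. Clang's per-function attributes such as `__attribute__((minsize))` and `__attribute__((cold))` avoid that for hand-picked functions, but choosing the functions automatically needs profile data. A minimal sketch that classifies functions from hypothetical per-function sample counts: the smallest set of functions covering 99% of samples is hot (optimize for speed), and everything else is cold (a candidate for size optimization). The 99% cutoff is an assumption to tune.

```python
from typing import Dict, Set, Tuple

def split_hot_cold(counts: Dict[str, int], coverage: float = 0.99) -> Tuple[Set[str], Set[str]]:
    """Partition functions into (hot, cold) sets from profile counts.

    Functions are taken in decreasing count order until they account for
    `coverage` of all samples; those are hot. The rest, including functions
    that were never executed, are cold.
    """
    total = sum(counts.values())
    if total == 0:
        return set(), set(counts)  # no profile data: treat everything as cold
    hot: Set[str] = set()
    covered = 0
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        hot.add(name)
        covered += n
    return hot, set(counts) - hot
```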
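One plausible reading of the RISC-V point: a 32-bit address is materialized with `LUI` (upper 20 bits) plus `ADDI` (sign-extended low 12 bits). Two different globals may share the same upper 20 bits, but that is only known once the linker has fixed the layout. To exploit it, the linker would reuse one `LUI` result (copy propagation) and delete the redundant ones (dead code elimination). The sketch below only counts how many distinct upper halves a set of addresses needs. It ignores register allocation and control flow, and the addresses are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def hi_lo(addr: int) -> Tuple[int, int]:
    """Split a 32-bit address into LUI and ADDI immediates.

    ADDI sign-extends its 12-bit immediate, so when bit 11 of the address
    is set the low part is negative and the upper part is bumped by one.
    """
    lo12 = ((addr & 0xFFF) ^ 0x800) - 0x800   # sign-extend bits 11..0
    hi20 = ((addr - lo12) >> 12) & 0xFFFFF     # immediate for LUI
    return hi20, lo12

def lui_counts(addresses: Iterable[int]) -> Tuple[int, int]:
    """Return (LUIs if every access materializes its own address,
               LUIs if accesses with the same upper 20 bits share one).

    In the shared case an address whose upper part is 0 needs no LUI,
    because ADDI from the zero register can produce it directly.
    """
    addresses = list(addresses)
    groups: Dict[int, List[int]] = defaultdict(list)
    for a in addresses:
        hi, _lo = hi_lo(a)
        groups[hi].append(a)
    shared = sum(1 for hi in groups if hi != 0)
    return len(addresses), shared
```

For `[0x20000010, 0x20000100, 0x20000FF0, 0x40]`, `lui_counts` returns `(4, 2)`. Note that `0x20000FF0` does not share the upper half of the first two addresses: bit 11 is set, so its `LUI` immediate is `0x20001`.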
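For the question of keeping one specialized copy instead of inlining into every caller, the first step is finding what holds at every call site. A minimal sketch that models each call site as a tuple of argument values (None meaning "not a compile-time constant") and finds argument positions that receive the same constant everywhere. A single specialized callee could then fold those constants in, which is roughly what interprocedural constant propagation does for internal functions. The tuple representation is an assumption for illustration; LLVM reasons over IR values, and the Attributor can derive richer facts than constants.

```python
from typing import Any, Dict, List, Optional, Sequence

def common_constant_args(call_sites: List[Sequence[Optional[Any]]]) -> Dict[int, Any]:
    """Return {argument_index: value} for arguments that receive the same
    constant at every call site.

    Each call site is a sequence of argument values, where None marks an
    argument that is not a compile-time constant. All call sites are
    assumed to pass the same number of arguments.
    """
    if not call_sites:
        return {}
    common: Dict[int, Any] = {}
    for i, first in enumerate(call_sites[0]):
        if first is None:
            continue  # not a constant at the first call site
        if all(site[i] == first for site in call_sites[1:]):
            common[i] = first
    return common
```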

Topics

  1.  Situations where code size matters

  2.  What do other compilers do better?

  3.  Techniques for code size optimization

  4.  How can we improve LLVM?

Techniques

  *   Program design, library design, etc.

  *   General optimizations

     *   Loop idiom recognition (memset, memcpy)

     *   Partial inlining

  *   Deduplication

     *   Outlining

     *   Function merging

  *   Functionality changes

     *   Partial evaluation

     *   Delete unused code
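The outlining questions are easier to discuss with the core search in view. LLVM's MachineOutliner finds repeated instruction sequences with a suffix tree. The sketch below uses brute force instead, with a hypothetical cost model: one unit per instruction, each occurrence replaced by a single call, and the outlined body costing its instructions plus a return. Target-specific call overhead is ignored. Because it works on any token sequence, the same code applies to IR or machine instructions; the instruction-set question becomes how often sequences repeat exactly.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def outlining_candidates(insns: Sequence[str], min_len: int = 2,
                         max_len: int = 8) -> List[Tuple[int, Tuple[str, ...], int]]:
    """Return (benefit, sequence, occurrences) for repeated sequences,
    best first, keeping only those with a positive estimated benefit.

    Hypothetical cost model: N non-overlapping occurrences of a length-L
    sequence save N*L units, and cost N calls plus an outlined function
    of L instructions and one return.
    """
    results = []
    n = len(insns)
    for length in range(min_len, max_len + 1):
        starts: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
        for i in range(n - length + 1):
            starts[tuple(insns[i:i + length])].append(i)
        for seq, positions in starts.items():
            # Count non-overlapping occurrences, scanning left to right.
            count, next_free = 0, 0
            for p in positions:
                if p >= next_free:
                    count += 1
                    next_free = p + length
            if count < 2:
                continue
            benefit = count * length - (count + length + 1)
            if benefit > 0:
                results.append((benefit, seq, count))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```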
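The "Function Merging by Sequence Alignment" work borrows from bioinformatics, where sequence alignment is classically done with the Needleman-Wunsch dynamic program. A minimal sketch of the alignment step, assuming instructions pair up only when identical and using a hypothetical scoring (+1 per matched pair, 0 per gap, mismatches never paired). Matched pairs can be emitted once in a merged function, while gaps stay function-specific behind some selector whose cost this sketch ignores.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]

def align(a: Sequence[str], b: Sequence[str]) -> List[Pair]:
    """Globally align two instruction sequences (Needleman-Wunsch style).

    Returns (x, y) pairs in program order; x or y is None for a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(score[i - 1][j], score[i][j - 1])
            if a[i - 1] == b[j - 1]:
                best = max(best, score[i - 1][j - 1] + 1)
            score[i][j] = best
    # Trace back from the bottom-right corner to recover one best alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and score[i][j] == score[i - 1][j - 1] + 1:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j]):
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs

def merge_savings(a: Sequence[str], b: Sequence[str]) -> int:
    """Instructions saved by merging: each matched pair is emitted once."""
    return sum(1 for x, y in align(a, b) if x is not None and y is not None)
```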
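Deleting unused code starts from a reachability question. A minimal sketch, similar in spirit to linker section garbage collection (`--gc-sections`) or LLVM's GlobalDCE, that walks a call graph from a set of roots and reports everything unreachable. The graph and the choice of roots (e.g. `main`, exported symbols, interrupt handlers) are assumptions supplied by the caller; missing an address-taken use in the graph would make the result unsafe.

```python
from typing import Dict, Iterable, Set

def unreachable_functions(call_graph: Dict[str, Iterable[str]],
                          roots: Iterable[str]) -> Set[str]:
    """Return functions in `call_graph` that no root can reach.

    `call_graph` maps each function to the functions it references,
    counting both direct calls and address-taken uses.
    """
    live: Set[str] = set()
    stack = list(roots)
    while stack:  # iterative depth-first walk from every root
        fn = stack.pop()
        if fn in live:
            continue
        live.add(fn)
        stack.extend(call_graph.get(fn, ()))
    return set(call_graph) - live
```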

Thanks,
Sean Bartell