[PATCH] D111912: New Pass for Merging Arbitrary Pair of Functions to Reduce Code Size

Fri Oct 15 15:58:08 PDT 2021

rcorcs created this revision.
rcorcs added a reviewer: hiraditya.
rcorcs added a project: LLVM.
Herald added subscribers: ormris, mgrang, mgorny.
rcorcs requested review of this revision.
Herald added a subscriber: llvm-commits.

This is a new pass that reduces code size by merging any pair of similar functions at the IR level.
This implementation includes the essential features of the function merging strategies I have been developing for the past few years.

1. The function merging operation:

The pass provides a utility method for merging any given pair of input functions, except for a few features that are not yet supported, such as variadic parameters.

For example, given the following pair of functions:
define i32 @f1(i32 %c, i32 %d) {
entry:
  %add = add nsw i32 %d, %c
  %mul = shl nsw i32 %add, 1
  ret i32 %mul
}

define i32 @f2(i32 %c, i32 %d) {
entry:
  %add = add nsw i32 %d, %c
  %mul = shl nsw i32 %add, 2
  ret i32 %mul
}

We return the following merged function:
define private i32 @merged(i1 %0, i32 %1, i32 %2) local_unnamed_addr {
entry:
  %3 = add nsw i32 %2, %1
  %4 = select i1 %0, i32 1, i32 2
  %5 = shl nsw i32 %3, %4
  ret i32 %5
}
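
To make the role of the function identifier concrete, here is a small Python model of the three functions above. It is only a sketch of the semantics, not code from the patch: it ignores i32 wrap-around and the nsw flag, and the wrapper names f1_thunk and f2_thunk are hypothetical. The mapping of the identifier (true selects the shift amount of f1) follows the select instruction in the merged IR.

```python
def f1(c: int, d: int) -> int:
    # Models: %add = add nsw i32 %d, %c ; %mul = shl nsw i32 %add, 1
    return (d + c) << 1


def f2(c: int, d: int) -> int:
    # Same shape as f1, but the shift amount is 2.
    return (d + c) << 2


def merged(func_id: bool, c: int, d: int) -> int:
    # Models: %4 = select i1 %0, i32 1, i32 2
    shift = 1 if func_id else 2
    return (d + c) << shift


# Hypothetical wrappers: each original function forwards to the merged
# one and passes its own identifier.
def f1_thunk(c: int, d: int) -> int:
    return merged(True, c, d)


def f2_thunk(c: int, d: int) -> int:
    return merged(False, c, d)
```

The only difference between the two inputs is a constant operand, so a single value selection on the identifier is enough and no extra control flow is needed.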

1.1. Creating the function type of the merged function:
The two parameter lists are merged by combining parameters with equivalent types, and a function-identifier parameter is added.
If needed, the return types are combined in a union-like structure.
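
The parameter merging described above can be sketched as follows. This is an assumption-laden illustration rather than the patch's logic: types are plain strings, the identifier is always an i1 placed first (as in the @merged example), and sharing is greedy by first matching type. The name merge_param_types is hypothetical.

```python
def merge_param_types(params1, params2):
    """Return (merged types, slot map for function 1, slot map for function 2)."""
    merged = ["i1"]  # slot 0: the function identifier (assumed to be an i1)

    # Every parameter of the first function gets its own slot.
    map1 = []
    for ty in params1:
        map1.append(len(merged))
        merged.append(ty)

    # A parameter of the second function reuses an unshared slot of the
    # same type when one exists; otherwise it gets a new slot.
    reusable = list(map1)
    map2 = []
    for ty in params2:
        slot = next((s for s in reusable if merged[s] == ty), None)
        if slot is None:
            slot = len(merged)
            merged.append(ty)
        else:
            reusable.remove(slot)
        map2.append(slot)

    return merged, map1, map2
```

For the f1/f2 pair, both (i32, i32) lists share their slots, which yields the (i1, i32, i32) signature shown for @merged.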

1.2. Code alignment:

Similar pieces of code are merged, and mismatching instructions or operands are handled using either conditional branches or value selection based on the function identifier. To identify the equivalent pieces of code, we implement a linear pairwise alignment strategy.

Basic blocks are grouped by their size and then paired based on a fingerprint-based similarity metric. A profitability analysis evaluates the paired basic blocks. If deemed profitable, their instructions are aligned following a pairwise strategy, where corresponding instructions are labelled as either matching or mismatching.
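
One plausible reading of the pairwise alignment step is sketched below. It assumes that instructions are compared position by position and that two instructions match when their opcode and type agree; operands are ignored, since operand differences are handled later by value selection. The (opcode, type) tuples and the name align_pairwise are illustrative stand-ins, not the patch's data structures.

```python
def align_pairwise(block1, block2):
    """Walk two blocks in lockstep and label each position.

    Each entry of the result is (inst1, inst2) for a match, or
    (inst1, None) / (None, inst2) for a mismatch.
    """
    aligned = []
    for a, b in zip(block1, block2):
        if a == b:
            aligned.append((a, b))      # matching: one merged instruction
        else:
            aligned.append((a, None))   # mismatching: only in function 1
            aligned.append((None, b))   # mismatching: only in function 2

    # Any tail left over in the longer block is mismatching by definition.
    n = min(len(block1), len(block2))
    aligned.extend((a, None) for a in block1[n:])
    aligned.extend((None, b) for b in block2[n:])
    return aligned
```

The walk is a single pass over both blocks, which is what makes it linear; grouping blocks by size beforehand keeps the leftover tail short or empty.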

The fingerprint of a basic block or function is an integer vector containing the frequency of each LLVM opcode in that piece of code.
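
As a minimal sketch of such a fingerprint, the snippet below counts opcodes over a small, made-up opcode table. A real fingerprint would have one entry per LLVM opcode; OPCODES and fingerprint are names chosen for illustration only.

```python
from collections import Counter

# Illustrative subset; the real vector has one slot per LLVM opcode.
OPCODES = ["add", "shl", "mul", "select", "ret"]


def fingerprint(opcodes):
    """Return the opcode-frequency vector for a sequence of opcode names."""
    counts = Counter(opcodes)
    return [counts[op] for op in OPCODES]


# f1 and f2 from the example differ only in a constant operand, so their
# fingerprints are identical.
fp_f1 = fingerprint(["add", "shl", "ret"])
fp_f2 = fingerprint(["add", "shl", "ret"])
```

Because operands and instruction order are not part of the vector, the fingerprint is cheap to compute and compare, at the cost of occasionally pairing code that turns out not to align well.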

1.3. Code generation:

Finally, the merged function is produced from the alignment. Unlike the version discussed in the EuroLLVM'19 presentation, the code generator is simplified for per-block alignments such as the pairwise alignment strategy.

First, the alignment is consumed, producing the merged basic blocks and instructions. Mismatching basic blocks are simply copied. An aligned pair of blocks may result in one or more blocks: matching instructions converge to a common block, while mismatching instructions are split into two blocks.

Once blocks and instructions have been produced, we assign the label and value operands.

Lastly, we make sure the dominance property is preserved by running the SSA reconstruction algorithm.

The code generator also contains some optimizations for operand reordering and for the elimination of selections and phi-nodes.

2. The search strategy:

This pass integrates the function merging operation with a search strategy that identifies pairs of functions that are worth merging.
Similar to the basic block pairing, function pairing also uses the fingerprint-based similarity metric. For each function, we find the candidate function whose fingerprint has the smallest Manhattan distance to it. The two functions are then merged. If the merge is deemed profitable, the input functions are replaced by calls to the merged one; otherwise, the merged function is deleted.
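
The pairing step can be sketched as a nearest-neighbour lookup over fingerprints. Everything below is illustrative: fingerprints are plain lists keyed by function name, and is_profitable uses a hypothetical size comparison (merged size plus call overhead against the combined size of the inputs), since the actual cost model is not described here.

```python
def manhattan(fp1, fp2):
    """Manhattan (L1) distance between two equally sized fingerprints."""
    return sum(abs(a - b) for a, b in zip(fp1, fp2))


def closest_candidate(name, fingerprints):
    """Return the name of the other function nearest to `name`, or None."""
    target = fingerprints[name]
    others = [n for n in fingerprints if n != name]
    if not others:
        return None
    return min(others, key=lambda n: manhattan(target, fingerprints[n]))


def is_profitable(size_f1, size_f2, size_merged, call_overhead):
    # Hypothetical cost model: keep the merged function only if it and
    # the calls into it are smaller than the two originals together.
    return size_merged + call_overhead < size_f1 + size_f2
```

A linear scan per function makes the search quadratic in the number of functions, which is the kind of cost a hash-based pairing strategy would aim to reduce.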

3. Future work:

This implementation contains what I believe are the most essential features for such a function merging pass. I focused on keeping the code small and providing the faster variants.
For example, the linear pairwise alignment was favored over the better but quadratic alignment strategy. Similarly, "thunks" are not yet removed, since the same effect could be achieved by later running code-size inlining, global internalization, etc.

There are also other features that I have either already implemented internally or have plans to do so in the future.

3.1. Use profiling to avoid merging hot basic blocks when optimizing for both performance and code size.
3.2. Develop a hash-based function pairing strategy.
3.3. Provide support for debug information.
3.4. Handle variadic functions. Merge the other parameters and leave the variadic argument for last if there is one.
3.5. Remove the function-identifier parameter if unused.
3.6. Integrate with ThinLTO.

Presentation at EuroLLVM 2019:
https://www.youtube.com/watch?v=sOCFYfF3iwE

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D111912

Files:
  include/llvm/InitializePasses.h
  include/llvm/Transforms/IPO.h
  include/llvm/Transforms/IPO/FunctionMerging.h
  lib/Passes/PassBuilder.cpp
  lib/Passes/PassRegistry.def
  lib/Transforms/IPO/CMakeLists.txt
  lib/Transforms/IPO/FunctionMerging.cpp
  lib/Transforms/IPO/IPO.cpp
  test/Transforms/FunctionMerging/block-reordering.ll
  test/Transforms/FunctionMerging/operand-selection.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D111912.380021.patch
Type: text/x-patch
Size: 84165 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211015/88ede64b/attachment-0001.bin>