[all-commits] [llvm/llvm-project] c68f71: ext-tsp basic block layout

Mon Dec 6 08:58:45 PST 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: c68f71eb37c2b6ffcf29e865d443a910e73083bd
      https://github.com/llvm/llvm-project/commit/c68f71eb37c2b6ffcf29e865d443a910e73083bd
  Author: spupyrev <spupyrev at fb.com>
  Date:   2021-12-06 (Mon, 06 Dec 2021)

  Changed paths:
    A llvm/include/llvm/Transforms/Utils/CodeLayout.h
    M llvm/lib/CodeGen/MachineBlockPlacement.cpp
    M llvm/lib/Transforms/Utils/CMakeLists.txt
    A llvm/lib/Transforms/Utils/CodeLayout.cpp
    A llvm/test/CodeGen/X86/code_placement_ext_tsp.ll
    A llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll

  Log Message:
  -----------
  ext-tsp basic block layout

A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality and thus processor I-cache utilization. This is
achieved via increasing the number of fall-through jumps and co-locating
frequently executed nodes together. The name follows the underlying
optimization problem, Extended-TSP, which is a generalization of classical
(maximum) Traveling Salesmen Problem.

The algorithm is a greedy heuristic that works with chains (ordered lists)
of basic blocks. Initially all chains are isolated basic blocks. On every
iteration, we pick a pair of chains whose merging yields the biggest increase
in the ExtTSP value, which models how i-cache "friendly" a specific chain is.
A pair of chains giving the maximum gain is merged into a new chain. The
procedure stops when there is only one chain left, or when merging does not
increase ExtTSP. In the latter case, the remaining chains are sorted by
density in decreasing order.

An important aspect is the way two chains are merged. Unlike earlier
algorithms (e.g., based on the approach of Pettis-Hansen), two
chains, X and Y, are first split into three, X1, X2, and Y. Then we
consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y,
X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score.
This improves the quality of the final result (the search space is larger)
while keeping the implementation sufficiently fast.

Differential Revision: https://reviews.llvm.org/D113424