[all-commits] [llvm/llvm-project] e6b9fc: [FuncSpec] Global ranking of specialisations

Momchil Velikov via All-commits all-commits at lists.llvm.org
Wed Dec 14 07:35:27 PST 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e6b9fc4c8be00660c5e1f1605a6a5d92475bdba7
      https://github.com/llvm/llvm-project/commit/e6b9fc4c8be00660c5e1f1605a6a5d92475bdba7
  Author: Momchil Velikov <momchil.velikov at arm.com>
  Date:   2022-12-14 (Wed, 14 Dec 2022)

  Changed paths:
    M llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
    M llvm/include/llvm/Transforms/Utils/SCCPSolver.h
    M llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
    A llvm/test/Transforms/FunctionSpecialization/global-rank.ll
    M llvm/test/Transforms/FunctionSpecialization/identical-specializations.ll
    M llvm/test/Transforms/FunctionSpecialization/specialization-order.ll
    M llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll

  Log Message:
  -----------
  [FuncSpec] Global ranking of specialisations

The `FunctionSpecialization` pass chooses specializations among the
opportunities presented by a single function and its calls,
progressively penalizing subsequent specialization attempts by
artificially increasing the cost of a specialization, depending on how
many specialization were applied before. Thus the chosen
specializations are sensitive to the order the functions appear in the
module and may be worse than others, had those others been considered
earlier.

This patch makes the `FunctionSpecialization` pass rank the
specializations globally, i.e.  choose the "best" specializations
among the all possible specializations in the module, for all
functions.

Since this involved quite a bit of redesign of the pass data
structures, this patch also carries:

  * removal of duplicate specializations

  * optimization of call sites update, by collecting per
    specialization the list of call sites that can be directly
    rewritten, without prior expensive check if the call constants and
    their positions match those of the specialized function.

A bit of a write-up up about the FuncSpec data structures and
operation:

Each potential function specialisation is kept in a single vector
(`AllSpecs` in `FunctionSpecializer::run`).  This vector is populated
by `FunctionSpecializer::findSpecializations`.

The `findSpecializations` member function has a local `DenseMap` to
eliminate duplicates - with each call to the current function,
`findSpecializations` builds a specialisation signature (`SpecSig`)
and looks it in the duplicates map. If the signature is present, the
function records the call to rewrite into the existing specialisation
instance.  If the signature is absent, it means we have a new
specialisation instance - the function calculates the gain and creates
a new entry in `AllSpecs`. Negative gain specialisation are ignored at
this point, unless forced.

The potential specialisations for a function form a contiguous range
in the `AllSpecs` [1]. This range is recorded in `SpecMap SM`, so we
can quickly find all specialisations for a function.

Once we have all the potential specialisations with their gains we
need to choose the best ones, which fit in the module specialisation
budget. This is done by using a max-heap (`std::make_heap`,
`std::push_heap`, etc) to find the best `NSpec` specialisations with a
single traversal of the `AllSpecs` vector. The heap itself is
contained with a small vector (`BestSpecs`) of indices into
`AllSpecs`, since elements of `AllSpecs` are a bit too heavy to
shuffle around.

Next the chosen specialisation are performed, that is, functions
cloned, `SCCPSolver` primed, and known call sites updated.

Then we run the `SCCPSolver` to propagate constants in the cloned
functions, after which we walk the calls of the original functions to
update them to call the specialised functions.

---

[1] This range may contain specialisation that were discarded and is
not ordered in any way. One alternative design is to keep a vector
indices of all specialisations for this function (which would
initially be, `i`, `i+1`, `i+2`, etc) and later sort them by gain,
pushing non-applied ones to the back. This has the potential to speed
`updateCallSites` up.

Reviewed By: ChuanqiXu, labrinea

Differential Revision: https://reviews.llvm.org/D139346

Change-Id: I708851eb38f07c42066637085b833ca91b195998




More information about the All-commits mailing list