[all-commits] [llvm/llvm-project] c9b6e0: [AMDGPU] Graph-based Module Splitting Rewrite (#10...

Thu Aug 29 01:40:19 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: c9b6e01b2e4fc930dac91dd44c0592ad7e36d967
      https://github.com/llvm/llvm-project/commit/c9b6e01b2e4fc930dac91dd44c0592ad7e36d967
  Author: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
  Date:   2024-08-29 (Thu, 29 Aug 2024)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
    M llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll
    M llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize.ll
    R llvm/test/tools/llvm-split/AMDGPU/debug-name-hiding.ll
    R llvm/test/tools/llvm-split/AMDGPU/debug-non-kernel-root.ll
    M llvm/test/tools/llvm-split/AMDGPU/declarations.ll
    M llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll
    M llvm/test/tools/llvm-split/AMDGPU/kernels-cost-ranking.ll
    M llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-external.ll
    M llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-indirect.ll
    M llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-overridable.ll
    M llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables-noexternal.ll
    M llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables.ll
    M llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll
    M llvm/test/tools/llvm-split/AMDGPU/non-kernels-dependency-indirect.ll
    A llvm/test/tools/llvm-split/AMDGPU/recursive-search-2.ll
    A llvm/test/tools/llvm-split/AMDGPU/recursive-search-8.ll

  Log Message:
  -----------
  [AMDGPU] Graph-based Module Splitting Rewrite (#104763)

Major rewrite of the AMDGPUSplitModule pass in order to better support
it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options
  and LLVM_DEBUG, like any other pass in LLVM.
  - The SML system started from good intentions, but it was too flawed and
    messy to be of any real use. It was also a real pain to use and made the
    code more annoying to maintain.
- Graph-based module representation with DOTGraph printing support.
  - The graph represents the module accurately, with bidirectional, typed
    edges between nodes (a node usually represents one function).
  - Nodes are assigned IDs starting from 0, which allows us to represent a
    set of nodes as a BitVector. This makes comparing two sets of nodes to
    find common dependencies a trivial task. Merging two clusters of nodes
    together is equally trivial.
- No more defaulting to "P0" for external calls.
  - Roots that can reach non-copyable dependencies (such as external
    calls) are now grouped together in a single "cluster" that can go into
    any partition.
- No more defaulting to "P0" for indirect calls.
- New representation for module splitting proposals that can be graded
  and compared.
- Graph-search algorithm that can explore multiple branches/assignments
  for a cluster of functions, up to a maximum depth.
  - With the default max depth of 8, we can create up to 256 proposals
    to try and find the best one.
  - We can still fall back to a greedy approach upon reaching max depth.
    That greedy approach uses almost identical heuristics to the previous
    version of the pass.
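
The bounded search can be sketched roughly as follows. This is a simplified
illustration, not the code in AMDGPUSplitModule.cpp: `Proposal`, `explore` and
`search` are hypothetical names, clusters are reduced to plain costs, a
proposal is graded only by its most expensive partition, and the two candidate
partitions at each step are assumed to be the two least-loaded ones. Each
branching decision consumes one level of depth, so a max depth of 8 yields at
most 2^8 = 256 finished proposals; past that depth only the greedy choice is
followed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical proposal: the accumulated cost of each partition.
struct Proposal {
  std::vector<uint64_t> Load;
  // Grade used in this sketch: cost of the largest partition (lower is better).
  uint64_t bottleneck() const {
    return *std::max_element(Load.begin(), Load.end());
  }
};

struct SearchResult {
  Proposal Best;
  unsigned NumProposals = 0;
};

// Indices of the two least-loaded partitions (needs at least two partitions).
static std::pair<size_t, size_t> twoLeastLoaded(const Proposal &P) {
  size_t A = 0, B = 1;
  if (P.Load[B] < P.Load[A])
    std::swap(A, B);
  for (size_t I = 2; I < P.Load.size(); ++I) {
    if (P.Load[I] < P.Load[A]) {
      B = A;
      A = I;
    } else if (P.Load[I] < P.Load[B]) {
      B = I;
    }
  }
  return {A, B};
}

static void explore(const std::vector<uint64_t> &Costs, size_t Idx,
                    unsigned Depth, unsigned MaxDepth, Proposal Cur,
                    SearchResult &Res) {
  for (; Idx < Costs.size(); ++Idx) {
    std::pair<size_t, size_t> Cand = twoLeastLoaded(Cur);
    if (Depth < MaxDepth) {
      // Branch: try the second candidate in a copy of the current proposal.
      Proposal Alt = Cur;
      Alt.Load[Cand.second] += Costs[Idx];
      explore(Costs, Idx + 1, Depth + 1, MaxDepth, std::move(Alt), Res);
      ++Depth;
    }
    // First candidate on this branch; past MaxDepth this is the greedy path.
    Cur.Load[Cand.first] += Costs[Idx];
  }
  // Every cluster is assigned: grade the finished proposal against the best.
  if (Res.NumProposals == 0 || Cur.bottleneck() < Res.Best.bottleneck())
    Res.Best = std::move(Cur);
  ++Res.NumProposals;
}

SearchResult search(const std::vector<uint64_t> &Costs, size_t NumParts,
                    unsigned MaxDepth) {
  assert(NumParts >= 2 && "sketch assumes at least two partitions");
  SearchResult Res;
  explore(Costs, 0, 0, MaxDepth,
          Proposal{std::vector<uint64_t>(NumParts, 0)}, Res);
  return Res;
}
```

For instance, `search({5, 4, 3, 3, 3}, 2, 0)` follows only the greedy path and
ends with a bottleneck of 10, while a max depth of 8 explores 32 proposals for
the same input and finds the balanced 9/9 split.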

All of this gives us a lot of room to experiment with new heuristics or
even entirely different splitting strategies if we need to. For
instance, the graph representation has room for abstract nodes, e.g. if
we need to represent some global variables or external constraints. We
could also introduce more edge types to model other types of relations
between nodes, etc.
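
To make the shape of that representation concrete, here is a minimal sketch
under stated assumptions. It is not the data structure from the patch: the
names `Graph`, `Node`, `Edge`, `EdgeKind` and `NodeSet` are illustrative,
`std::vector<bool>` stands in for `llvm::BitVector` so the example has no LLVM
dependency, and `EdgeKind::Reference` plus the `IsAbstract` flag are
hypothetical examples of the kind of extension mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical edge kinds; Reference stands in for a possible future kind.
enum class EdgeKind { DirectCall, IndirectCall, Reference };

struct Edge {
  unsigned Src, Dst;
  EdgeKind Kind;
};

struct Node {
  bool IsAbstract = false;     // e.g. a global variable, not a function
  std::vector<size_t> In, Out; // indices into Graph::Edges, both directions
};

// Stand-in for llvm::BitVector: bit I is set when node I is in the set.
using NodeSet = std::vector<bool>;

struct Graph {
  std::vector<Node> Nodes;
  std::vector<Edge> Edges;

  // Node IDs are plain indices, assigned from 0.
  unsigned addNode(bool IsAbstract = false) {
    Node N;
    N.IsAbstract = IsAbstract;
    Nodes.push_back(N);
    return static_cast<unsigned>(Nodes.size() - 1);
  }

  // Record the edge once and link it from both endpoints.
  void addEdge(unsigned Src, unsigned Dst, EdgeKind Kind) {
    assert(Src < Nodes.size() && Dst < Nodes.size() && "unknown node ID");
    Edges.push_back({Src, Dst, Kind});
    Nodes[Src].Out.push_back(Edges.size() - 1);
    Nodes[Dst].In.push_back(Edges.size() - 1);
  }

  // All nodes reachable from Root through outgoing edges, Root included.
  NodeSet dependencies(unsigned Root) const {
    NodeSet Seen(Nodes.size(), false);
    std::vector<unsigned> Worklist{Root};
    Seen[Root] = true;
    while (!Worklist.empty()) {
      unsigned Cur = Worklist.back();
      Worklist.pop_back();
      for (size_t E : Nodes[Cur].Out) {
        unsigned Dst = Edges[E].Dst;
        if (!Seen[Dst]) {
          Seen[Dst] = true;
          Worklist.push_back(Dst);
        }
      }
    }
    return Seen;
  }
};

// True if two node sets from the same graph share at least one node.
bool sharesNode(const NodeSet &A, const NodeSet &B) {
  assert(A.size() == B.size() && "sets must come from the same graph");
  for (size_t I = 0; I < A.size(); ++I)
    if (A[I] && B[I])
      return true;
  return false;
}

// Merge cluster B into cluster A (set union).
void mergeInto(NodeSet &A, const NodeSet &B) {
  assert(A.size() == B.size() && "sets must come from the same graph");
  for (size_t I = 0; I < B.size(); ++I)
    if (B[I])
      A[I] = true;
}
```

With IDs used as bit positions, finding out whether two roots have a common
dependency and merging their clusters are both simple passes over the bits;
with `llvm::BitVector` the same operations would map onto its built-in
set operations.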

I also designed the graph representation & the splitting strategies to
be as fast as possible, and it seems to have paid off. Some quick tests
showed that we spend pretty much all of our time in the CloneModule
function, with the actual splitting logic being <1% of the runtime.
