[llvm] [AMDGPU] Add AMDGPU-specific module splitting (PR #89245)

Mon May 13 07:15:34 PDT 2024

================
@@ -0,0 +1,738 @@
+//===- AMDGPUSplitModule.cpp ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file Implements a module splitting algorithm designed to support the
+/// FullLTO --lto-partitions option for parallel codegen. This is completely
+/// different from the common SplitModule pass, as this system is designed with
+/// AMDGPU in mind.
+///
+/// The basic idea of this module splitting implementation is the same as
+/// SplitModule: load-balance the module's functions across a set of N
+/// partitions to allow parallel codegen. However, it does it very
+/// differently than the target-agnostic variant:
+///   - Kernels are used as the module's "roots".
+///     They're known entry points on AMDGPU, and everything else is often
+///     internal only.
----------------
Pierre-vh wrote:

Yes, I initially tried that (running the SplitModule pass without externalizing), but it didn't work in every case.
IIRC, SplitModule gives the same split as this pass in the ideal case (e.g. no dependencies, or every dependency has only one kernel user), but that's fairly rare. When functions are used by more than one kernel, this pass does a better job: it gives us a compile-time gain somewhere between full externalization (no duplication, but no resource usage analysis, so it doesn't work for us) and conservative splitting (no duplication).
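
To illustrate the "function used by >1 kernel" case, here is a rough toy sketch, not the code in this PR. It assumes a shared function is simply copied into every partition whose kernels reach it, and it balances by function count only. `CallGraph`, `reachableFrom` and `splitByKernels` are hypothetical names made up for the example, and the string-based graph stands in for a real `llvm::Module`.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: function name -> direct callees. Hypothetical stand-in for
// a real module and call graph; not the data structures used by the pass.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Collect Root plus every function transitively reachable from it.
std::set<std::string> reachableFrom(const CallGraph &CG,
                                    const std::string &Root) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist{Root};
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(F).second)
      continue; // Already visited.
    auto It = CG.find(F);
    if (It == CG.end())
      continue; // Unknown function: treat it as a leaf.
    for (const std::string &Callee : It->second)
      Worklist.push_back(Callee);
  }
  return Seen;
}

// Use kernels as roots: each kernel goes to the partition that currently
// holds the fewest functions, together with everything it reaches. A function
// reachable from kernels in different partitions ends up in each of them
// (assumed duplication; the cost model here is just the function count).
std::vector<std::set<std::string>>
splitByKernels(const CallGraph &CG, const std::vector<std::string> &Kernels,
               std::size_t NumPartitions) {
  std::vector<std::set<std::string>> Parts(NumPartitions);
  if (NumPartitions == 0)
    return Parts;
  for (const std::string &K : Kernels) {
    auto Smallest = std::min_element(
        Parts.begin(), Parts.end(),
        [](const std::set<std::string> &A, const std::set<std::string> &B) {
          return A.size() < B.size();
        });
    std::set<std::string> Deps = reachableFrom(CG, K);
    Smallest->insert(Deps.begin(), Deps.end());
  }
  return Parts;
}
```

With two kernels `k1` and `k2` that both call `helper`, and two partitions, this sketch puts `k1` + `helper` in one partition and `k2` + `helper` in the other, so both kernels can be compiled in parallel with their full dependency set visible.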

https://github.com/llvm/llvm-project/pull/89245