[llvm] [offload][SYCL] Add Module splitting by categories. (PR #131347)

Tue Jun 24 09:34:28 PDT 2025

================
@@ -0,0 +1,44 @@
+//===-------- SplitModuleByCategory.h - module split ------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Functionality to split a module by categories.
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORM_UTILS_SPLIT_MODULE_BY_CATEGORY_H
+#define LLVM_TRANSFORM_UTILS_SPLIT_MODULE_BY_CATEGORY_H
+
+#include "llvm/ADT/STLFunctionalExtras.h"
+
+#include <memory>
+#include <optional>
+#include <string>
+
+namespace llvm {
+
+class Module;
+class Function;
+
+/// Splits the given module \p M using the given \p FunctionCategorizer.
+/// \p FunctionCategorizer returns integer category for an input Function.
+/// It may return std::nullopt if a function doesn't have a category.
+/// Module's functions are being grouped by categories. Every such group
+/// populates a call graph containing group's functions themselves and all
+/// reachable functions and globals. Split outputs are populated from each call
+/// graph associated with some category.
----------------
jdoerfert wrote:

Two comments:
1) I am unsure why this talks about a call graph and such. The entire interface works with an opaque "oraql" and categories, why that "oraql" puts functions in the same or different categories might be call graph related, or it might be because their names have the same prefix. We should not conflate one use with this generic capability. EDIT: Coming back to this *after* I read the implementation, I believe I understand what is happening. The interface is conflating two things, which is by itself OK. However, given the naming, this is confusing. I believe the easiest fix is to modify the comment and the name a little. As it is now, one could reasonably assume all functions with category X go into the same module, nothing else. That would have been my preferred way of doing this. The difference is that the dependence logic would then be part of the caller, and this would be a stupid splitting interface. I'm OK with keeping it the other way around for now. The interface doesn't `splitModuleByCategory` though, it's more like `splitModuleTransitiveFromEntryPoints`. And the comment here needs to define "transitive" and what are considered entry points.
2) I am unsure why this uses an optional, and what that means. If `std::nullopt` means it is duplicated into ever module, or something special like that, I can see how that is useful. However, just given the comment here `std::nullopt` is an option and it is not clear what that means.

https://github.com/llvm/llvm-project/pull/131347