[clang] 26eb042 - [Clang] Introduce clang-offload-packager tool to bundle device files
Joseph Huber via cfe-commits
cfe-commits at lists.llvm.org
Wed May 11 06:39:22 PDT 2022
Author: Joseph Huber
Date: 2022-05-11T09:39:13-04:00
New Revision: 26eb04268f4c9c0e2d4ff5bc53ae90cbb7f6c731
URL: https://github.com/llvm/llvm-project/commit/26eb04268f4c9c0e2d4ff5bc53ae90cbb7f6c731
DIFF: https://github.com/llvm/llvm-project/commit/26eb04268f4c9c0e2d4ff5bc53ae90cbb7f6c731.diff
LOG: [Clang] Introduce clang-offload-packager tool to bundle device files
In order to do offloading compilation we need to embed files into the
host and create fatbainaries. Clang uses a special binary format to
bundle several files along with their metadata into a single binary
image. This is currently performed using the `-fembed-offload-binary`
option. However this is not very extensibile since it requires changing
the command flag every time we want to add something and makes optional
arguments difficult. This patch introduces a new tool called
`clang-offload-packager` that behaves similarly to CUDA's `fatbinary`.
This tool takes several input files with metadata and embeds it into a
single image that can then be embedded in the host.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D125165
Added:
clang/docs/ClangOffloadPackager.rst
clang/tools/clang-offload-packager/CMakeLists.txt
clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
Modified:
clang/include/clang/Basic/CodeGenOptions.h
clang/include/clang/Driver/Action.h
clang/include/clang/Driver/ToolChain.h
clang/lib/CodeGen/BackendUtil.cpp
clang/lib/Driver/Action.cpp
clang/lib/Driver/Driver.cpp
clang/lib/Driver/ToolChain.cpp
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Driver/ToolChains/Clang.h
clang/test/Driver/amdgpu-openmp-toolchain-new.c
clang/test/Driver/cuda-openmp-driver.cu
clang/test/Driver/cuda-phases.cu
clang/test/Driver/linker-wrapper-image.c
clang/test/Driver/linker-wrapper.c
clang/test/Driver/openmp-offload-gpu-new.c
clang/test/Driver/openmp-offload-infer.c
clang/test/Frontend/embed-object.c
clang/test/Frontend/embed-object.ll
clang/test/lit.cfg.py
clang/tools/CMakeLists.txt
Removed:
################################################################################
diff --git a/clang/docs/ClangOffloadPackager.rst b/clang/docs/ClangOffloadPackager.rst
new file mode 100644
index 0000000000000..5d15cc4c5caf6
--- /dev/null
+++ b/clang/docs/ClangOffloadPackager.rst
@@ -0,0 +1,72 @@
+======================
+Clang Offload Packager
+======================
+
+.. contents::
+ :local:
+
+.. _clang-offload-packager:
+
+Introduction
+============
+
+This tool bundles device files into a single image containing necessary
+metadata. We use a custom binary format for bundling all the device images
+together. The image format is a small header wrapping around a string map. This
+tool creates bundled binaries so that they can be embedded into the host to
+create a fat-binary.
+
+An embedded binary is marked by the ``0x10FF10AD`` magic bytes, followed by a
+version. Each created binary contains its own magic bytes. This allows us to
+locate all the embedded offloading sections even after they may have been merged
+by the linker, such as when using relocatable linking. The format used is
+primarily a binary serialization of the following struct.
+
+.. code-block:: c++
+
+ struct OffloadingImage {
+ uint16_t TheImageKind;
+ uint16_t TheOffloadKind;
+ uint32_t Flags;
+ StringMap<StringRef> StringData;
+ MemoryBufferRef Image;
+ };
+
+Usage
+=====
+
+This tool can be used with the following arguments. Generally information is
+passed as a key-value pair to the ``image=`` argument. The ``file``, ``triple``,
+and ``arch`` arguments are considered mandatory to make a valid image.
+
+.. code-block:: console
+
+ OVERVIEW: A utility for bundling several object files into a single binary.
+ The output binary can then be embedded into the host section table
+ to create a fatbinary containing offloading code.
+
+ USAGE: clang-offload-packager [options]
+
+ OPTIONS:
+
+ Generic Options:
+
+ --help - Display available options (--help-hidden for more)
+ --help-list - Display list of available options (--help-list-hidden for more)
+ --version - Display the version of this program
+
+ clang-offload-packager options:
+
+ --image=<<key>=<value>,...> - List of key and value arguments. Required
+ keywords are 'file' and 'triple'.
+ -o=<file> - Write output to <file>.
+
+Example
+=======
+
+This tool simply takes many input files from the ``image`` option and creates a
+single output file with all the images combined.
+
+.. code-block:: console
+
+ clang-offload-packager -o out.bin --image=file=input.o,triple=nvptx64,arch=sm_70
diff --git a/clang/include/clang/Basic/CodeGenOptions.h b/clang/include/clang/Basic/CodeGenOptions.h
index 48a41244afa32..23d76c308d847 100644
--- a/clang/include/clang/Basic/CodeGenOptions.h
+++ b/clang/include/clang/Basic/CodeGenOptions.h
@@ -276,12 +276,8 @@ class CodeGenOptions : public CodeGenOptionsBase {
/// CUDA runtime back-end for incorporating them into host-side object file.
std::string CudaGpuBinaryFileName;
- /// List of filenames and metadata passed in using the -fembed-offload-object
- /// option to embed device-side offloading objects into the host as a named
- /// section. Input passed in as 'filename,kind,triple,arch'.
- ///
- /// NOTE: This will need to be expanded whenever we want to pass in more
- /// metadata, at some point this should be its own clang tool.
+ /// List of filenames passed in using the -fembed-offload-object option. These
+ /// are offloading binaries containing device images and metadata.
std::vector<std::string> OffloadObjects;
/// The name of the file to which the backend should save YAML optimization
diff --git a/clang/include/clang/Driver/Action.h b/clang/include/clang/Driver/Action.h
index f8a96619c4f61..684ccd358275c 100644
--- a/clang/include/clang/Driver/Action.h
+++ b/clang/include/clang/Driver/Action.h
@@ -74,6 +74,7 @@ class Action {
OffloadBundlingJobClass,
OffloadUnbundlingJobClass,
OffloadWrapperJobClass,
+ OffloadPackagerJobClass,
LinkerWrapperJobClass,
StaticLibJobClass,
@@ -669,6 +670,17 @@ class OffloadWrapperJobAction : public JobAction {
}
};
+class OffloadPackagerJobAction : public JobAction {
+ void anchor() override;
+
+public:
+ OffloadPackagerJobAction(ActionList &Inputs, types::ID Type);
+
+ static bool classof(const Action *A) {
+ return A->getKind() == OffloadPackagerJobClass;
+ }
+};
+
class LinkerWrapperJobAction : public JobAction {
void anchor() override;
diff --git a/clang/include/clang/Driver/ToolChain.h b/clang/include/clang/Driver/ToolChain.h
index 5d95a644fde1a..269b24dc83c39 100644
--- a/clang/include/clang/Driver/ToolChain.h
+++ b/clang/include/clang/Driver/ToolChain.h
@@ -151,6 +151,7 @@ class ToolChain {
mutable std::unique_ptr<Tool> IfsMerge;
mutable std::unique_ptr<Tool> OffloadBundler;
mutable std::unique_ptr<Tool> OffloadWrapper;
+ mutable std::unique_ptr<Tool> OffloadPackager;
mutable std::unique_ptr<Tool> LinkerWrapper;
Tool *getClang() const;
@@ -162,6 +163,7 @@ class ToolChain {
Tool *getClangAs() const;
Tool *getOffloadBundler() const;
Tool *getOffloadWrapper() const;
+ Tool *getOffloadPackager() const;
Tool *getLinkerWrapper() const;
mutable bool SanitizerArgsChecked = false;
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index c802f74bb2db1..94808e8a625e6 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -1211,33 +1211,16 @@ void clang::EmbedObject(llvm::Module *M, const CodeGenOptions &CGOpts,
return;
for (StringRef OffloadObject : CGOpts.OffloadObjects) {
- SmallVector<StringRef, 4> ObjectFields;
- OffloadObject.split(ObjectFields, ',');
-
- if (ObjectFields.size() != 4) {
- auto DiagID = Diags.getCustomDiagID(
- DiagnosticsEngine::Error, "Expected at least four arguments '%0'");
- Diags.Report(DiagID) << OffloadObject;
- return;
- }
-
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> ObjectOrErr =
- llvm::MemoryBuffer::getFileOrSTDIN(ObjectFields[0]);
+ llvm::MemoryBuffer::getFileOrSTDIN(OffloadObject);
if (std::error_code EC = ObjectOrErr.getError()) {
auto DiagID = Diags.getCustomDiagID(DiagnosticsEngine::Error,
"could not open '%0' for embedding");
- Diags.Report(DiagID) << ObjectFields[0];
+ Diags.Report(DiagID) << OffloadObject;
return;
}
- OffloadBinary::OffloadingImage Image{};
- Image.TheImageKind = getImageKind(ObjectFields[0].rsplit(".").second);
- Image.TheOffloadKind = getOffloadKind(ObjectFields[1]);
- Image.StringData = {{"triple", ObjectFields[2]}, {"arch", ObjectFields[3]}};
- Image.Image = **ObjectOrErr;
-
- std::unique_ptr<MemoryBuffer> OffloadBuffer = OffloadBinary::write(Image);
- llvm::embedBufferInModule(*M, *OffloadBuffer, ".llvm.offloading",
+ llvm::embedBufferInModule(*M, **ObjectOrErr, ".llvm.offloading",
Align(OffloadBinary::getAlignment()));
}
}
diff --git a/clang/lib/Driver/Action.cpp b/clang/lib/Driver/Action.cpp
index 8f4403ff42b65..ead3a23da418c 100644
--- a/clang/lib/Driver/Action.cpp
+++ b/clang/lib/Driver/Action.cpp
@@ -45,6 +45,8 @@ const char *Action::getClassName(ActionClass AC) {
return "clang-offload-unbundler";
case OffloadWrapperJobClass:
return "clang-offload-wrapper";
+ case OffloadPackagerJobClass:
+ return "clang-offload-packager";
case LinkerWrapperJobClass:
return "clang-linker-wrapper";
case StaticLibJobClass:
@@ -432,6 +434,12 @@ OffloadWrapperJobAction::OffloadWrapperJobAction(ActionList &Inputs,
types::ID Type)
: JobAction(OffloadWrapperJobClass, Inputs, Type) {}
+void OffloadPackagerJobAction::anchor() {}
+
+OffloadPackagerJobAction::OffloadPackagerJobAction(ActionList &Inputs,
+ types::ID Type)
+ : JobAction(OffloadPackagerJobClass, Inputs, Type) {}
+
void LinkerWrapperJobAction::anchor() {}
LinkerWrapperJobAction::LinkerWrapperJobAction(ActionList &Inputs,
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 86ef67d2de3ec..e09070016001d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4382,6 +4382,7 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
getFinalPhase(Args) == phases::Preprocess))
return HostAction;
+ ActionList OffloadActions;
OffloadAction::DeviceDependences DDeps;
const Action::OffloadKind OffloadKinds[] = {
@@ -4457,6 +4458,9 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
auto TCAndArch = TCAndArchs.begin();
for (Action *A : DeviceActions) {
DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+ OffloadAction::DeviceDependences DDep;
+ DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+ OffloadActions.push_back(C.MakeAction<OffloadAction>(DDep, A->getType()));
++TCAndArch;
}
}
@@ -4464,10 +4468,16 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
if (DeviceOnly)
return C.MakeAction<OffloadAction>(DDeps, types::TY_Nothing);
+ Action *OffloadPackager =
+ C.MakeAction<OffloadPackagerJobAction>(OffloadActions, types::TY_Image);
+ OffloadAction::DeviceDependences DDep;
+ DDep.add(*OffloadPackager, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
+ nullptr, Action::OFK_None);
OffloadAction::HostDependence HDep(
*HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
- /*BoundArch=*/nullptr, DDeps);
- return C.MakeAction<OffloadAction>(HDep, DDeps);
+ /*BoundArch=*/nullptr, isa<CompileJobAction>(HostAction) ? DDep : DDeps);
+ return C.MakeAction<OffloadAction>(
+ HDep, isa<CompileJobAction>(HostAction) ? DDep : DDeps);
}
Action *Driver::ConstructPhaseAction(
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index d45a90048dcbe..d7d72153c78c0 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -328,6 +328,12 @@ Tool *ToolChain::getOffloadWrapper() const {
return OffloadWrapper.get();
}
+Tool *ToolChain::getOffloadPackager() const {
+ if (!OffloadPackager)
+ OffloadPackager.reset(new tools::OffloadPackager(*this));
+ return OffloadPackager.get();
+}
+
Tool *ToolChain::getLinkerWrapper() const {
if (!LinkerWrapper)
LinkerWrapper.reset(new tools::LinkerWrapper(*this, getLink()));
@@ -373,6 +379,8 @@ Tool *ToolChain::getTool(Action::ActionClass AC) const {
case Action::OffloadWrapperJobClass:
return getOffloadWrapper();
+ case Action::OffloadPackagerJobClass:
+ return getOffloadPackager();
case Action::LinkerWrapperJobClass:
return getLinkerWrapper();
}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index bea6ebda2ee78..a4777b55072ba 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6985,22 +6985,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
// Host-side offloading recieves the device object files and embeds it in a
// named section including the associated target triple and architecture.
- for (const InputInfo Input : HostOffloadingInputs) {
- const Action *OffloadAction = Input.getAction();
- const ToolChain *TC = OffloadAction->getOffloadingToolChain();
- const ArgList &TCArgs =
- C.getArgsForToolChain(TC, OffloadAction->getOffloadingArch(),
- OffloadAction->getOffloadingDeviceKind());
- StringRef File = C.getArgs().MakeArgString(TC->getInputFilename(Input));
- StringRef Arch = (OffloadAction->getOffloadingArch())
- ? OffloadAction->getOffloadingArch()
- : TCArgs.getLastArgValue(options::OPT_march_EQ);
-
- CmdArgs.push_back(Args.MakeArgString(
- "-fembed-offload-object=" + File + "," +
- Action::GetOffloadKindName(OffloadAction->getOffloadingDeviceKind()) +
- "," + TC->getTripleString() + "," + Arch));
- }
+ for (const InputInfo Input : HostOffloadingInputs)
+ CmdArgs.push_back(Args.MakeArgString("-fembed-offload-object=" +
+ TC.getInputFilename(Input)));
if (Triple.isAMDGPU()) {
handleAMDGPUCodeObjectVersionOptions(D, Args, CmdArgs);
@@ -8237,6 +8224,42 @@ void OffloadWrapper::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs, Inputs, Output));
}
+void OffloadPackager::ConstructJob(Compilation &C, const JobAction &JA,
+ const InputInfo &Output,
+ const InputInfoList &Inputs,
+ const llvm::opt::ArgList &Args,
+ const char *LinkingOutput) const {
+ ArgStringList CmdArgs;
+
+ // Add the output file name.
+ assert(Output.isFilename() && "Invalid output.");
+ CmdArgs.push_back("-o");
+ CmdArgs.push_back(Output.getFilename());
+
+ // Create the inputs to bundle the needed metadata.
+ for (const InputInfo &Input : Inputs) {
+ const Action *OffloadAction = Input.getAction();
+ const ToolChain *TC = OffloadAction->getOffloadingToolChain();
+ const ArgList &TCArgs =
+ C.getArgsForToolChain(TC, OffloadAction->getOffloadingArch(),
+ OffloadAction->getOffloadingDeviceKind());
+ StringRef File = C.getArgs().MakeArgString(TC->getInputFilename(Input));
+ StringRef Arch = (OffloadAction->getOffloadingArch())
+ ? OffloadAction->getOffloadingArch()
+ : TCArgs.getLastArgValue(options::OPT_march_EQ);
+
+ CmdArgs.push_back(Args.MakeArgString(
+ "--image=file=" + File + "," + "triple=" + TC->getTripleString() + "," +
+ "arch=" + Arch + "," + "kind=" +
+ Action::GetOffloadKindName(OffloadAction->getOffloadingDeviceKind())));
+ }
+
+ C.addCommand(std::make_unique<Command>(
+ JA, *this, ResponseFileSupport::None(),
+ Args.MakeArgString(getToolChain().GetProgramPath(getShortName())),
+ CmdArgs, Inputs, Output));
+}
+
void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,
const InputInfoList &Inputs,
diff --git a/clang/lib/Driver/ToolChains/Clang.h b/clang/lib/Driver/ToolChains/Clang.h
index 79407c9884d51..37263efd57a5e 100644
--- a/clang/lib/Driver/ToolChains/Clang.h
+++ b/clang/lib/Driver/ToolChains/Clang.h
@@ -170,6 +170,19 @@ class LLVM_LIBRARY_VISIBILITY OffloadWrapper final : public Tool {
const char *LinkingOutput) const override;
};
+/// Offload binary tool.
+class LLVM_LIBRARY_VISIBILITY OffloadPackager final : public Tool {
+public:
+ OffloadPackager(const ToolChain &TC)
+ : Tool("Offload::Packager", "clang-offload-packager", TC) {}
+
+ bool hasIntegratedCPP() const override { return false; }
+ void ConstructJob(Compilation &C, const JobAction &JA,
+ const InputInfo &Output, const InputInfoList &Inputs,
+ const llvm::opt::ArgList &TCArgs,
+ const char *LinkingOutput) const override;
+};
+
/// Linker wrapper tool.
class LLVM_LIBRARY_VISIBILITY LinkerWrapper final : public Tool {
const Tool *Linker;
diff --git a/clang/test/Driver/amdgpu-openmp-toolchain-new.c b/clang/test/Driver/amdgpu-openmp-toolchain-new.c
index bfb681084f274..2ddc5a12c2001 100644
--- a/clang/test/Driver/amdgpu-openmp-toolchain-new.c
+++ b/clang/test/Driver/amdgpu-openmp-toolchain-new.c
@@ -15,7 +15,7 @@
// RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
// RUN: | FileCheck --check-prefix=CHECK-PHASES %s
-// CHECK-PHASES: 0: input, "[[INPUT:.*]]", c, (host-openmp)
+// CHECK-PHASES: 0: input, "[[INPUT:.+]]", c, (host-openmp)
// CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)
// CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)
// CHECK-PHASES: 3: input, "[[INPUT]]", c, (device-openmp)
@@ -24,10 +24,12 @@
// CHECK-PHASES: 6: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (amdgcn-amd-amdhsa)" {5}, ir
// CHECK-PHASES: 7: backend, {6}, assembler, (device-openmp)
// CHECK-PHASES: 8: assembler, {7}, object, (device-openmp)
-// CHECK-PHASES: 9: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (amdgcn-amd-amdhsa)" {8}, ir
-// CHECK-PHASES: 10: backend, {9}, assembler, (host-openmp)
-// CHECK-PHASES: 11: assembler, {10}, object, (host-openmp)
-// CHECK-PHASES: 12: clang-linker-wrapper, {11}, image, (host-openmp)
+// CHECK-PHASES: 9: offload, "device-openmp (amdgcn-amd-amdhsa)" {8}, object
+// CHECK-PHASES: 10: clang-offload-packager, {9}, image
+// CHECK-PHASES: 11: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, " (x86_64-unknown-linux-gnu)" {10}, ir
+// CHECK-PHASES: 12: backend, {11}, assembler, (host-openmp)
+// CHECK-PHASES: 13: assembler, {12}, object, (host-openmp)
+// CHECK-PHASES: 14: clang-linker-wrapper, {13}, image, (host-openmp)
// handling of --libomptarget-amdgpu-bc-path
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 --libomptarget-amdgpu-bc-path=%S/Inputs/hip_dev_lib/libomptarget-amdgpu-gfx803.bc %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIBOMPTARGET
@@ -38,9 +40,10 @@
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-BINDINGS
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-BINDINGS
-// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.*]]"], output: "[[HOST_BC:.*]]"
-// CHECK-BINDINGS: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC:.*]]"
-// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[DEVICE_BC]]"], output: "[[HOST_OBJ:.*]]"
+// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]"
+// CHECK-BINDINGS: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC:.+]]"
+// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[DEVICE_BC]]"], output: "[[BINARY:.+]]"
+// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
diff --git a/clang/test/Driver/cuda-openmp-driver.cu b/clang/test/Driver/cuda-openmp-driver.cu
index a78696177bab8..e10bc72a252aa 100644
--- a/clang/test/Driver/cuda-openmp-driver.cu
+++ b/clang/test/Driver/cuda-openmp-driver.cu
@@ -11,7 +11,8 @@
// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]"], output: "[[PTX_SM_70:.+]]"
// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_SM_70:.+]]"], output: "[[CUBIN_SM_70:.+]]"
// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN_SM_70]]", "[[PTX_SM_70:.+]]"], output: "[[FATBIN_SM_70:.+]]"
-// BINDINGS-NEXT: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT]]", "[[FATBIN_SM_35]]", "[[FATBIN_SM_70]]"], output: "[[HOST_OBJ:.+]]"
+// BINDINGS-NEXT: "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[FATBIN_SM_35]]", "[[FATBIN_SM_70]]"], output: "[[BINARY:.+]]"
+// BINDINGS-NEXT: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
// BINDINGS-NEXT: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
// RUN: %clang -### -nocudalib --offload-new-driver %s 2>&1 | FileCheck -check-prefix RDC %s
diff --git a/clang/test/Driver/cuda-phases.cu b/clang/test/Driver/cuda-phases.cu
index a4c315173d89f..dac63942ddb49 100644
--- a/clang/test/Driver/cuda-phases.cu
+++ b/clang/test/Driver/cuda-phases.cu
@@ -223,9 +223,9 @@
//
// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases --offload-new-driver \
// RUN: --offload-arch=sm_52 --offload-arch=sm_70 %s 2>&1 | FileCheck --check-prefix=NEW_DRIVER %s
-// NEW_DRIVER: 0: input, "[[INPUT:.*]]", cuda, (host-cuda)
-// NEW_DRIVER: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
-// NEW_DRIVER: 2: compiler, {1}, ir, (host-cuda)
+// NEW_DRIVER: 0: input, "[[INPUT:.+]]", cuda
+// NEW_DRIVER: 1: preprocessor, {0}, cuda-cpp-output
+// NEW_DRIVER: 2: compiler, {1}, ir
// NEW_DRIVER: 3: input, "[[INPUT]]", cuda, (device-cuda, sm_52)
// NEW_DRIVER: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_52)
// NEW_DRIVER: 5: compiler, {4}, ir, (device-cuda, sm_52)
@@ -234,15 +234,18 @@
// NEW_DRIVER: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, object
// NEW_DRIVER: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {6}, assembler
// NEW_DRIVER: 10: linker, {8, 9}, cuda-fatbin, (device-cuda, sm_52)
-// NEW_DRIVER: 11: input, "[[INPUT]]", cuda, (device-cuda, sm_70)
-// NEW_DRIVER: 12: preprocessor, {11}, cuda-cpp-output, (device-cuda, sm_70)
-// NEW_DRIVER: 13: compiler, {12}, ir, (device-cuda, sm_70)
-// NEW_DRIVER: 14: backend, {13}, assembler, (device-cuda, sm_70)
-// NEW_DRIVER: 15: assembler, {14}, object, (device-cuda, sm_70)
-// NEW_DRIVER: 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {15}, object
-// NEW_DRIVER: 17: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {14}, assembler
-// NEW_DRIVER: 18: linker, {16, 17}, cuda-fatbin, (device-cuda, sm_70)
-// NEW_DRIVER: 19: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {10}, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {18}, ir
-// NEW_DRIVER: 20: backend, {19}, assembler, (host-cuda)
-// NEW_DRIVER: 21: assembler, {20}, object, (host-cuda)
-// NEW_DRIVER: 22: clang-linker-wrapper, {21}, image, (host-cuda)
+// NEW_DRIVER: 11: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {10}, cuda-fatbin
+// NEW_DRIVER: 12: input, "[[INPUT]]", cuda, (device-cuda, sm_70)
+// NEW_DRIVER: 13: preprocessor, {12}, cuda-cpp-output, (device-cuda, sm_70)
+// NEW_DRIVER: 14: compiler, {13}, ir, (device-cuda, sm_70)
+// NEW_DRIVER: 15: backend, {14}, assembler, (device-cuda, sm_70)
+// NEW_DRIVER: 16: assembler, {15}, object, (device-cuda, sm_70)
+// NEW_DRIVER: 17: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {16}, object
+// NEW_DRIVER: 18: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {15}, assembler
+// NEW_DRIVER: 19: linker, {17, 18}, cuda-fatbin, (device-cuda, sm_70)
+// NEW_DRIVER: 20: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {19}, cuda-fatbin
+// NEW_DRIVER: 21: clang-offload-packager, {11, 20}, image
+// NEW_DRIVER: 22: offload, " (powerpc64le-ibm-linux-gnu)" {2}, " (powerpc64le-ibm-linux-gnu)" {21}, ir
+// NEW_DRIVER: 23: backend, {22}, assembler, (host-cuda)
+// NEW_DRIVER: 24: assembler, {23}, object, (host-cuda)
+// NEW_DRIVER: 25: clang-linker-wrapper, {24}, image, (host-cuda)
diff --git a/clang/test/Driver/linker-wrapper-image.c b/clang/test/Driver/linker-wrapper-image.c
index 6badd97e2eb16..98141f4658bd1 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -2,8 +2,9 @@
// REQUIRES: nvptx-registered-target
// REQUIRES: amdgpu-registered-target
+// RUN: clang-offload-packager -o %t.out --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,nvptx64-nvida-cuda,sm_70
+// RUN: -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple x86_64-unknown-linux-gnu \
// RUN: -linker-path /usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=OPENMP
@@ -28,8 +29,9 @@
// OPENMP-NEXT: ret void
// OPENMP-NEXT: }
+// RUN: clang-offload-packager -o %t.out --image=file=%S/Inputs/dummy-elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_70
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,cuda,nvptx64-nvida-cuda,sm_70
+// RUN: -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple x86_64-unknown-linux-gnu \
// RUN: -linker-path /usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CUDA
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index 3e126f0e7484d..62bbd62fd6c29 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -2,25 +2,28 @@
// REQUIRES: nvptx-registered-target
// REQUIRES: amdgpu-registered-target
-// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,nvptx64-nvida-cuda,sm_70 \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,nvptx64-nvida-cuda,sm_70
+// RUN: clang-offload-packager -o %t.out \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --host-triple x86_64-unknown-linux-gnu --dry-run -linker-path \
// RUN: /usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=NVPTX_LINK
// NVPTX_LINK: nvlink{{.*}}-m64 -o {{.*}}.out -arch sm_70 {{.*}}.o {{.*}}.o
-// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,amdgcn-amd-amdhsa,gfx908 \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,amdgcn-amd-amdhsa,gfx908
+// RUN: clang-offload-packager -o %t.out \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --host-triple x86_64-unknown-linux-gnu --dry-run -linker-path \
// RUN: /usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=AMDGPU_LINK
// AMDGPU_LINK: lld{{.*}}-flavor gnu --no-undefined -shared -o {{.*}}.out {{.*}}.o {{.*}}.o
-// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,x86_64-unknown-linux-gnu, \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,x86_64-unknown-linux-gnu,
+// RUN: clang-offload-packager -o %t.out \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --host-triple x86_64-unknown-linux-gnu --dry-run -linker-path \
// RUN: /usr/bin/ld.lld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CPU_LINK
@@ -32,42 +35,48 @@
// HOST_LINK: ld.lld{{.*}}-a -b -c {{.*}}.o -o a.out
-// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-bc.bc,openmp,nvptx64-nvida-cuda,sm_70 \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-bc.bc,openmp,nvptx64-nvida-cuda,sm_70
+// RUN: clang-offload-packager -o %t.out \
+// RUN: --image=file=%S/Inputs/dummy-bc.bc,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
+// RUN: --image=file=%S/Inputs/dummy-bc.bc,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --host-triple x86_64-unknown-linux-gnu --dry-run -linker-path \
// RUN: /usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
// LTO: ptxas{{.*}}-m64 -o {{.*}}.cubin -O2 --gpu-name sm_70 {{.*}}.s
// LTO-NOT: nvlink
-// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,nvptx64-nvida-cuda,sm_70 \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,cuda,nvptx64-nvida-cuda,sm_70
+// RUN: clang-offload-packager -o %t.out \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_70
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --host-triple x86_64-unknown-linux-gnu --dry-run -linker-path \
// RUN: /usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CUDA_OMP_LINK
// CUDA_OMP_LINK: nvlink{{.*}}-m64 -o {{.*}}.out -arch sm_70 {{.*}}.o {{.*}}.o
-// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-lib.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,nvptx64-nvida-cuda,sm_70 \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,nvptx64-nvida-cuda,sm_52
-// RUN: llvm-ar rcs %t.a %t-lib.o
-// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-obj.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,nvptx64-nvida-cuda,sm_70
+// RUN: clang-offload-packager -o %t-lib.out \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_52
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t-lib.out
+// RUN: llvm-ar rcs %t.a %t.o
+// RUN: clang-offload-packager -o %t.out \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-obj.o -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --host-triple x86_64-unknown-linux-gnu --dry-run -linker-path \
// RUN: /usr/bin/ld -- %t.a %t-obj.o -o a.out 2>&1 | FileCheck %s --check-prefix=STATIC-LIBRARY
// STATIC-LIBRARY: nvlink{{.*}} -arch sm_70
// STATIC-LIBRARY-NOT: nvlink{{.*}} -arch sm_50
+// RUN: clang-offload-packager -o %t.out \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_70 \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
+// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_52
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,cuda,nvptx64-nvida-cuda,sm_70 \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,openmp,nvptx64-nvida-cuda,sm_70 \
-// RUN: -fembed-offload-object=%S/Inputs/dummy-elf.o,cuda,nvptx64-nvida-cuda,sm_52
+// RUN: -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --dry-run --host-triple x86_64-unknown-linux-gnu -linker-path \
// RUN: /usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CUDA
-// CUDA: nvlink{{.*}}-m64 -o {{.*}}.out -arch sm_70 {{.*}}.o {{.*}}.o
// CUDA: nvlink{{.*}}-m64 -o {{.*}}.out -arch sm_52 {{.*}}.o
-// CUDA: fatbinary{{.*}}-64 --create {{.*}}.fatbin --image=profile=sm_70,file={{.*}}.out --image=profile=sm_52,file={{.*}}.out
+// CUDA: nvlink{{.*}}-m64 -o {{.*}}.out -arch sm_70 {{.*}}.o {{.*}}.o
+// CUDA: fatbinary{{.*}}-64 --create {{.*}}.fatbin --image=profile=sm_52,file={{.*}}.out --image=profile=sm_70,file={{.*}}.out
diff --git a/clang/test/Driver/openmp-offload-gpu-new.c b/clang/test/Driver/openmp-offload-gpu-new.c
index 782f00830987f..3e7bc4fb11ee9 100644
--- a/clang/test/Driver/openmp-offload-gpu-new.c
+++ b/clang/test/Driver/openmp-offload-gpu-new.c
@@ -23,7 +23,7 @@
// RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 %s 2>&1 \
// RUN: | FileCheck --check-prefix=CHECK-PHASES %s
-// CHECK-PHASES: 0: input, "[[INPUT:.*]]", c, (host-openmp)
+// CHECK-PHASES: 0: input, "[[INPUT:.+]]", c, (host-openmp)
// CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)
// CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)
// CHECK-PHASES: 3: input, "[[INPUT]]", c, (device-openmp)
@@ -32,16 +32,19 @@
// CHECK-PHASES: 6: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (nvptx64-nvidia-cuda)" {5}, ir
// CHECK-PHASES: 7: backend, {6}, assembler, (device-openmp)
// CHECK-PHASES: 8: assembler, {7}, object, (device-openmp)
-// CHECK-PHASES: 9: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (nvptx64-nvidia-cuda)" {8}, ir
-// CHECK-PHASES: 10: backend, {9}, assembler, (host-openmp)
-// CHECK-PHASES: 11: assembler, {10}, object, (host-openmp)
-// CHECK-PHASES: 12: clang-linker-wrapper, {11}, image, (host-openmp)
+// CHECK-PHASES: 9: offload, "device-openmp (nvptx64-nvidia-cuda)" {8}, object
+// CHECK-PHASES: 10: clang-offload-packager, {9}, image
+// CHECK-PHASES: 11: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, " (x86_64-unknown-linux-gnu)" {10}, ir
+// CHECK-PHASES: 12: backend, {11}, assembler, (host-openmp)
+// CHECK-PHASES: 13: assembler, {12}, object, (host-openmp)
+// CHECK-PHASES: 14: clang-linker-wrapper, {13}, image, (host-openmp)
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-BINDINGS
// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.*]]"], output: "[[HOST_BC:.*]]"
// CHECK-BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC:.*]]"
// CHECK-BINDINGS: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[DEVICE_BC]]"], output: "[[DEVICE_OBJ:.*]]"
-// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[DEVICE_OBJ]]"], output: "[[HOST_OBJ:.*]]"
+// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[DEVICE_OBJ]]"], output: "[[BINARY:.*]]"
+// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.*]]"
// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda --offload-arch=sm_52 --offload-arch=sm_70 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-ARCH-BINDINGS
@@ -50,7 +53,8 @@
// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[DEVICE_BC_SM_52]]"], output: "[[DEVICE_OBJ_SM_52:.*]]"
// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC_SM_70:.*]]"
// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[DEVICE_BC_SM_70]]"], output: "[[DEVICE_OBJ_SM_70:.*]]"
-// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[DEVICE_OBJ_SM_52]]", "[[DEVICE_OBJ_SM_70]]"], output: "[[HOST_OBJ:.*]]"
+// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[DEVICE_OBJ_SM_52]]", "[[DEVICE_OBJ_SM_70]]"], output: "[[BINARY:.*]]"
+// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.*]]"
// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \
@@ -62,7 +66,8 @@
// CHECK-NVIDIA-AMDGPU: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[NVIDIA_PTX:.+]]"
// CHECK-NVIDIA-AMDGPU: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[NVIDIA_PTX]]"], output: "[[NVIDIA_CUBIN:.+]]"
// CHECK-NVIDIA-AMDGPU: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[AMD_BC:.+]]"
-// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[NVIDIA_CUBIN]]", "[[AMD_BC]]"], output: "[[HOST_OBJ:.+]]"
+// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[NVIDIA_CUBIN]]", "[[AMD_BC]]"], output: "[[BINARY:.*]]"
+// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
@@ -73,7 +78,7 @@
// RUN: -nogpulib %s -o openmp-offload-gpu 2>&1 \
// RUN: | FileCheck -check-prefix=DRIVER_EMBEDDING %s
-// DRIVER_EMBEDDING: -fembed-offload-object=[[CUBIN:.*\.cubin]],openmp,nvptx64-nvidia-cuda,sm_70
+// DRIVER_EMBEDDING: -fembed-offload-object={{.*}}.out
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
// RUN: --offload-host-only -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-HOST-ONLY
diff --git a/clang/test/Driver/openmp-offload-infer.c b/clang/test/Driver/openmp-offload-infer.c
index d79021e0fd69f..7b4eed336d5a2 100644
--- a/clang/test/Driver/openmp-offload-infer.c
+++ b/clang/test/Driver/openmp-offload-infer.c
@@ -23,7 +23,8 @@
// CHECK-NVIDIA-AMDGPU: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[AMD_BC:.+]]"
// CHECK-NVIDIA-AMDGPU: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[NVIDIA_PTX:.+]]"
// CHECK-NVIDIA-AMDGPU: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[NVIDIA_PTX]]"], output: "[[NVIDIA_CUBIN:.+]]"
-// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[AMD_BC]]", "[[NVIDIA_CUBIN]]"], output: "[[HOST_OBJ:.+]]"
+// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[AMD_BC]]", "[[NVIDIA_CUBIN]]"], output: "[[BINARY:.+]]"
+// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \
@@ -34,7 +35,8 @@
// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[DEVICE_BC_SM_52]]"], output: "[[DEVICE_OBJ_SM_52:.*]]"
// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC_SM_70:.*]]"
// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[DEVICE_BC_SM_70]]"], output: "[[DEVICE_OBJ_SM_70:.*]]"
-// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[DEVICE_OBJ_SM_52]]", "[[DEVICE_OBJ_SM_70]]"], output: "[[HOST_OBJ:.*]]"
+// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[DEVICE_OBJ_SM_52]]", "[[DEVICE_OBJ_SM_70]]"], output: "[[BINARY:.+]]"
+// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.*]]"
// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \
diff --git a/clang/test/Frontend/embed-object.c b/clang/test/Frontend/embed-object.c
index 47198755e44a4..ceb3378a60ab4 100644
--- a/clang/test/Frontend/embed-object.c
+++ b/clang/test/Frontend/embed-object.c
@@ -1,7 +1,6 @@
-// RUN: %clang_cc1 -x c -triple x86_64-unknown-linux-gnu -emit-llvm -fembed-offload-object=%S/Inputs/empty.h,,, -o - %s | FileCheck %s
+// RUN: %clang_cc1 -x c -triple x86_64-unknown-linux-gnu -emit-llvm -fembed-offload-object=%S/Inputs/empty.h -o - %s | FileCheck %s
-// CHECK: @[[OBJECT:.+]] = private constant [120 x i8] c"\10\FF\10\AD{{.*}}", section ".llvm.offloading", align 8
+// CHECK: @[[OBJECT:.+]] = private constant [0 x i8] zeroinitializer, section ".llvm.offloading", align 8
// CHECK: @llvm.compiler.used = appending global [1 x ptr] [ptr @[[OBJECT]]], section "llvm.metadata"
-
void foo(void) {}
diff --git a/clang/test/Frontend/embed-object.ll b/clang/test/Frontend/embed-object.ll
index 10b886fd025a2..177ecdf7f8a32 100644
--- a/clang/test/Frontend/embed-object.ll
+++ b/clang/test/Frontend/embed-object.ll
@@ -1,10 +1,10 @@
; RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
-; RUN: -fembed-offload-object=%S/Inputs/empty.h,,, \
-; RUN: -fembed-offload-object=%S/Inputs/empty.h,,, -x ir %s -o - \
+; RUN: -fembed-offload-object=%S/Inputs/empty.h \
+; RUN: -fembed-offload-object=%S/Inputs/empty.h -x ir %s -o - \
; RUN: | FileCheck %s -check-prefix=CHECK
-; CHECK: @[[OBJECT_1:.+]] = private constant [120 x i8] c"\10\FF\10\AD{{.*}}\00", section ".llvm.offloading", align 8
-; CHECK: @[[OBJECT_2:.+]] = private constant [120 x i8] c"\10\FF\10\AD{{.*}}\00", section ".llvm.offloading", align 8
+; CHECK: @[[OBJECT_1:.+]] = private constant [0 x i8] zeroinitializer, section ".llvm.offloading", align 8
+; CHECK: @[[OBJECT_2:.+]] = private constant [0 x i8] zeroinitializer, section ".llvm.offloading", align 8
; CHECK: @llvm.compiler.used = appending global [3 x ptr] [ptr @x, ptr @[[OBJECT_1]], ptr @[[OBJECT_2]]], section "llvm.metadata"
@x = private constant i8 1
diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py
index b7557952b935c..44f5ed1d63525 100644
--- a/clang/test/lit.cfg.py
+++ b/clang/test/lit.cfg.py
@@ -63,7 +63,7 @@
tool_dirs = [config.clang_tools_dir, config.llvm_tools_dir]
tools = [
- 'apinotes-test', 'c-index-test', 'clang-
diff ', 'clang-format', 'clang-repl',
+ 'apinotes-test', 'c-index-test', 'clang-
diff ', 'clang-format', 'clang-repl', 'clang-offload-packager',
'clang-tblgen', 'clang-scan-deps', 'opt', 'llvm-ifs', 'yaml2obj', 'clang-linker-wrapper',
ToolSubst('%clang_extdef_map', command=FindTool(
'clang-extdef-mapping'), unresolved='ignore'),
diff --git a/clang/tools/CMakeLists.txt b/clang/tools/CMakeLists.txt
index b071a776b32ae..90ef041649795 100644
--- a/clang/tools/CMakeLists.txt
+++ b/clang/tools/CMakeLists.txt
@@ -10,6 +10,7 @@ add_clang_subdirectory(clang-fuzzer)
add_clang_subdirectory(clang-import-test)
add_clang_subdirectory(clang-nvlink-wrapper)
add_clang_subdirectory(clang-linker-wrapper)
+add_clang_subdirectory(clang-offload-packager)
add_clang_subdirectory(clang-offload-bundler)
add_clang_subdirectory(clang-offload-wrapper)
add_clang_subdirectory(clang-scan-deps)
diff --git a/clang/tools/clang-offload-packager/CMakeLists.txt b/clang/tools/clang-offload-packager/CMakeLists.txt
new file mode 100644
index 0000000000000..bf95f5d5f8dd9
--- /dev/null
+++ b/clang/tools/clang-offload-packager/CMakeLists.txt
@@ -0,0 +1,28 @@
+set(LLVM_LINK_COMPONENTS
+ ${LLVM_TARGETS_TO_BUILD}
+ Object
+ Support)
+
+if(NOT CLANG_BUILT_STANDALONE)
+ set(tablegen_deps intrinsics_gen)
+endif()
+
+add_clang_executable(clang-offload-packager
+ ClangOffloadPackager.cpp
+
+ DEPENDS
+ ${tablegen_deps}
+ )
+
+set(CLANG_LINKER_WRAPPER_LIB_DEPS
+ clangBasic
+ )
+
+add_dependencies(clang clang-offload-packager)
+
+target_link_libraries(clang-offload-packager
+ PRIVATE
+ ${CLANG_LINKER_WRAPPER_LIB_DEPS}
+ )
+
+install(TARGETS clang-offload-packager RUNTIME DESTINATION bin)
diff --git a/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp b/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
new file mode 100644
index 0000000000000..a9b84669ab671
--- /dev/null
+++ b/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
@@ -0,0 +1,114 @@
+//===-- clang-offload-packager/ClangOffloadPackager.cpp - file bundler ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---------------------------------------------------------------------===//
+//
+// This tool takes several device object files and bundles them into a single
+// binary image using a custom binary format. This is intended to be used to
+// embed many device files into an application to create a fat binary.
+//
+//===---------------------------------------------------------------------===//
+
+#include "clang/Basic/Version.h"
+
+#include "llvm/Object/Binary.h"
+#include "llvm/Object/ObjectFile.h"
+#include "llvm/Object/OffloadBinary.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FileOutputBuffer.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/Signals.h"
+#include "llvm/Support/WithColor.h"
+
+using namespace llvm;
+using namespace llvm::object;
+
+static cl::opt<bool> Help("h", cl::desc("Alias for -help"), cl::Hidden);
+
+static cl::OptionCategory
+ ClangOffloadPackagerCategory("clang-offload-packager options");
+
+static cl::opt<std::string> OutputFile("o", cl::Required,
+ cl::desc("Write output to <file>."),
+ cl::value_desc("file"),
+ cl::cat(ClangOffloadPackagerCategory));
+
+static cl::list<std::string>
+ DeviceImages("image", cl::ZeroOrMore,
+ cl::desc("List of key and value arguments. Required keywords "
+ "are 'file' and 'triple'."),
+ cl::value_desc("<key>=<value>,..."),
+ cl::cat(ClangOffloadPackagerCategory));
+
+static void PrintVersion(raw_ostream &OS) {
+ OS << clang::getClangToolFullVersion("clang-offload-packager") << '\n';
+}
+
+int main(int argc, const char **argv) {
+ sys::PrintStackTraceOnErrorSignal(argv[0]);
+ cl::HideUnrelatedOptions(ClangOffloadPackagerCategory);
+ cl::SetVersionPrinter(PrintVersion);
+ cl::ParseCommandLineOptions(
+ argc, argv,
+ "A utility for bundling several object files into a single binary.\n"
+ "The output binary can then be embedded into the host section table\n"
+ "to create a fatbinary containing offloading code.\n");
+
+ if (Help) {
+ cl::PrintHelpMessage();
+ return EXIT_SUCCESS;
+ }
+
+ auto reportError = [argv](Error E) {
+ logAllUnhandledErrors(std::move(E), WithColor::error(errs(), argv[0]));
+ return EXIT_FAILURE;
+ };
+
+ SmallVector<char, 1024> BinaryData;
+ raw_svector_ostream OS(BinaryData);
+ for (StringRef Image : DeviceImages) {
+ StringMap<StringRef> Args;
+ for (StringRef Arg : llvm::split(Image, ","))
+ Args.insert(Arg.split("="));
+
+ if (!Args.count("triple") || !Args.count("file"))
+ return reportError(createStringError(
+ inconvertibleErrorCode(),
+ "'file' and 'triple' are required image arguments"));
+
+ OffloadBinary::OffloadingImage ImageBinary{};
+ std::unique_ptr<llvm::MemoryBuffer> DeviceImage;
+ for (const auto &KeyAndValue : Args) {
+ StringRef Key = KeyAndValue.getKey();
+ if (Key == "file") {
+ llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> ObjectOrErr =
+ llvm::MemoryBuffer::getFileOrSTDIN(KeyAndValue.getValue());
+ if (std::error_code EC = ObjectOrErr.getError())
+ return reportError(errorCodeToError(EC));
+ DeviceImage = std::move(*ObjectOrErr);
+ ImageBinary.Image = *DeviceImage;
+ ImageBinary.TheImageKind = getImageKind(
+ sys::path::extension(KeyAndValue.getValue()).drop_front());
+ } else if (Key == "kind") {
+ ImageBinary.TheOffloadKind = getOffloadKind(KeyAndValue.getValue());
+ } else {
+ ImageBinary.StringData[Key] = KeyAndValue.getValue();
+ }
+ }
+ std::unique_ptr<MemoryBuffer> Buffer = OffloadBinary::write(ImageBinary);
+ OS << Buffer->getBuffer();
+ }
+
+ Expected<std::unique_ptr<FileOutputBuffer>> OutputOrErr =
+ FileOutputBuffer::create(OutputFile, BinaryData.size());
+ if (!OutputOrErr)
+ return reportError(OutputOrErr.takeError());
+ std::unique_ptr<FileOutputBuffer> Output = std::move(*OutputOrErr);
+ std::copy(BinaryData.begin(), BinaryData.end(), Output->getBufferStart());
+ if (Error E = Output->commit())
+ return reportError(std::move(E));
+}
More information about the cfe-commits
mailing list