[libc-commits] [libc] [libc] Add loader option to force serial execution of GPU region (PR #101601)
Joseph Huber via libc-commits
libc-commits at lists.llvm.org
Thu Aug 1 19:21:35 PDT 2024
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/101601
Summary:
The loader is used as a test utility to run traditionally CPU-based unit
tests on the GPU. This has issues when used with something like
`llvm-lit` because the GPU runtimes have a nasty habit of either running
out of resources or hanging when they are overloaded. To combat this, I
added this option to force each process to perform the GPU part
serially.
This is done right now with a simple file lock on the executing file. I
was originally thinking about using more complex IPC to allow N
processes to share execution, but that seemed overly complicated given
the incredibly large number of failure modes it introduces. File locks
are nice here because if the process crashes or is killed it will
release the lock automatically (at least on Linux). This is in contrast
to something like POSIX shared memory, which sticks around until it's
unlinked, meaning that if someone sent `SIGKILL` to the program it would
never get cleaned up and other processes might wait on a mutex that is
never released.
Restricting this to one process isn't ideal, given the fact that
the runtime can likely handle at least a *few* separate processes, but
this was easy and it works, so might as well start here. This will
hopefully unblock me on running `libcxx` tests, as those ran with so
much parallelism that spurious failures were very common.
From 2f2141ad0f7728938ad29ea075a1f0ac044212b0 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Thu, 1 Aug 2024 21:06:04 -0500
Subject: [PATCH] [libc] Add loader option to force serial execution of GPU
region
---
libc/utils/gpu/loader/Main.cpp | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/libc/utils/gpu/loader/Main.cpp b/libc/utils/gpu/loader/Main.cpp
index 44ed8bf58ab87..7037d772ad2bc 100644
--- a/libc/utils/gpu/loader/Main.cpp
+++ b/libc/utils/gpu/loader/Main.cpp
@@ -11,6 +11,8 @@
//
//===----------------------------------------------------------------------===//
+#include <sys/file.h>
+
#include "Loader.h"
#include "llvm/BinaryFormat/Magic.h"
@@ -62,6 +64,12 @@ static cl::opt<bool>
cl::desc("Output resource usage of launched kernels"),
cl::init(false), cl::cat(loader_category));
+static cl::opt<bool>
+ no_parallelism("no-parallelism",
+ cl::desc("Allows only a single process to use the GPU at a "
+ "time. Useful to suppress out-of-resource errors"),
+ cl::init(false), cl::cat(loader_category));
+
static cl::opt<std::string> file(cl::Positional, cl::Required,
cl::desc("<gpu executable>"),
cl::cat(loader_category));
@@ -98,6 +106,15 @@ int main(int argc, const char **argv, const char **envp) {
llvm::transform(args, std::back_inserter(new_argv),
[](const std::string &arg) { return arg.c_str(); });
+ // Claim a file lock on the executable so only a single process can enter this
+ // region if requested. This prevents the loader from spurious failures.
+ int fd = -1;
+ if (no_parallelism) {
+ fd = open(argv[0], O_RDONLY);
+ if (flock(fd, LOCK_EX) == -1)
+ report_error(createStringError("Failed to lock '%s'", argv[0]));
+ }
+
// Drop the loader from the program arguments.
LaunchParameters params{threads_x, threads_y, threads_z,
blocks_x, blocks_y, blocks_z};
@@ -105,5 +122,10 @@ int main(int argc, const char **argv, const char **envp) {
const_cast<char *>(image.getBufferStart()),
image.getBufferSize(), params, print_resource_usage);
+ if (no_parallelism) {
+ if (flock(fd, LOCK_UN) == -1)
+ report_error(createStringError("Failed to unlock '%s'", argv[0]));
+ }
+
return ret;
}