[Mlir-commits] [mlir] 8411886 - [mlir][SYCL] Fail init errors cleanly instead of `abort`ing (#192979)
llvmlistbot at llvm.org
llvmlistbot at llvm.org
Tue May 5 10:53:53 PDT 2026
Author: Zmicier Prybysh
Date: 2026-05-05T12:53:48-05:00
New Revision: 841188658f46d33f2facfd2594425c257fa5aa49
URL: https://github.com/llvm/llvm-project/commit/841188658f46d33f2facfd2594425c257fa5aa49
DIFF: https://github.com/llvm/llvm-project/commit/841188658f46d33f2facfd2594425c257fa5aa49.diff
LOG: [mlir][SYCL] Fail init errors cleanly instead of `abort`ing (#192979)
Disclaimer: this is my first PR to LLVM. I'm trying to follow the
contribution guide and the conventions i see in other PRs, but if i
missed something -- please let me know.
## Summary
Fixes #182807.
When the SYCL runtime wrapper is loaded on a host without a Level-Zero
backend, `getDefaultDevice()` throws an `std::runtime_error`, `catchAll`
catches it, and calls `abort()`, which results in a "PLEASE submit a bug
report" stack dump, which is not correct for this kind of crash.
With this change `catchAll` now writes to stderr and terminates via
`std::exit(EXIT_FAILURE)`, yielding a clean exit code 1 with no crash
dump. The "getDefaultDevice failed" message is also replaced with a
(hopefully) better one.
## Implementation notes
The runners in `mlir/lib/ExecutionEngine/` don't have much consistency
with regards as to how to output diagnostics. 2 files use
`llvm::outs/errs`, 9 files use `std::cout/cerr`, 8 files use `fprintf`.
I left `fprintf` in place, since it was used in that module before and
it's also used by CUDA and ROCm runners (the most dominant ones in the
folder). I'm open to changing it to something else if needed.
## Notes for reviewers
1. Why this fix strategy doesn't match the CUDA/ROCm wrappers'
print-warning-and-continue style: The `print-and-continue` approach was
tried and it produces worse behavior with SYCL wrapper. The entry points
would return `nullptr` on init failure, but the MLIR JIT runner does not
null-check those returns, so the `nullptr` is dereferenced inside a
later SYCL call and produces another SIGSEGV and the same crash dump
this patch is trying to avoid. Making that style work would require
null-checks across the downstream wrappers. We can go that way if
desired, but i opted to start with a simpler fix.
2. Why `L0_SAFE_CALL` (`SyclRuntimeWrappers.cpp`, line 40) is left
untouched: it's a deliberate decision, as `L0_SAFE_CALL` is supposed to
abort on Level-Zero errors (kernel launch, OOM, etc). I can reconsider
if there are some arguments for updating error handling strategy in
`L0_SAFE_CALL` too, I just don't see any yet.
3. There's no unit test with this change, as I wasn't able to find a
test that would test this logic (i guess this kind of logic isn't
usually tested with unit tests anyway). I did test it manually, though,
for both positive and negative cases.
Added:
Modified:
mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp
Removed:
################################################################################
diff --git a/mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp b/mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp
index d4f557cda96786..d93aadb098b2e3 100644
--- a/mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp
+++ b/mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp
@@ -10,6 +10,8 @@
//
//===----------------------------------------------------------------------===//
+#include <cstdlib>
+
#include <level_zero/ze_api.h>
#include <sycl/ext/oneapi/backend/level_zero.hpp>
#include <sycl/sycl.hpp>
@@ -27,13 +29,13 @@ auto catchAll(F &&func) {
try {
return func();
} catch (const std::exception &e) {
- fprintf(stdout, "An exception was thrown: %s\n", e.what());
- fflush(stdout);
- abort();
+ fprintf(stderr, "SYCL runtime error: %s\n", e.what());
+ fflush(stderr);
+ std::exit(EXIT_FAILURE);
} catch (...) {
- fprintf(stdout, "An unknown exception was thrown\n");
- fflush(stdout);
- abort();
+ fprintf(stderr, "SYCL runtime error: unknown exception was thrown\n");
+ fflush(stderr);
+ std::exit(EXIT_FAILURE);
}
}
@@ -41,8 +43,8 @@ auto catchAll(F &&func) {
{ \
ze_result_t status = (call); \
if (status != ZE_RESULT_SUCCESS) { \
- fprintf(stdout, "L0 error %d\n", status); \
- fflush(stdout); \
+ fprintf(stderr, "L0 error %d\n", status); \
+ fflush(stderr); \
abort(); \
} \
}
@@ -64,7 +66,9 @@ static sycl::device getDefaultDevice() {
isDeviceInitialised = true;
return syclDevice;
}
- throw std::runtime_error("getDefaultDevice failed");
+ throw std::runtime_error(
+ "no Level-Zero SYCL platform found; the MLIR SYCL runtime wrapper "
+ "currently requires a Level-Zero backend");
} else {
return syclDevice;
}
More information about the Mlir-commits
mailing list