[Mlir-commits] [mlir] [mlir][SYCL] Fail init errors cleanly instead of `abort`ing (PR #192979)
Zmicier Prybysh
llvmlistbot at llvm.org
Mon Apr 20 07:03:39 PDT 2026
https://github.com/dimp-pl created https://github.com/llvm/llvm-project/pull/192979
Disclaimer: this is my first PR to LLVM. I'm trying to follow the contribution guide and the conventions i see in other PRs, but if i missed something -- please let me know.
## Summary
This PR fixes issue #182807.
When the SYCL runtime wrapper is loaded on a host without a Level-Zero backend, `getDefaultDevice()` throws an `std::runtime_error`, `catchAll` catches it, and calls `abort()`, which results in a "PLEASE submit a bug report" stack dump, which is not correct for this kind of crash.
With this change `catchAll` now writes to stderr and terminates via `std::exit(EXIT_FAILURE)`, yielding a clean exit code 1 with no crash dump. The "getDefaultDevice failed" message is also replaced with a (hopefully) better one.
## Implementation notes
The runners in `mlir/lib/ExecutionEngine/` don't have much consistency with regards as to how to output diagnostics. 2 files use `llvm::outs/errs`, 9 files use `std::cout/cerr`, 8 files use `fprintf`. I left `fprintf` in place, since it was used in that module before and it's also used by CUDA and ROCm runners (the most dominant ones in the folder). I'm open to changing it to something else if needed.
## Notes for reviewers
1. Why this fix strategy doesn't match the CUDA/ROCm wrappers' print-warning-and-continue style: The `print-and-continue` approach was tried and it produces worse behavior with SYCL wrapper. The entry points would return `nullptr` on init failure, but the MLIR JIT runner does not null-check those returns, so the `nullptr` is dereferenced inside a later SYCL call and produces another SIGSEGV and the same crash dump this patch is trying to avoid. Making that style work would require null-checks across the downstream wrappers. We can go that way if desired, but i opted to start with a simpler fix.
2. Why `L0_SAFE_CALL` (`SyclRuntimeWrappers.cpp`, line 40) is left untouched: it's a deliberate decision, as `L0_SAFE_CALL` is supposed to abort on Level-Zero errors (kernel launch, OOM, etc). I can reconsider if there are some arguments for updating error handling strategy in `L0_SAFE_CALL` too, I just don't see any yet.
3. There's no unit test with this change, as I wasn't able to find a test that would test this logic (i guess this kind of logic isn't usually tested with unit tests anyway). I did test it manually, though, for both positive and negative cases.
>From e3b4278cb18da1b7005b9e82073e674edc18d5a8 Mon Sep 17 00:00:00 2001
From: Zmicier Prybysh <zprybysh at baylibre.com>
Date: Sun, 19 Apr 2026 15:29:44 +0200
Subject: [PATCH] [mlir][SYCL] Fail init errors cleanly instead of `abort`ing
Fixes #182807.
When the SYCL runtime wrapper is loaded on a host without a Level-Zero
backend, `getDefaultDevice()` throws an `std::runtime_error`,
`catchAll` catches it, and calls `abort()`, which results in a
"PLEASE submit a bug report" stack dump, which is not correct for
this kind of crash.
`catchAll` now writes to stderr and terminates via
`std::exit(EXIT_FAILURE)`, yielding a clean exit code 1 with no
crash dump. The "getDefaultDevice failed" message is also
replaced with a (hopefully) better one.
---
mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp b/mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp
index d4f557cda9678..932a08b3c22d2 100644
--- a/mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp
+++ b/mlir/lib/ExecutionEngine/SyclRuntimeWrappers.cpp
@@ -27,13 +27,13 @@ auto catchAll(F &&func) {
try {
return func();
} catch (const std::exception &e) {
- fprintf(stdout, "An exception was thrown: %s\n", e.what());
- fflush(stdout);
- abort();
+ fprintf(stderr, "SYCL runtime error: %s\n", e.what());
+ fflush(stderr);
+ std::exit(EXIT_FAILURE);
} catch (...) {
- fprintf(stdout, "An unknown exception was thrown\n");
- fflush(stdout);
- abort();
+ fprintf(stderr, "SYCL runtime error: unknown exception was thrown\n");
+ fflush(stderr);
+ std::exit(EXIT_FAILURE);
}
}
@@ -64,7 +64,9 @@ static sycl::device getDefaultDevice() {
isDeviceInitialised = true;
return syclDevice;
}
- throw std::runtime_error("getDefaultDevice failed");
+ throw std::runtime_error(
+ "no Level-Zero SYCL platform found; the MLIR SYCL runtime wrapper "
+ "currently requires a Level-Zero backend");
} else {
return syclDevice;
}
More information about the Mlir-commits
mailing list