[Openmp-commits] [PATCH] D106774: [libomptarget][amdgpu] More robust handling of failure to init HSA
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Sun Jul 25 13:49:01 PDT 2021
JonChesterfield created this revision.
JonChesterfield added reviewers: jdoerfert, tianshilei1992, ronlieb, pdhaliwal, grokos, ye-luo.
Herald added subscribers: kerbowa, t-tye, tpr, dstuttard, yaxunl, nhaehnle, jvesely, kzhuravl.
JonChesterfield requested review of this revision.
Herald added subscribers: openmp-commits, wdng.
Herald added a project: OpenMP.
If hsa_init fails, subsequent calls into hsa are not safe. Except for
hsa_init, but we don't retry on failure.
This patch:
- deletes a print that called into hsa to ask why it can't call into hsa
- drops a merge conflict block next to that print
- reliably initializes number of devices to zero
- skips the plugin destructor contents if the constructor failed to init hsa
Tested by making hsa_init return error, and by forcing the dynamic library
use which was then deleted from disk. Before this patch, both segv. After it,
friendly message about offloading being unavailable.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D106774
Files:
openmp/libomptarget/plugins/amdgpu/impl/system.cpp
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
Index: openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
===================================================================
--- openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
+++ openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
@@ -436,13 +436,14 @@
/// Class containing all the device information
class RTLDeviceInfoTy {
std::vector<std::list<FuncOrGblEntryTy>> FuncGblEntries;
+ bool HSAInitializeSucceeded = false;
public:
// load binary populates symbol tables and mutates various global state
// run uses those symbol tables
std::shared_timed_mutex load_run_lock;
- int NumberOfDevices;
+ int NumberOfDevices = 0;
// GPU devices
std::vector<hsa_agent_t> HSAAgents;
@@ -688,7 +689,9 @@
DP("Start initializing HSA-ATMI\n");
hsa_status_t err = core::atl_init_gpu_context();
- if (err != HSA_STATUS_SUCCESS) {
+ if (err == HSA_STATUS_SUCCESS) {
+ HSAInitializeSucceeded = true;
+ } else {
DP("Error when initializing HSA-ATMI\n");
return;
}
@@ -791,6 +794,10 @@
~RTLDeviceInfoTy() {
DP("Finalizing the HSA-ATMI DeviceInfo.\n");
+ if (!HSAInitializeSucceeded) {
+ // Then none of these can have been set up and they can't be torn down
+ return;
+ }
// Run destructors on types that use HSA before
// atmi_finalize removes access to it
deviceStateStore.clear();
@@ -969,6 +976,9 @@
// this is per device id init
DP("Initialize the device id: %d\n", device_id);
+ if ((device_id < 0) || (device_id >= DeviceInfo.HSAAgents.size())) {
+ return OFFLOAD_FAIL;
+ }
hsa_agent_t agent = DeviceInfo.HSAAgents[device_id];
Index: openmp/libomptarget/plugins/amdgpu/impl/system.cpp
===================================================================
--- openmp/libomptarget/plugins/amdgpu/impl/system.cpp
+++ openmp/libomptarget/plugins/amdgpu/impl/system.cpp
@@ -356,12 +356,8 @@
DEBUG_PRINT("Initializing HSA...");
hsa_status_t err = hsa_init();
if (err != HSA_STATUS_SUCCESS) {
- printf("[%s:%d] %s failed: %s\n", __FILE__, __LINE__,
- "Initializing the hsa runtime", get_error_string(err));
return err;
}
- if (err != HSA_STATUS_SUCCESS)
- return err;
err = init_compute_and_memory();
if (err != HSA_STATUS_SUCCESS)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D106774.361533.patch
Type: text/x-patch
Size: 2281 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20210725/2cc3c3f1/attachment.bin>
More information about the Openmp-commits
mailing list