[Openmp-commits] [openmp] dd0b463 - [libomptarget][amdgpu] More robust handling of failure to init HSA
Jon Chesterfield via Openmp-commits
openmp-commits at lists.llvm.org
Sun Jul 25 15:16:17 PDT 2021
Author: Jon Chesterfield
Date: 2021-07-25T23:15:58+01:00
New Revision: dd0b463dd9ed4901a2e8fec498931bdf94a3f656
URL: https://github.com/llvm/llvm-project/commit/dd0b463dd9ed4901a2e8fec498931bdf94a3f656
DIFF: https://github.com/llvm/llvm-project/commit/dd0b463dd9ed4901a2e8fec498931bdf94a3f656.diff
LOG: [libomptarget][amdgpu] More robust handling of failure to init HSA
If hsa_init fails, subsequent calls into HSA are not safe. The one exception
is hsa_init itself, but we don't retry on failure.
This patch:
- deletes a print that called into hsa to ask why it can't call into hsa
- drops a merge conflict block next to that print
- reliably initializes number of devices to zero
- skips the plugin destructor contents if the constructor failed to init hsa
Tested by making hsa_init return an error, and by forcing use of the dynamic
library and then deleting that library from disk. Before this patch, both
cases segfault. After it, both print a friendly message about offloading
being unavailable.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106774
Added:
Modified:
openmp/libomptarget/plugins/amdgpu/impl/system.cpp
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
Removed:
################################################################################
diff --git a/openmp/libomptarget/plugins/amdgpu/impl/system.cpp b/openmp/libomptarget/plugins/amdgpu/impl/system.cpp
index 1494d161677b8..7fd8d57737e91 100644
--- a/openmp/libomptarget/plugins/amdgpu/impl/system.cpp
+++ b/openmp/libomptarget/plugins/amdgpu/impl/system.cpp
@@ -356,12 +356,8 @@ hsa_status_t init_hsa() {
DEBUG_PRINT("Initializing HSA...");
hsa_status_t err = hsa_init();
if (err != HSA_STATUS_SUCCESS) {
- printf("[%s:%d] %s failed: %s\n", __FILE__, __LINE__,
- "Initializing the hsa runtime", get_error_string(err));
return err;
}
- if (err != HSA_STATUS_SUCCESS)
- return err;
err = init_compute_and_memory();
if (err != HSA_STATUS_SUCCESS)
diff --git a/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
index 278134a382a06..b5651ab9e89a1 100644
--- a/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
@@ -436,13 +436,14 @@ struct EnvironmentVariables {
/// Class containing all the device information
class RTLDeviceInfoTy {
std::vector<std::list<FuncOrGblEntryTy>> FuncGblEntries;
+ bool HSAInitializeSucceeded = false;
public:
// load binary populates symbol tables and mutates various global state
// run uses those symbol tables
std::shared_timed_mutex load_run_lock;
- int NumberOfDevices;
+ int NumberOfDevices = 0;
// GPU devices
std::vector<hsa_agent_t> HSAAgents;
@@ -688,7 +689,9 @@ class RTLDeviceInfoTy {
DP("Start initializing HSA-ATMI\n");
hsa_status_t err = core::atl_init_gpu_context();
- if (err != HSA_STATUS_SUCCESS) {
+ if (err == HSA_STATUS_SUCCESS) {
+ HSAInitializeSucceeded = true;
+ } else {
DP("Error when initializing HSA-ATMI\n");
return;
}
@@ -791,6 +794,10 @@ class RTLDeviceInfoTy {
~RTLDeviceInfoTy() {
DP("Finalizing the HSA-ATMI DeviceInfo.\n");
+ if (!HSAInitializeSucceeded) {
+ // Then none of these can have been set up and they can't be torn down
+ return;
+ }
// Run destructors on types that use HSA before
// atmi_finalize removes access to it
deviceStateStore.clear();
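
The pattern in the rtl.cpp hunks above (a success flag set in the constructor, a zero-initialized device count, and an early return in the destructor) can be sketched in isolation. This is only an illustration under stated assumptions, not the plugin's code: `fake_rt`, `PluginInfo`, and `devices_when` are hypothetical names, and `fake_rt` is a stand-in that does not model the real HSA API.

```cpp
#include <cassert>

// Hypothetical stand-in for a C runtime such as HSA. None of these names
// belong to the real HSA API.
namespace fake_rt {
static bool available = true;    // set to false to simulate a failed init
static bool initialized = false; // true between init() and shut_down()
static int unsafe_calls = 0;     // calls made while not initialized

static int init() {
  if (!available)
    return 1; // nonzero means failure
  initialized = true;
  return 0;
}

static void shut_down() {
  if (!initialized) {
    ++unsafe_calls; // a real runtime might crash here instead
    return;
  }
  initialized = false;
}
} // namespace fake_rt

class PluginInfo {
  // Default member initializer: false unless the constructor proves otherwise.
  bool InitSucceeded = false;

public:
  // Zero devices is the safe answer when initialization never completed.
  int NumberOfDevices = 0;

  PluginInfo() {
    if (fake_rt::init() != 0)
      return; // leave the object in its inert default state
    InitSucceeded = true;
    NumberOfDevices = 2; // arbitrary value for this sketch
  }

  ~PluginInfo() {
    if (!InitSucceeded)
      return; // nothing was set up, so nothing can be torn down
    fake_rt::shut_down();
  }
};

// Constructs and destroys one PluginInfo with the fake runtime either
// available or not, and returns the device count it reported.
static int devices_when(bool runtime_available) {
  fake_rt::available = runtime_available;
  PluginInfo info;
  assert(fake_rt::initialized == runtime_available);
  return info.NumberOfDevices;
}
```

Calling `devices_when(false)` yields zero devices and leaves `fake_rt::unsafe_calls` at zero, because the destructor never reaches `shut_down()`. Without the flag, the destructor would call into a runtime that never started, which is the kind of failure the patch guards against.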