[Openmp-commits] [openmp] dd0b463 - [libomptarget][amdgpu] More robust handling of failure to init HSA

Jon Chesterfield via Openmp-commits openmp-commits at lists.llvm.org
Sun Jul 25 15:16:17 PDT 2021


Author: Jon Chesterfield
Date: 2021-07-25T23:15:58+01:00
New Revision: dd0b463dd9ed4901a2e8fec498931bdf94a3f656

URL: https://github.com/llvm/llvm-project/commit/dd0b463dd9ed4901a2e8fec498931bdf94a3f656
DIFF: https://github.com/llvm/llvm-project/commit/dd0b463dd9ed4901a2e8fec498931bdf94a3f656.diff

LOG: [libomptarget][amdgpu] More robust handling of failure to init HSA

If hsa_init fails, subsequent calls into hsa are not safe. The only exception
is hsa_init itself, but we don't retry on failure.

This patch:
- deletes a print that called into hsa to ask why it can't call into hsa
- drops a merge conflict block next to that print
- reliably initializes number of devices to zero
- skips the plugin destructor contents if the constructor failed to init hsa

Tested by making hsa_init return an error, and by forcing use of the dynamic
library, which was then deleted from disk. Before this patch, both cases
segfault. After it, both give a friendly message about offloading being
unavailable.
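
The last two bullets above follow a common pattern, sketched below in
isolation. This is a minimal illustration, not the plugin code: the
fake_runtime namespace, its init/shutdown functions, and the DeviceInfo class
are hypothetical stand-ins for HSA and RTLDeviceInfoTy. It shows a default
member initializer keeping the device count at zero on early return, and a
success flag that lets the destructor skip teardown when initialization failed.

```cpp
#include <vector>

// Hypothetical stand-in for a runtime whose other entry points must not be
// called unless init() succeeded. None of these names are real HSA APIs.
namespace fake_runtime {
bool init_should_fail = false; // knob used only to simulate a failed init
int teardown_calls = 0;        // counts how often shutdown() was reached

bool init() { return !init_should_fail; }
void shutdown() { ++teardown_calls; }
} // namespace fake_runtime

class DeviceInfo {
  // Stays false unless the constructor gets past runtime initialization.
  bool InitSucceeded = false;

public:
  // Default member initializer: well defined even if the constructor
  // returns early.
  int NumberOfDevices = 0;
  std::vector<int> Agents;

  DeviceInfo() {
    if (!fake_runtime::init()) {
      // Leave the object in its empty, well-defined state.
      return;
    }
    InitSucceeded = true;
    Agents = {0, 1}; // pretend two devices were discovered
    NumberOfDevices = static_cast<int>(Agents.size());
  }

  ~DeviceInfo() {
    if (!InitSucceeded) {
      // Nothing was set up, so nothing may be torn down; in particular,
      // do not call back into the runtime.
      return;
    }
    Agents.clear();
    fake_runtime::shutdown();
  }

  bool initialized() const { return InitSucceeded; }
};
```

With fake_runtime::init_should_fail set to true, a DeviceInfo object reports
zero devices and its destructor never reaches fake_runtime::shutdown().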

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106774

Added: 
    

Modified: 
    openmp/libomptarget/plugins/amdgpu/impl/system.cpp
    openmp/libomptarget/plugins/amdgpu/src/rtl.cpp

Removed: 
    


################################################################################
diff  --git a/openmp/libomptarget/plugins/amdgpu/impl/system.cpp b/openmp/libomptarget/plugins/amdgpu/impl/system.cpp
index 1494d161677b8..7fd8d57737e91 100644
--- a/openmp/libomptarget/plugins/amdgpu/impl/system.cpp
+++ b/openmp/libomptarget/plugins/amdgpu/impl/system.cpp
@@ -356,12 +356,8 @@ hsa_status_t init_hsa() {
   DEBUG_PRINT("Initializing HSA...");
   hsa_status_t err = hsa_init();
   if (err != HSA_STATUS_SUCCESS) {
-    printf("[%s:%d] %s failed: %s\n", __FILE__, __LINE__,
-           "Initializing the hsa runtime", get_error_string(err));
     return err;
   }
-  if (err != HSA_STATUS_SUCCESS)
-    return err;
 
   err = init_compute_and_memory();
   if (err != HSA_STATUS_SUCCESS)

diff  --git a/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
index 278134a382a06..b5651ab9e89a1 100644
--- a/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
@@ -436,13 +436,14 @@ struct EnvironmentVariables {
 /// Class containing all the device information
 class RTLDeviceInfoTy {
   std::vector<std::list<FuncOrGblEntryTy>> FuncGblEntries;
+  bool HSAInitializeSucceeded = false;
 
 public:
   // load binary populates symbol tables and mutates various global state
   // run uses those symbol tables
   std::shared_timed_mutex load_run_lock;
 
-  int NumberOfDevices;
+  int NumberOfDevices = 0;
 
   // GPU devices
   std::vector<hsa_agent_t> HSAAgents;
@@ -688,7 +689,9 @@ class RTLDeviceInfoTy {
 
     DP("Start initializing HSA-ATMI\n");
     hsa_status_t err = core::atl_init_gpu_context();
-    if (err != HSA_STATUS_SUCCESS) {
+    if (err == HSA_STATUS_SUCCESS) {
+      HSAInitializeSucceeded = true;
+    } else {
       DP("Error when initializing HSA-ATMI\n");
       return;
     }
@@ -791,6 +794,10 @@ class RTLDeviceInfoTy {
 
   ~RTLDeviceInfoTy() {
     DP("Finalizing the HSA-ATMI DeviceInfo.\n");
+    if (!HSAInitializeSucceeded) {
+      // Then none of these can have been set up and they can't be torn down
+      return;
+    }
     // Run destructors on types that use HSA before
     // atmi_finalize removes access to it
     deviceStateStore.clear();


        

