[Openmp-commits] [openmp] 6226270 - [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.

Dhruva Chakrabarti via Openmp-commits openmp-commits at lists.llvm.org
Wed Sep 29 09:22:30 PDT 2021


Author: Dhruva Chakrabarti
Date: 2021-09-29T09:22:07-07:00
New Revision: 622627025332af4bfa01057210f315b2278a9439

URL: https://github.com/llvm/llvm-project/commit/622627025332af4bfa01057210f315b2278a9439
DIFF: https://github.com/llvm/llvm-project/commit/622627025332af4bfa01057210f315b2278a9439.diff

LOG: [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.

Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye)

The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from it'll refer to a signal from an unrelated packet. Change avoids reading from the packet post-submission.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110679

Added: 
    

Modified: 
    openmp/libomptarget/plugins/amdgpu/src/rtl.cpp

Removed: 
    


################################################################################
diff  --git a/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
index 767a0eb32b26b..1d2ed42eb746e 100644
--- a/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
@@ -2185,6 +2185,7 @@ int32_t __tgt_rtl_run_target_team_region_locked(
     packet->completion_signal = {0}; // may want a pool of signals
 
     KernelArgPool *ArgPool = nullptr;
+    void *kernarg = nullptr;
     {
       auto it = KernelArgPoolMap.find(std::string(KernelInfo->Name));
       if (it != KernelArgPoolMap.end()) {
@@ -2196,7 +2197,6 @@ int32_t __tgt_rtl_run_target_team_region_locked(
          device_id);
     }
     {
-      void *kernarg = nullptr;
       if (ArgPool) {
         assert(ArgPool->kernarg_segment_size == (arg_num * sizeof(void *)));
         kernarg = ArgPool->allocate(arg_num);
@@ -2240,29 +2240,29 @@ int32_t __tgt_rtl_run_target_team_region_locked(
       packet->kernarg_address = kernarg;
     }
 
-    {
-      hsa_signal_t s = DeviceInfo.FreeSignalPool.pop();
-      if (s.handle == 0) {
-        DP("Failed to get signal instance\n");
-        return OFFLOAD_FAIL;
-      }
-      packet->completion_signal = s;
-      hsa_signal_store_relaxed(packet->completion_signal, 1);
+    hsa_signal_t s = DeviceInfo.FreeSignalPool.pop();
+    if (s.handle == 0) {
+      DP("Failed to get signal instance\n");
+      return OFFLOAD_FAIL;
     }
+    packet->completion_signal = s;
+    hsa_signal_store_relaxed(packet->completion_signal, 1);
 
+    // Publish the packet indicating it is ready to be processed
     core::packet_store_release(reinterpret_cast<uint32_t *>(packet),
                                core::create_header(), packet->setup);
 
+    // Since the packet is already published, its contents must not be
+    // accessed any more
     hsa_signal_store_relaxed(queue->doorbell_signal, packet_id);
 
-    while (hsa_signal_wait_scacquire(packet->completion_signal,
-                                     HSA_SIGNAL_CONDITION_EQ, 0, UINT64_MAX,
+    while (hsa_signal_wait_scacquire(s, HSA_SIGNAL_CONDITION_EQ, 0, UINT64_MAX,
                                      HSA_WAIT_STATE_BLOCKED) != 0)
       ;
 
     assert(ArgPool);
-    ArgPool->deallocate(packet->kernarg_address);
-    DeviceInfo.FreeSignalPool.push(packet->completion_signal);
+    ArgPool->deallocate(kernarg);
+    DeviceInfo.FreeSignalPool.push(s);
   }
 
   DP("Kernel completed\n");


        


More information about the Openmp-commits mailing list