[Openmp-commits] [openmp] 545fcc3 - [OpenMP][CUDA] Fix potential program crash caused by double free resources

Shilei Tian via Openmp-commits openmp-commits at lists.llvm.org
Fri Mar 25 19:49:36 PDT 2022


Author: Shilei Tian
Date: 2022-03-25T22:49:32-04:00
New Revision: 545fcc3d842c0912db61591520bd4f760686c5a3

URL: https://github.com/llvm/llvm-project/commit/545fcc3d842c0912db61591520bd4f760686c5a3
DIFF: https://github.com/llvm/llvm-project/commit/545fcc3d842c0912db61591520bd4f760686c5a3.diff

LOG: [OpenMP][CUDA] Fix potential program crash caused by double free resources

As we mentioned in the code comments for function `ResourcePoolTy::release`,
at some point there could be two identical resources on the two sides of `Next`
mark. It is usually not an issue, unless the following case:
1. Some resources are not returned.
2. We need to iterate the pool and free the element.

That will cause double free, which is the case for event pool. Since we don't release
events hold by the data map, it can happen that the `Next` mark is not reset, and
we have two identical items in the pool. When the pool is destroyed, we will call
`cuEventDestroy` twice on the same event. In the best case, we can only observe
CUDA errors. In the worst case, it can cause internal failures in CUDART and further
crash.

This patch fixes the issue by tracking all resources that have been given using
an `unordered_set`. We don't remove it when a resource is returned. When the pool
is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make
sure that the set contains all resources allocated from the device. We just need
to iterate the set and free the resource accordingly.

For now, only event pool is set to use it. Stream pool is not because we can make
sure all streams are returned when the plugin is destroyed.

Someone might be wondering, why don't we release all events hold in the data map.
That is because, plugins are determined to be destroyed *before* `libomptarget`.
If we can somehow make the plugin outlast `libomptarget`, life will be much
easier.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122014

Added: 
    

Modified: 
    openmp/libomptarget/plugins/cuda/src/rtl.cpp

Removed: 
    


################################################################################
diff  --git a/openmp/libomptarget/plugins/cuda/src/rtl.cpp b/openmp/libomptarget/plugins/cuda/src/rtl.cpp
index 0a6d023a7fc08..8bfac149cb83d 100644
--- a/openmp/libomptarget/plugins/cuda/src/rtl.cpp
+++ b/openmp/libomptarget/plugins/cuda/src/rtl.cpp
@@ -224,22 +224,32 @@ template <typename AllocTy> class ResourcePoolTy {
   size_t Next = 0;
   /// Mutex to guard the pool.
   std::mutex Mutex;
-  /// Pool of resources.
+  /// Pool of resources. The 
diff erence between \p Resources and \p Pool is,
+  /// when a resource is acquired and released, it is all on \p Resources. When
+  /// a batch of new resources are needed, they are both added to \p Resources
+  /// and \p Pool. The reason for this setting is, \p Resources could contain
+  /// redundant elements because resources are not released, which can cause
+  /// double free. This setting makes sure that \p Pool always has every
+  /// resource allocated from the device.
   std::vector<ElementTy> Resources;
+  std::vector<ElementTy> Pool;
   /// A reference to the corresponding allocator.
   AllocTy Allocator;
 
   /// If `Resources` is used up, we will fill in more resources. It assumes that
   /// the new size `Size` should be always larger than the current size.
   bool resize(size_t Size) {
+    assert(Resources.size() == Pool.size() && "size mismatch");
     auto CurSize = Resources.size();
     assert(Size > CurSize && "Unexpected smaller size");
+    Pool.reserve(Size);
     Resources.reserve(Size);
     for (auto I = CurSize; I < Size; ++I) {
       ElementTy NewItem;
       int Ret = Allocator.create(NewItem);
       if (Ret != OFFLOAD_SUCCESS)
         return false;
+      Pool.push_back(NewItem);
       Resources.push_back(NewItem);
     }
     return true;
@@ -300,8 +310,9 @@ template <typename AllocTy> class ResourcePoolTy {
   /// Released all stored resources and clear the pool.
   /// Note: This function is not thread safe. Be sure to guard it if necessary.
   void clear() noexcept {
-    for (auto &R : Resources)
+    for (auto &R : Pool)
       (void)Allocator.destroy(R);
+    Pool.clear();
     Resources.clear();
   }
 };


        


More information about the Openmp-commits mailing list