[Openmp-commits] [PATCH] D122014: [OpenMP][CUDA] Fix potential program crash caused by double free resources

Shilei Tian via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Mar 18 09:34:00 PDT 2022


tianshilei1992 created this revision.
tianshilei1992 added reviewers: jdoerfert, jhuber6, JonChesterfield.
Herald added subscribers: carlosgalvezp, guansong, yaxunl.
Herald added a project: All.
tianshilei1992 requested review of this revision.
Herald added subscribers: openmp-commits, sstefan1.
Herald added a project: OpenMP.

As we mentioned in the code comments for function `ResourcePoolTy::release`,
at some point there could be two identical resources on the two sides of `Next`
mark. It is usually not an issue, unless the following case:

1. Some resources are not returned.
2. We need to iterate the pool and free the element.

That will cause double free, which is the case for event pool. Since we don't release
events hold by the data map, it can happen that the `Next` mark is not reset, and
we have two identical items in the pool. When the pool is destroyed, we will call
`cuEventDestroy` twice on the same event. In the best case, we can only observe
CUDA errors. In the worst case, it can cause internal failures in CUDART and further
crash.

This patch fixes the issue by tracking all resources that have been given using
an `unordered_set`. We don't remove it when a resource is returned. When the pool
is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make
sure that the set contains all resources allocated from the device. We just need
to iterate the set and free the resource accordingly.

For now, only event pool is set to use it. Stream pool is not because we can make
sure all streams are returned when the plugin is destroyed.

Someone might be wondering, why don't we release all events hold in the data map.
That is because, plugins are determined to be destroyed *before* `libomptarget`.
If we can somehow make the plugin outlast `libomptarget`, life will be much
easier.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D122014

Files:
  openmp/libomptarget/plugins/cuda/src/rtl.cpp


Index: openmp/libomptarget/plugins/cuda/src/rtl.cpp
===================================================================
--- openmp/libomptarget/plugins/cuda/src/rtl.cpp
+++ openmp/libomptarget/plugins/cuda/src/rtl.cpp
@@ -19,6 +19,7 @@
 #include <mutex>
 #include <string>
 #include <unordered_map>
+#include <unordered_set>
 #include <vector>
 
 #include "Debug.h"
@@ -218,7 +219,8 @@
 
 /// A generic pool of resources where \p T is the resource type.
 /// \p T should be copyable as the object is stored in \p std::vector .
-template <typename AllocTy> class ResourcePoolTy {
+/// \p NeedTracking is true if we need to track all elements that are given.
+template <typename AllocTy, bool NeedTracking = false> class ResourcePoolTy {
   using ElementTy = typename AllocTy::ElementTy;
   /// Index of the next available resource.
   size_t Next = 0;
@@ -226,6 +228,9 @@
   std::mutex Mutex;
   /// Pool of resources.
   std::vector<ElementTy> Resources;
+  /// Resources that have been given at least once. We don't remove the element
+  /// when a resource is released.
+  std::unordered_set<ElementTy> ResourcesGiven;
   /// A reference to the corresponding allocator.
   AllocTy Allocator;
 
@@ -277,6 +282,9 @@
 
     R = Resources[Next++];
 
+    if (NeedTracking)
+      ResourcesGiven.insert(R);
+
     return OFFLOAD_SUCCESS;
   }
 
@@ -300,9 +308,18 @@
   /// Released all stored resources and clear the pool.
   /// Note: This function is not thread safe. Be sure to guard it if necessary.
   void clear() noexcept {
-    for (auto &R : Resources)
-      (void)Allocator.destroy(R);
-    Resources.clear();
+    if (NeedTracking) {
+      for (auto &R : Resources)
+        ResourcesGiven.insert(R);
+      Resources.clear();
+      for (auto &R : ResourcesGiven)
+        (void)Allocator.destroy(R);
+      ResourcesGiven.clear();
+    } else {
+      for (auto &R : Resources)
+        (void)Allocator.destroy(R);
+      Resources.clear();
+    }
   }
 };
 
@@ -332,7 +349,7 @@
   using StreamPoolTy = ResourcePoolTy<StreamAllocatorTy>;
   std::vector<std::unique_ptr<StreamPoolTy>> StreamPool;
 
-  using EventPoolTy = ResourcePoolTy<EventAllocatorTy>;
+  using EventPoolTy = ResourcePoolTy<EventAllocatorTy, /* NeedTracking*/ true>;
   std::vector<std::unique_ptr<EventPoolTy>> EventPool;
 
   std::vector<DeviceDataTy> DeviceData;


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D122014.416525.patch
Type: text/x-patch
Size: 2347 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20220318/9efd914b/attachment-0001.bin>


More information about the Openmp-commits mailing list