[Openmp-commits] [PATCH] D94256: [OpenMP] Fixed the issue that memory manager calls plugin interface to release buffered memory even after the plugin has been destroyed

Shilei Tian via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Thu Jan 7 19:38:30 PST 2021


tianshilei1992 marked 2 inline comments as done.
tianshilei1992 added inline comments.


================
Comment at: openmp/libomptarget/src/rtl.cpp:444-445
 
   // TODO: Remove RTL and the devices it manages if it's not used anymore?
   // TODO: Write some RTL->unload_image(...) function?
 
----------------
vzakhari wrote:
> tianshilei1992 wrote:
> > vzakhari wrote:
> > > tianshilei1992 wrote:
> > > > vzakhari wrote:
> > > > > tianshilei1992 wrote:
> > > > > > vzakhari wrote:
> > > > > > > Can we just implement this instead?
> > > > > > At this point, the plugin has already been destroyed.
> > > > > How is that possible, since we never call dlclose() for the plugins?
> > > > > Is the problem specific to some particular execution conditions?
> > > > > 
> > > > > Should we just "deinitialize" the memory manager here instead of postponing the "deinitialization" to MemoryManagerTy desctructor, which will obviously be executed at the wrong time.
> > > > That's most tough part of this question. `libomptarget` has no idea when its opened plugin will be destroyed. Actually, you can see there is no `dlclose` in current code. What we can know is, plugin will be destroyed before `libomptarget`. You can easily verify it by adding `DP(...)` to the destructor of the plugin.
> > > This sounds really strange at least for Linux, since:
> > > > The dynamic linker maintains reference counts for object handles, so a dynamically loaded shared object is not deallocated until dlclose() has been called on it as many times as dlopen() has succeeded on it. 
> > > 
> > > Do you have a reproducer for this kind of an issue?  I must be missing something here...
> > > 
> > > 
> > You can use `openmp/libomptarget/test/offloading/memory_manager.cpp` and turn debug print on via setting env `LIBOMPTARGET_DEBUG` to a large number like `10`. Then you can see a lot of logs.
> > ```
> > ...
> > Libomptarget --> omp_target_free deallocated device ptr
> > PASS
> > (1) Target CUDA RTL --> CUDA plugin destroyed
> > (2) Libomptarget --> __tgt_unregister_lib
> > Libomptarget --> Unloading target library!
> > Libomptarget --> Image 0x0000000000403870 is compatible with RTL 0x0000000000e4a8b0!
> > Libomptarget --> Unregistered image 0x0000000000403870 from RTL 0x0000000000e4a8b0!
> > Libomptarget --> Done unregistering images!
> > Libomptarget --> Removing translation table for descriptor 0x00000000004abb60
> > Libomptarget --> Done unregistering library!
> > Libomptarget --> Deinit target library!
> > Target CUDA RTL --> Error returned from cuCtxSetCurrent
> > Target CUDA RTL --> Unresolved CUDA error code: 4
> > Target CUDA RTL --> Unsuccessful cuGetErrorString return status: 4
> > Target CUDA RTL --> Error returned from cuCtxSetCurrent
> > ...
> > ```
> > Line (1) and (2) were printed because I added `DP` to the destructor of CUDA plugin and function `__tgt_unregister_lib`. The last tens of lines are all error messages because CUDA plugin was destroyed before `libomptarget`.
> Ah, I see.  The global destructor for `DeviceRTL` is called before `__tgt_unregister_lib` is called.  The plugin (the shared object) is not actually deallocated yet.  Thank you for the explanation.
> 
> I am not sure libomptarget has to be involved here, since this may be done exclusively at the plugin level, i.e. `DeviceRTL` may be dynamically allocated/deallocated using global init/fini functions, and then all plugin APIs may return early, if they are invoked after `DeviceRTL` has been deallocated.
> 
> Just to be clear, I am not against this change-set :)
> 
> Anyway, libomptarget has to be redesigned a bit because the plugins loading initiated in a global constructor (which calls `RTLsTy::RegisterLib`) is violating Microsoft conventions for DLL initialization (e.g. you cannot create a DLL with offload and then `LoadLibrary` this DLL, because this will result in `LoadLibrary`->...->`LoadLibrary` call stack, which is not allowed).  We should probably load the plugins on the first OpenMP construct/API encountered during the execution.  I assume we will have to define clear entry/exit points for plugins then, and this will solve the issue that you are addressing here.
`libomptarget` is involved because we have a memory manager which will releases all its buffered memory in its destructor. It is an object in `libomptarget`. So its destructor is invoked when `libomptarget` is being destroyed. Now we know at this moment the plugin has already been destroyed, and the memory manager still calls plugin interfaces, causing program crash on some platforms. However, usually it will not crash on CUDA platforms, just having some errors because the plugin is using some CUDA related data which have been explicitly destroyed when the plugin is destroyed.

Redesign of `libomptarget` is beyond the scope of this patch. It might be done when LLVM OpenMP decides to support Windows. For now this patch is like a WA.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94256/new/

https://reviews.llvm.org/D94256



More information about the Openmp-commits mailing list