[Openmp-commits] [PATCH] D94256: [OpenMP] Fixed the issue that memory manager calls plugin interface to release buffered memory even after the plugin has been destroyed
Shilei Tian via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Jan 7 18:44:35 PST 2021
tianshilei1992 marked 4 inline comments as done.
tianshilei1992 added inline comments.
================
Comment at: openmp/libomptarget/src/rtl.cpp:444-445
// TODO: Remove RTL and the devices it manages if it's not used anymore?
// TODO: Write some RTL->unload_image(...) function?
----------------
vzakhari wrote:
> tianshilei1992 wrote:
> > vzakhari wrote:
> > > tianshilei1992 wrote:
> > > > vzakhari wrote:
> > > > > Can we just implement this instead?
> > > > At this point, the plugin has already been destroyed.
> > > How is that possible, since we never call dlclose() for the plugins?
> > > Is the problem specific to some particular execution conditions?
> > >
> > > Should we just "deinitialize" the memory manager here instead of postponing the "deinitialization" to MemoryManagerTy desctructor, which will obviously be executed at the wrong time.
> > That's most tough part of this question. `libomptarget` has no idea when its opened plugin will be destroyed. Actually, you can see there is no `dlclose` in current code. What we can know is, plugin will be destroyed before `libomptarget`. You can easily verify it by adding `DP(...)` to the destructor of the plugin.
> This sounds really strange at least for Linux, since:
> > The dynamic linker maintains reference counts for object handles, so a dynamically loaded shared object is not deallocated until dlclose() has been called on it as many times as dlopen() has succeeded on it.
>
> Do you have a reproducer for this kind of an issue? I must be missing something here...
>
>
You can use `openmp/libomptarget/test/offloading/memory_manager.cpp` and turn debug print on via setting env `LIBOMPTARGET_DEBUG` to a large number like `10`. Then you can see a lot of logs.
```
...
Libomptarget --> omp_target_free deallocated device ptr
PASS
(1) Target CUDA RTL --> CUDA plugin destroyed
(2) Libomptarget --> __tgt_unregister_lib
Libomptarget --> Unloading target library!
Libomptarget --> Image 0x0000000000403870 is compatible with RTL 0x0000000000e4a8b0!
Libomptarget --> Unregistered image 0x0000000000403870 from RTL 0x0000000000e4a8b0!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 0x00000000004abb60
Libomptarget --> Done unregistering library!
Libomptarget --> Deinit target library!
Target CUDA RTL --> Error returned from cuCtxSetCurrent
Target CUDA RTL --> Unresolved CUDA error code: 4
Target CUDA RTL --> Unsuccessful cuGetErrorString return status: 4
Target CUDA RTL --> Error returned from cuCtxSetCurrent
...
```
Line (1) and (2) were printed because I added `DP` to the destructor of CUDA plugin and function `__tgt_unregister_lib`. The last tens of lines are all error messages because CUDA plugin was destroyed before `libomptarget`.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94256/new/
https://reviews.llvm.org/D94256
More information about the Openmp-commits
mailing list