[PATCH] D83372: Fix for memory leak reported by Valgrind

Wed Jan 27 07:39:38 PST 2021

amaiorano added a comment.

> What sort of situation are you dealing with/how does this issue manifest for you?

So basically we produce a shared library (DLL, .so, .dylib) that implements the Vulkan API. The issue manifests itself from our internal test bots which, as part of their tests, load the vulkan driver, perform Vulkan calls, validate the result, and then unload the driver. Some of these bots enable Leak Sanitizer (LSAN), and upon ending the test process, reports memory leaks - including those from and loaded/unloaded shared libraries. This also shows up on Windows when using Application Verifier.

> My understanding is that this would be a problem for users unloading/reloading a dynamic library (dll, so, dylib) but generally not an issue for static linking, for instance? & given, as @beanz said, llvm is unlikely to support re-use of llvm after shutdown, this doesn't seem like it'd go terribly well if you expect to be able to unload and then reload LLVM again later.

I'm not sure what the problem is with unloading and reloading LLVM. For our use-case, anyway, it seems to work fine. We use LLVM's JIT API, so maybe that has something to do with it, though I don't really understand what the problem could be in a general sense. Unloading a shared library should clean up all allocations and handled, then reloading it again should allocate and initialize everything again.

Now, having written all that, there's been some developments since I posted yesterday. Using the above latest patch, along with invoking llvm::llvm_shutdown(), worked fine on our Windows build, and no more leaks were detected by AppVerifier. Note that in order to invoke llvm::llvm_shutdown, I used the LLVM facility "llvm_shutdown_obj" class, declared as a global in one of our translation units (this class's destructor invokes llvm_shutdown()). Now, except for on Windows via DllMain, there is no hook for "unloading DLL before global objects are destroyed" on Linux or Mac (maybe also Android, and other platforms), so we create this global, which seemed to work fine. Except that it didn't work on Linux, and the reason is global object destruction order: llvm::llvm_shutdown locks the ManagedStaticMutex, so it must be called before the ManagedStaticMutex static instance is destroyed. On Windows, it happened to destroy my global llvm_shutdown_obj instance before it destroyed the ManagedStaticMutex instance (named 'm' in this patch). But on Linux, it was reversed. Classic C++ problem, where the only standard guarantee we have is that objects in the same translation unit are constructed in declaration order, and destructed in the reverse order; but for objects in different TUs, there is no guarantee.

I'm thinking this is probably the reason ManagedStaticMutex was allowed to leak in the first place: there was no way to guarantee that clients would invoke llvm::llvm_shutdown before this global got destroyed, especially when making use of the llvm_shutdown_obj RAII class.

Looking at Patch 5, the one just before this latest version, I think might work, since it relies on llvm_shutdown deleting the ManagedStaticMutex after it uses it. Of course, this also implies that this call should not be made from multiple threads anyway, so perhaps another solution is to keep the latest patch as-is, but remove the lock_guard in llvm_shutdown, and document that it must not be called from multiple threads, as @lattner proposed.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83372/new/

https://reviews.llvm.org/D83372