[PATCH] D46978: Delay emutls deallocation for one round

Ryan Prichard via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 21 23:00:27 PDT 2018


rprichard added a comment.

My emutls.c comment:

> We can't wait until the final two rounds, because jemalloc needs two rounds
> after the final malloc/free call to free its thread-specific data.

It might be helpful to explain on this review why jemalloc needs 2 rounds. Here's my comment from Google's bug tracker, http://b/78022094#comment13. I'll link to it from the code.

> jemalloc's TSD (thread-specific data) code has four compile-time modes:
> 
> - JEMALLOC_MALLOC_THREAD_CLEANUP: uses __thread variables, and the code using jemalloc must call a _malloc_thread_cleanup function at thread-exit. Apparently FreeBSD uses this.
> - JEMALLOC_TLS: uses __thread variables and a pthread key destructor
> - _WIN32: self-explanatory
> - default: uses pthread_getspecific / pthread_setspecific and a key destructor
> 
> Bionic must use the final mode (emutls uses malloc, so malloc can't use emutls).
> 
> A jemalloc TSD has four explicit states:
> 
> - uninitialized
> - nominal
> - purgatory
> - reincarnated
> 
> Summary of jemalloc cleanup states:
> 
> - The typical state is nominal
> - On nominal cleanup: free everything but the outer TSD struct, move to purgatory state, and schedule another dtor call
> - On purgatory cleanup: free the TSD struct
> - On reincarnated cleanup: move to purgatory state and schedule another dtor call
> - Calling malloc/free moves the state from purgatory to reincarnated.
> 
> jemalloc needs 2 pthread destructor rounds to free its TSD (nominal -> purgatory, purgatory -> deallocated).
> 
> If emutls cleanup happens on round 2 instead of 3, then jemalloc can be completely deallocated at the start of round 2, then reinitialized when emutls calls free. In round 3, jemalloc would enter purgatory, and in round 4, it would be deallocated again. I *could* try to avoid this with an otherwise pointless realloc call in emutls round 1, but it doesn't seem to make thread exit that much slower: I measured a slowdown of roughly 10-20us per thread exit.
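
The round counting above can be checked with a small model. What follows is only a sketch of the state summary in the quoted comment, not jemalloc's or emutls's actual code. All names (tsd_state, tsd_cleanup, tsd_touch, rounds_until_allocator_freed) are hypothetical. It assumes the allocator's destructor runs before the emutls destructor within a round, that a freed TSD can be treated as "uninitialized" again, and that the caller supplies the round limit (4 is assumed in the usage note below).

```c
#include <assert.h>
#include <stdbool.h>

/* The four explicit TSD states from the summary. In this model a freed
 * TSD struct is treated as TSD_UNINITIALIZED (a simplifying assumption). */
enum tsd_state {
    TSD_UNINITIALIZED,
    TSD_NOMINAL,
    TSD_PURGATORY,
    TSD_REINCARNATED
};

/* One destructor call on the allocator's TSD. */
static void tsd_cleanup(enum tsd_state *s) {
    switch (*s) {
    case TSD_NOMINAL:       /* free everything but the outer struct */
    case TSD_REINCARNATED:  /* touched after purgatory: go back there */
        *s = TSD_PURGATORY; /* another destructor round is needed */
        break;
    case TSD_PURGATORY:     /* free the TSD struct itself */
        *s = TSD_UNINITIALIZED;
        break;
    case TSD_UNINITIALIZED: /* nothing to clean up */
        break;
    }
}

/* A malloc/free call made while destructors are running. */
static void tsd_touch(enum tsd_state *s) {
    if (*s == TSD_PURGATORY)
        *s = TSD_REINCARNATED;
    else if (*s == TSD_UNINITIALIZED)
        *s = TSD_NOMINAL;   /* allocator TSD is reinitialized */
}

/* Returns the round after which the allocator's TSD is freed and no
 * destructor is pending, or 0 if that does not happen within max_rounds
 * (the TSD would leak). emutls calls free() in round emutls_free_round
 * and only reschedules itself in earlier rounds. */
static int rounds_until_allocator_freed(int emutls_free_round, int max_rounds) {
    enum tsd_state alloc = TSD_NOMINAL;
    bool emutls_pending = true;

    assert(emutls_free_round >= 1);
    for (int round = 1; round <= max_rounds; round++) {
        if (alloc == TSD_UNINITIALIZED && !emutls_pending)
            return round - 1;          /* nothing left to run */
        if (alloc != TSD_UNINITIALIZED)
            tsd_cleanup(&alloc);       /* allocator destructor runs first */
        if (emutls_pending && round >= emutls_free_round) {
            tsd_touch(&alloc);         /* emutls frees its data */
            emutls_pending = false;
        }
    }
    return (alloc == TSD_UNINITIALIZED && !emutls_pending) ? max_rounds : 0;
}
```

With an assumed limit of 4 rounds, rounds_until_allocator_freed(1, 4) gives 3 and rounds_until_allocator_freed(2, 4) gives 4, matching the round 3 / round 4 sequence described above. rounds_until_allocator_freed(3, 4) gives 0: under this model, freeing in round 3 leaves the allocator's TSD in purgatory when the rounds run out.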


Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D46978




