[PATCH] D62258: [scudo][standalone] Introduce the thread specific data structures

Kostya Kortchinsky via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 24 08:50:43 PDT 2019


cryptoad marked 12 inline comments as done.
cryptoad added a comment.

Matt, thank you for all the reviews you are doing. Very insightful points.



================
Comment at: lib/scudo/standalone/tests/tsd_test.cc:34
+  }
+  void reset() { memset(this, 0, sizeof(*this)); }
+
----------------
morehouse wrote:
> I assume allocators will always be linker-initialized, which is why we have `reset` for testing instead of an `init`?
There is indeed a subtlety here, the `initOnce` function of a TSD will call the `initLinkerInitialized` function of the allocator.
Calling this `init` would (I feel), indicate the usual construct of `init` calling `initLinkedInitialized`, which shouldn't be the case here, we just want a nulled out structure but not call the `initLinkedInitialized` wanna-be-constructor.
Hence using `reset`.


================
Comment at: lib/scudo/standalone/tsd_exclusive.h:60
+
+  NOINLINE void initThread(Allocator *Instance, bool MinimalInit) {
+    if (UNLIKELY(!OnceDone))
----------------
morehouse wrote:
> Will threads have different allocators?
Multiple allocators can coexist in a process, hence a need to know which allocator a registry belongs to.


================
Comment at: lib/scudo/standalone/tsd_exclusive.h:64
+    if (MinimalInit)
+      return;
+    CHECK_EQ(
----------------
morehouse wrote:
> What is the use-case for `MinimalInit` initialization?
Right, this is documented in combined.h that hasn't landed yet, here is the comment:
```
    // For a deallocation, we only ensure minimal initialization, meaning thread
    // local data will be left uninitialized for now (when using ELF TLS). The
    // fallback cache will be used instead. This is a workaround for a situation
    // where the only heap operation performed in a thread would be a free past
    // the TLS destructors, ending up in initialized thread specific data never
    // being destroyed properly. Any other heap operation will do a full init.
```
There is test case in the current Scudo for that situation that will be in as well.


================
Comment at: lib/scudo/standalone/tsd_exclusive.h:74
+  pthread_key_t PThreadKey;
+  TSD<Allocator> *FallbackTSD;
+  static THREADLOCAL ThreadState State;
----------------
morehouse wrote:
> Any reason this needs to be a pointer?
The main reason is for an uninitialized allocator to take as little memory space as possible.
A TSD is usually around 8kB (varying based on the number of classes and pointers cached).
It costs an extra map() on init, but allows several allocators to coexist with using extra memory (the code footprint cost still being there).


================
Comment at: lib/scudo/standalone/tsd_shared.h:23
+
+  ALWAYS_INLINE void initThreadMaybe(Allocator *Instance,
+                                     UNUSED bool MinimalInit) {
----------------
morehouse wrote:
> Should Allocator initialization just be done in `initLinkerInitialized`?
This question made me realize that something is a bit sketchy.
`initOnce` is initializing the allocator, which in turn is calling `initLinkerInitialized`of the registry: at this point we are in the spinmutex, and call the `initLinkerInitialized` method of that same mutex. It all works out because 1) everything is zeroinitialized 2) the `initLinkerInitialized` of the spinmutex is a no-op.
But it is definitely logically skewed.
The original version was using `pthread_once` which was alleviating the need for the mutex, but I needed the Instance parameter for `initOnce` (pthread_one doesn't allow parameters to the once function).
I am going to have to rethink that.


================
Comment at: lib/scudo/standalone/tsd_shared.h:61
+      if (A == 1)
+        CoPrimes[NumberOfCoPrimes++] = I + 1;
+    }
----------------
morehouse wrote:
> What are the CoPrimes used for, and what is the strange calculation above doing?
I added a comment. The original idea was from Dmitry, but there are online reference such as:
https://lemire.me/blog/2017/09/18/visiting-all-values-in-an-array-exactly-once-in-random-order/


================
Comment at: lib/scudo/standalone/tsd_shared.h:74
+        pthread_setspecific(PThreadKey, reinterpret_cast<void *>(CurrentTSD)),
+        0);
+#endif
----------------
morehouse wrote:
> Do we really need three ways to do TLS?  Can we just use pthreads for all?
The ELF TLS would be the fastest as it's only a couple of instructions to access the data relative to %fs.
The pthread implementations vary widely. They require a call, which is not ideal, but also they range in terms of efficiency.
The `getAndroidTlsPtr` ends up being a few asm instructions as well.

Now with the introduction of ELF TLS in Android (beginning of this year), we might be able to get rid of the Android TLS ptr trick.
I haven't tested it yet. Also I am not sure the ELF TLS works from within the libc (it doesn't on Fuchsia for example).


================
Comment at: lib/scudo/standalone/tsd_shared.h:97
+  NOINLINE TSD<Allocator> *getTSDAndLockSlow(TSD<Allocator> *CurrentTSD) {
+    if (MaxTSDCount > 1U && NumberOfTSDs > 1U) {
+      // Use the Precedence of the current TSD as our random seed. Since we are
----------------
morehouse wrote:
> Is it sufficient to just check `NumberOfTSDs`?
This is to help the compiler optimize the block out.
If `MaxTSDCount` is 1 (likely the svelte Android config case), the compiler doesn't know that NumberOfTSDs is 1 and leaves the block in.
The `MaxTSDCount` check will be optimized at compile time, and for the 1 case, the whole block goes away.


================
Comment at: lib/scudo/standalone/tsd_shared.h:123
+        if (Index >= NumberOfTSDs)
+          Index -= NumberOfTSDs;
+      }
----------------
morehouse wrote:
> `Index %= NumberOfTSDs`
Here the division is potentially costly, hence going for a comparison and subtraction.
I am not sure the performance difference is significant in this situation (slow path, 4 iterations at most) but I tried to avoid divisions everywhere as a general rule of thumb.


================
Comment at: lib/scudo/standalone/tsd_shared.h:144
+  u32 NumberOfTSDs;
+#if SCUDO_LINUX && !SCUDO_ANDROID
+  static THREADLOCAL TSD<Allocator> *ThreadTSD;
----------------
morehouse wrote:
> Why do we need these guards here but not in tsd_exclusive.h?
The exclusive version only uses ELF TLS. So if opting for an exclusive TSD it assumes the platform supports it.
For the shared version, it can work without ELF TLS, hence having `THREADLOCAL` variables within defines.
A couple of things to consider:
- Fuchsia supports `THREADLOCAL`, but not within the libc, and we are in the libc.
- Android recently got support for ELF TLS, but 1) I haven't tested it yet 2) I don't know if `THREADLOCAL` works from within the C library

I will revisit this when things settle, but as of now, I am sure this version works.



Repository:
  rCRT Compiler Runtime

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62258/new/

https://reviews.llvm.org/D62258





More information about the llvm-commits mailing list