[PATCH] D122922: [lld][common][lld-macho] Support per-thread allocators and StringSavers

Wed Apr 6 12:24:58 PDT 2022

int3 added inline comments.

================
Comment at: lld/Common/Memory.cpp:35
+PerThreadContext *CommonLinkerContext::perThreadContext() {
+  llvm::sys::RWMutex &contextMutex = context().contextMutex;
+  void *contextKey = &threadTag;
----------------
oontvoo wrote:
> int3 wrote:
> > oontvoo wrote:
> > > int3 wrote:
> > > > I was hoping we could avoid mutexes altogether. I was thinking of something like
> > > > 
> > > > ```
> > > > std::atomic<size_t> contextCount;
> > > > std::array<PerThreadContext *, MAX_THREADS> perThreadContexts;
> > > > 
> > > > if (threadTag == 0) {
> > > >   threadTag = contextCount++;
> > > >   perThreadContexts[threadTag] = new PerThreadContext;
> > > > }
> > > > ```
> > > What is the value of MAX_THREADS?  Doesn't that mean "max number of threads that can run concurrently" and not "max number of threads ever created during the application runtime"?
> > > 
> > > As new threads are spawned up, the arrays can expand and that'd be a potential race condition, yes?
> > No, I meant "max number of threads ever created" :)
> > 
> > Since we use threadpools, we shouldn't be creating that many threads over the application lifetime anyway.
> > 
> > But you got me thinking if there's a nicer way to implement this that doesn't involve the user having to calculate that number up front. We could have a thread-local pointer to the per-thread context, so that checking if the context has been creating is a simple null check (as opposed to the current hashmap lookup.) We only take a lock the first time a thread executes and needs to create a new PerThreadContext, in order to safely add it to a global vector.
> > 
> > To make it 100% lock-free, we could implement our own append-only dynamically expanding series of arrays, much like how the BumpPtrAllocator uses its Slabs. But that's probably overkill :)
> ok, gotcha! benchmarking showed no difference between the previous and the last approach (https://reviews.llvm.org/D123075)
> 
> Which one do you prefer?
> I guess I liked this patch (updated diff) a bit better
> constexpr uint32_t MAX_THREADS = std::numeric_limits<int32>::max() - 1;

doesn't that mean our std::array is now taking up like 32 MB 🤔 I was thinking of using a much lower number...

but IMO the vector solution I suggested above is cleaner. we would need a lock when pushing onto the vector, but we'll only need to do it once per thread, and we wouldn't need to pre-allocate a whole bunch of memory

> benchmarking showed no difference between the previous and the last approach

good to know! do we have any regression vs the non-thread-safe version?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122922/new/

https://reviews.llvm.org/D122922