[PATCH] D122922: [lld][common][lld-macho] Support per-thread allocators and StringSavers

Wed Apr 6 12:37:49 PDT 2022

oontvoo added inline comments.

================
Comment at: lld/Common/Memory.cpp:35
+PerThreadContext *CommonLinkerContext::perThreadContext() {
+  llvm::sys::RWMutex &contextMutex = context().contextMutex;
+  void *contextKey = &threadTag;
----------------
int3 wrote:
> oontvoo wrote:
> > int3 wrote:
> > > oontvoo wrote:
> > > > int3 wrote:
> > > > > I was hoping we could avoid mutexes altogether. I was thinking of something like
> > > > > 
> > > > > ```
> > > > > std::atomic<size_t> contextCount;
> > > > > std::array<PerThreadContext *, MAX_THREADS> perThreadContexts;
> > > > > 
> > > > > if (threadTag == 0) {
> > > > >   threadTag = contextCount++;
> > > > >   perThreadContexts[threadTag] = new PerThreadContext;
> > > > > }
> > > > > ```
> > > > What is the value of MAX_THREADS?  Doesn't that mean "max number of threads that can run concurrently" and not "max number of threads ever created during the application runtime"?
> > > > 
> > > > As new threads are spawned, the array would need to expand, and that'd be a potential race condition, yes?
> > > No, I meant "max number of threads ever created" :)
> > > 
> > > Since we use threadpools, we shouldn't be creating that many threads over the application lifetime anyway.
> > > 
> > > But you got me thinking if there's a nicer way to implement this that doesn't involve the user having to calculate that number up front. We could have a thread-local pointer to the per-thread context, so that checking if the context has been created is a simple null check (as opposed to the current hashmap lookup). We only take a lock the first time a thread executes and needs to create a new PerThreadContext, in order to safely add it to a global vector.
> > > 
> > > To make it 100% lock-free, we could implement our own append-only dynamically expanding series of arrays, much like how the BumpPtrAllocator uses its Slabs. But that's probably overkill :)
> > ok, gotcha! benchmarking showed no difference between the previous and the last approach (https://reviews.llvm.org/D123075)
> > 
> > Which one do you prefer?
> > I guess I liked this patch (updated diff) a bit better
> > constexpr uint32_t MAX_THREADS = std::numeric_limits<int32_t>::max() - 1;
> 
> doesn't that mean our std::array is now taking up like 32 MB 🤔 I was thinking of using a much lower number...
> 
> but IMO the vector solution I suggested above is cleaner. we would need a lock when pushing onto the vector, but we'll only need to do it once per thread, and we wouldn't need to pre-allocate a whole bunch of memory
> 
> > benchmarking showed no difference between the previous and the last approach
> 
> good to know! do we have any regression vs the non-thread-safe version?
> but IMO the vector solution I suggested above is cleaner. we would need a lock when pushing onto the vector, but we'll only need to do it once per thread, and we wouldn't need to pre-allocate a whole bunch of memory

fair enough - updated the diff to that approach.
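
For reference, here is a minimal sketch of the thread-local-pointer-plus-vector idea as I understand it. This is not the code in the diff: the `PerThreadContext` body, the global `registry`/`registryMutex`, `registeredContextCount()` and `demo()` are placeholders made up for illustration, and the real state would presumably live in `CommonLinkerContext`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Placeholder for lld's PerThreadContext (assumption: the real one holds
// the per-thread allocators and StringSaver).
struct PerThreadContext {
  int allocations = 0;
};

// Hypothetical global registry that owns every context ever created.
static std::mutex registryMutex;
static std::vector<std::unique_ptr<PerThreadContext>> registry;

PerThreadContext *perThreadContext() {
  // Fast path: a plain null check on a thread-local pointer, no lock.
  thread_local PerThreadContext *cached = nullptr;
  if (!cached) {
    // Slow path, taken once per thread: create the context and register
    // it under the lock so it can be found and destroyed later.
    auto ctx = std::make_unique<PerThreadContext>();
    cached = ctx.get();
    std::lock_guard<std::mutex> lock(registryMutex);
    registry.push_back(std::move(ctx));
  }
  return cached;
}

std::size_t registeredContextCount() {
  std::lock_guard<std::mutex> lock(registryMutex);
  return registry.size();
}

// Spawns n threads that each touch their context twice, then returns the
// total number of registered contexts.
std::size_t demo(unsigned n) {
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < n; ++i)
    threads.emplace_back([] {
      PerThreadContext *first = perThreadContext();
      first->allocations++;
      // The second call must hit the cached pointer.
      assert(first == perThreadContext());
    });
  for (auto &t : threads)
    t.join();
  return registeredContextCount();
}
```

Each thread pays for the lock only on its first call; later calls are just the null check. Because the vector owns the contexts, they outlive the threads that created them.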

> do we have any regression vs the non-thread-safe version?
Also no difference (tested by unconditionally defining THREAD_SAFE_MEMORY to 1).

Question: Do we want to just enable it now? Users who don't want it can turn it off or call the old functions (now renamed with an "unsafe" suffix).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122922/new/

https://reviews.llvm.org/D122922