[clang] [Clang] Access tls_guard via llvm.threadlocal.address (PR #96633)

Fri Jul 5 01:22:07 PDT 2024

================
@@ -1070,13 +1084,20 @@ CodeGenFunction::GenerateCXXGlobalInitFunc(llvm::Function *Fn,
       // Mark as initialized before initializing anything else. If the
       // initializers use previously-initialized thread_local vars, that's
       // probably supposed to be OK, but the standard doesn't say.
-      Builder.CreateStore(llvm::ConstantInt::get(GuardVal->getType(),1), Guard);
-
-      // The guard variable can't ever change again.
+      // Get the thread-local address via intrinsic.
+      if (IsTLS)
+        GuardAddr = GuardAddr.withPointer(
+            Builder.CreateThreadLocalAddress(Guard.getPointer()),
+            NotKnownNonNull);
+      Builder.CreateStore(llvm::ConstantInt::get(GuardVal->getType(), 1),
+                          GuardAddr);
+
+      // Emit invariant start for TLS guard address.
       EmitInvariantStart(
           Guard.getPointer(),
           CharUnits::fromQuantity(
-              CGM.getDataLayout().getTypeAllocSize(GuardVal->getType())));
+              CGM.getDataLayout().getTypeAllocSize(GuardVal->getType())),
+          IsTLS);
----------------
rjmccall wrote:

Yes, and in fact, sometimes that's required.  The way I think about it is that `threadlocal.address` is a real dynamic operation that resolves the address of the thread-local variable for a specific thread: namely, the current thread at the execution point where `threadlocal.address` appears. When we're doing a complicated operation on a specific thread-local variable, it's important to call `threadlocal.address` at the right point and then use that result throughout the operation.  Otherwise, a suspension in the middle of the operation will leave us working on a different thread-local variable at different points in the operation, which is not semantically allowed.

In this case, we initialize a thread-local variable and then enter an invariant region for it.  We need the generated code to:
1. resolve the address of the TLV for the current thread, prior to performing any initialization;
2. initialize the TLV, which may include suspensions that change the current thread; and
3. enter an invariant region for the TLV we initialized, *not* the TLV for the new thread.  The new thread's TLV may not yet be in an invariant region.
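
As a rough source-level analogy for the three steps above (not the IR that Clang emits), the sketch below resolves the address once, lets a second thread stand in for a suspension that resumes elsewhere, and then finishes the operation through the saved pointers. The names `tls_guard`, `tls_value`, `InitResult`, and `initialize_across_thread_switch` are hypothetical and exist only for illustration; a plain `std::thread` is assumed as a stand-in for a coroutine that switches threads.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-ins for a TLS guard and the variable it protects.
thread_local bool tls_guard = false;
thread_local int tls_value = 0;

struct InitResult {
  bool addresses_differ;   // the two threads see distinct guard objects
  bool origin_guard_set;   // the originating thread's guard and value were set
  bool resumer_guard_set;  // the resuming thread's own guard was set
};

InitResult initialize_across_thread_switch() {
  InitResult r{};

  // Step 1: resolve the addresses once, on the thread that starts the
  // operation (the analogue of a single threadlocal.address call).
  bool *guard = &tls_guard;
  int *value = &tls_value;
  *guard = true;

  // Step 2: simulate a suspension that resumes on another thread.
  // Inside the lambda, naming tls_guard re-resolves it for the new thread.
  std::thread resumer([&] {
    *value = 42;  // keeps working on the ORIGINAL thread's variable
    r.addresses_differ = (guard != &tls_guard);
    r.resumer_guard_set = tls_guard;  // the new thread's guard is untouched
  });
  resumer.join();

  // Step 3: finish the operation through the saved pointers, so it applies
  // to the variable that was actually initialized.
  r.origin_guard_set = *guard && (*value == 42);
  return r;
}
```

Had the lambda written through `tls_guard` and `tls_value` by name instead of through the saved pointers, the second half of the operation would have landed on the new thread's copies, which is the mismatch described above.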

Now, this may be moot, because I'm not sure it's allowed (or at least I'm not sure it *should* be allowed) to have a coroutine suspension in the middle of the initialization of a thread-local variable.  The interaction of thread-locals with coroutines that switch threads is deeply problematic; notably, the TLV for the new thread can be uninitialized when you go to use it.  I haven't checked what the standard says here, but honestly this might have to be undefined behavior, in which case we just need to make sure we generate reasonable code as long as the thread *doesn't* change.

https://github.com/llvm/llvm-project/pull/96633