[all-commits] [llvm/llvm-project] 6977bf: [mlir] Fix race condition introduced in ThreadLoca...

Jeff Niu via All-commits all-commits at lists.llvm.org
Fri May 24 02:06:56 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 6977bfb57c3efb9488aef463cd7ea521fd25a067
      https://github.com/llvm/llvm-project/commit/6977bfb57c3efb9488aef463cd7ea521fd25a067
  Author: Jeff Niu <jeff at modular.com>
  Date:   2024-05-24 (Fri, 24 May 2024)

  Changed paths:
    M mlir/include/mlir/Support/ThreadLocalCache.h

  Log Message:
  -----------
  [mlir] Fix race condition introduced in ThreadLocalCache (#93280)

Okay, so an apparently not-so-rare crash could occur if the
`perInstanceState` point got re-allocated to the same pointer as before.
All the values have been destroyed, so the TLC is left with dangling
pointers, but then the same `ValueT *` is pulled out of the TLC and
dereferenced, leading to a crash.

I suppose the purpose of the `weak_ptr` was that it would get reset to a
default state when the `perInstanceState` shared pointer got destryoed
(its reference count would only ever be 1, very briefly 2 when it gets
aliased to a `ValueT *` but then assigned to a `weak_ptr`).

Basically, there are circular references between TLC instances and
`perInstanceState` instances and we have to ensure there are no dangling
references.

1. Ensure the TLC entries are reset to a valid default state if the TLC
(i.e. owning thread) lives longer than the `perInstanceState`. a. This
is currently achieved by storing `weak_ptr` in the TLC.
2. If `perInstanceState` lives longer than the TLC, it cannot contain
dangling references to entries in destroyed TLCs. a. This is not
currently the case.
3. If both are being destroyed at the same time, we cannot race. a. The
destructors are synchronized because the TLC destructor locks `weak_ptr`
while it is destructing, preventing the owning `perInstanceState` of the
entry from destructing. If `perInstanceState` got destructed first, the
`weak_ptr` lock would fail.

And

4. Ensure `get` in the common (initialized) case is as fast as possible
(no atomics).

We need to change the TLC to store a `ValueT **` so that it can be
shared with entries owned by `perInstanceState` and written to null when
they are destroyed. However, this is no longer something synchronized by
an atomic, meaning that (2) becomes a problem. This is fine because when
TLC destructs, we remove the entries from `perInstanceState` that could
reference the TLC entries.

This patch shows the same perf gain as before but hopefully without the
bug.



To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications


More information about the All-commits mailing list