[PATCH] D129089: [BOLT] Fix concurrent hash table modification in the instrumentation runtime
MichaĆ Chojnowski via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 2 17:38:20 PST 2023
michoecho added a comment.
> Also it looks like since visitIndCallCounter is called by forEachElement with mutex lock the CallFlowTable->get() would never be able to lock the mutex
But these are two different mutexes, aren't they? The lock taken by the `forEachElement` surrounding `visitIndCallCounter` is in `GlobalIndCallCounters[I]`. The lock in `CallFlowTable->get()` sits in `Ctx.CallFlowTable`. Unless you mean some other locks?
> I observe that one of the random GlobalIndCallCounters might become locked and when it comes to the writeIndirectCallProfile->forEachElement it becomes deadlocked.
It seems that `GlobalIndCallCounters` are only locked in `__bolt_instr_data_dump --> writeIndirectCallProfile --> forEachElement`, in `__bolt_instr_clear_counters --> resetCounters`, and in `__bolt_instr_indirect_{tail}call --> instrumentIndirectCall --> incrementVal --> get`. So they shouldn't lock each other out. Unless... `__bolt_instr_data_dump` is called from a signal handler.
I see that `Lock` takes care to mask signals while it's locked, but `TryLock` doesn't. So if a signal comes while a `TryLock` is held in `__bolt_instr_indirect_{tail}call --> instrumentIndirectCall --> incrementVal --> get`, and the signal handler calls `__bolt_instr_data_dump`, then `__bolt_instr_data_dump` will get deadlocked, just as you describe.
So, are your deadlocks happening in a signal handler? If yes, then case closed. (But TryLock should be fixed regardless of the answer.)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D129089/new/
https://reviews.llvm.org/D129089
More information about the llvm-commits
mailing list