[Lldb-commits] Performing ManualDWARFIndex::Index() async during ManualDWARFIndex::Preload()

Sun Jan 31 05:21:34 PST 2021

On 29/01/2021 16:25, Lasse Folger via lldb-commits wrote:
 > Hi lldb devs,
 >
 > We experienced some issues when debugging large binaries.
 > The startup time of lldb for large binaries is long (tens of seconds).
 > One of the reasons is that the /ManualDWARFIndex::Index()/ function is
 > executed synchronously and thus blocks during startup.

Before I go further, let me ask this: You are aware of the 
'target.preload-symbols' setting which can disable (delay) the indexing 
at starting, right? I am asking because it may result in a similar 
behavior than what you get with your patch. At the very least, it would 
be interesting to see a comparison of the two scenarios, to get a feel 
for the benefits it offers.

On 29/01/2021 18:23, Lasse Folger via lldb-commits wrote:
> Thanks Raphael for the pointers, I will take a look.
> 
> I just noticed I forgot to attach the actual patch I prepared ;)
> Here it is.
> 
> On Fri, Jan 29, 2021, 5:02 PM Raphael “Teemperor” Isemann 
> <teemperor at gmail.com <mailto:teemperor at gmail.com>> wrote:
> 
>     I can't help with any of the requested feedback, but the last time I
>     looked at a benchmark involving ManualDWARFIndex, the
>     ManualDWARFIndex::IndexUnit spent about 40% of its runtime just
>     constructing ConstString. Half of that 40% (so, 20% total time) was
>     just spend contesting/locking the locks for ConstString's global
>     string pools. Refactoring the code to not having to reaquire a
>     contested global lock for every DIE might help with improving
>     startup time.

I also feel obligated to point out that this code has been optimized and 
re-optimized by a number of people already. So all of the quick wins are 
likely gone already.

I am not saying that its not possible to improve the performance of this 
(in fact, I'm certain it can) -- however, such an effort will probably 
require making fundamental changes to the algorithm, rather than just 
using fancier data structures.

I don't want to sound too discouraging, but I think you ought to know 
what you're getting yourself into.

Good luck! (I mean it)

>>     I would like to have feedback on three parts:
>>
>>      1. In /ManualDWARFIndex::Preload() /I capture /this/ of the
>>         /ManualDWARFIndex/. As far as I could tell this is not a
>>         problem since the index is not relocated or deleted. However,
>>         I'm not that familiar with the architecture. Are there cases
>>         when the object is relocated or deleted before lldb stops?
While it's very unlikely, I don't think we can guarantee that the Module 
object holding the index is not deleted by the time the async thread 
runs. Such a thing might happen due to events totally beyond our 
control, such as the user calling SBTarget::RemoveModule.

This should be fixable by simply stashing the Module shared pointer.

>>      2. I use the /Module /mutex for synchronization because I didn't
>>         want to introduce a mutex in /ManualDWARFIndex /or
>>         /DWARFIndex/. Do you think this locking granularity is fine or
>>         should I create a dedicated mutex for it?
I think it should work, but a separate synchronization primitive might 
be cleaner. I might even go for a once_flag/call_once to also get the 
"runs once" behavior for free.

>>         Furthermore, I
>>         looked through the implementation
>>         of ManualDWARFIndex::/Index()/ and couldn't find any locking
>>         function in there. However, I might have missed something (or
>>         some of the underlying implementations change in the future)
>>         and there is a danger for a deadlock.
The lack of locking is not surprising because these functions already 
expect that the caller has taken the Module lock.

>>      3. /ManualDWARFIndex::Index()/ creates a thread pool internally.
>>         This means if multiple calls to/ ManualDWARFIndex::Index()
>>         /are executed at the same time, there might be more threads
>>         than cores. This might cause contention and decrease
>>         efficiency. (Thanks to Andy for pointing this out).

This is an interesting question for me, but I don't really have a good 
answer to it. On one hand, you could say that this is a situation that 
could occur already (though it wasn't always the case -- back when we 
had our own thread pool, it was process-wide and shared between 
different module instances). But OTOH, this does make it much easier for 
this situation to occur...

pl