[Lldb-commits] [PATCH] D75711: [NFC] Have ThreadPlans hold onto the Process & TID, rather than the Thread

Pavel Labath via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Mon Mar 9 05:20:35 PDT 2020


labath added a comment.

Thanks for the detailed response. I'll try to be equally understandable.

In D75711#1910155 <https://reviews.llvm.org/D75711#1910155>, @jingham wrote:

> In D75711#1909055 <https://reviews.llvm.org/D75711#1909055>, @labath wrote:
>
> > I am somewhat worried about the direction this is taking. Maybe vanishing threads are fine for os-level debugging (because there the "real" threads are the cpu cores, and the os threads are somewhat imaginary), but I definitely wouldn't want to do it for regular debugging. If my process has 1000 threads (not completely uncommon) then I don't think my debugger would be doing me a service showing me just the three it thinks are interesting and completely disregarding the others. Having a mode (it could even be default) which only highlights the interesting ones -- yes, that would definitely be useful. Making sure it doesn't touch the other threads when it doesn't need to -- absolutely. But I don't think it should pretend the other threads don't exist at all, because I may still have a reason to inspect those threads (or just to know that they're there). In fact, I wouldn't be surprised if this happened to the said kernel folks too, and they end up adding a custom command, which would enumerate all "threads".
>
>
> First off, I also think that hiding some threads from users, or forcing people to say "thread list --force" or something like that to see them all would not be a user-friendly design.  I'm not suggesting imposing that on general debugging.  But if an OS plugin, for instance, decides it is too expensive to do this except as an explicit gesture, we need to support that decision.
>
> But lldb stops many times during the course of a step where it doesn't return control to the user - all the step into code with no debug info, step out again, step across dynamic linker stubs or step through ObjC message dispatch dances we do involve many private stops per public stop...  If a thread has not stopped "for a reason" at one of those stops, it takes no part in the "should stop" and "will resume" negotiations, and reconstructing it for that stop really does serve no purpose.  There's no way that the user will get a chance to see it.
>
> So if we could avoid reading them in, that might make stepping go faster in the presence of lots of threads.  BTW, with the accelerated stop packets where we get all the threads and their stop reasons on stop, I'm not sure this would really save all that much time.  But talking to less capable stubs, it really can make a difference.  Greg did a bunch of work to make sure stepping stopped as few times as possible because the Facebook gdb-remote stub didn't support these accelerated packets and getting all the thread info on each private stop was making stepping go really slowly.  I didn't understand the motivation at first because for debugserver the changes didn't really make any difference.
>
> Also, I don't think these two desires (not fetching threads when we don't need them, e.g. on private stops, and showing them all to the user whenever they ask) are contradictory.  After all, we still have the gate of UpdateThreadListIfNeeded.  All the commands that touch the thread list go through it, so even if we don't gather all the threads when we stop, we already have a mechanism in place whereby any command that actually touches the thread list can fetch them on demand.


Yes, I agree with everything above. If lldb does not need information about all threads in order to service an internal stop, then it shouldn't read it in, and the os plugin (or the remote stub) should not need to provide it. In fact, I would even go so far as to say that we shouldn't need to read in this information for a public stop until a user performs an operation (e.g. `thread list`) that requires information about all threads.
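Roughly what I have in mind, as a minimal sketch with made-up names (this is not the real Process/ThreadList interface, just an illustration of gating the expensive query behind user-facing operations):

```cpp
// Sketch only: "fetch_all" stands in for the expensive OS-plugin /
// remote-stub query that enumerates every thread.
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using Tid = uint64_t;

class LazyThreadList {
public:
  explicit LazyThreadList(std::function<std::vector<Tid>()> fetch_all)
      : m_fetch_all(std::move(fetch_all)) {}

  // Called on every stop (public or private): just drop the cache.
  void OnStop() { m_cached.reset(); }

  // Called only from user-facing operations such as "thread list".
  const std::vector<Tid> &GetAllThreads() {
    if (!m_cached)
      m_cached = m_fetch_all(); // the expensive part happens here, on demand
    return *m_cached;
  }

private:
  std::function<std::vector<Tid>()> m_fetch_all;
  std::optional<std::vector<Tid>> m_cached;
};
```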

> 
> 
>> What was actually the slow part here? Was it the os plugin reading the thread lists from the kernel memory? Or the fact that lldb is slow when it is confronted with them?
>> 
>> If it is the latter, then maybe we should fix that. For example, I think I remember lldb likes to check the PC of every thread before it does any kind of a resume -- I am not convinced that is really needed.
>> 
>> If fetching the threads is slow, then perhaps we could have an api, where the os threads are produced incrementally, and we ensure that lldb does not query for os threads needlessly during the hot paths (e.g. internal stops for servicing thread plans).
> 
> The thing that is slow is gathering all the threads.  On Darwin, there's a kernel activation per user space thread and then a whole bunch of kernel threads.  On a busy system, that ends up being a whole lot of threads.  Producing them through the OS plugin interface involves a bunch of memory reads to fetch the thread data, and following pointers around to get the register state for the threads, etc.
> 
> Note also, the way the OS Plugins currently work, when the process stops the OS Plugin just gets asked to produce the threads backing the "on core" threads.  That's all UpdateThreadListIfNeeded will fetch, and there's no lldb way to get the OS Plugin to fetch all the threads it can produce.  The only way to do that with the Darwin kernel is to run other commands in the set of Python commands provided by the kernel team that either fetches a non-current thread by ID or forces fetching all the threads.
> 
> So the current state when working with OS Plugins violates your principle given above,

IIUC, this principle is only violated for this particular os plugin, which violates the plugin contract and does not return all threads through the `get_thread_info` interface. I don't know whether this contract is spelled out anywhere, but it seems like at least the code in lldb behaves that way.

So this is technically not "our" fault. The plugin is doing something unsupported and gets weird behavior as a result. Now, I am not saying we should throw this plugin (maybe the only actually used os plugin) overboard. However, I was hoping that the process of supporting it would involve some improvements to the os plugin interface (so that it can do what it wants in a supported way), rather than declaring support for whatever it is the plugin is doing, even though we know it to be problematic (like, not being able to tell when we can throw thread plans away).

That said, I don't really use or care about the os plugins, so if using them results in leaking thread plans, then whatever. The thing I want to understand is what kind of requirements this places on the rest of lldb. Which leads me to my next comment...

> but I think it fits the kernel folks workflow, since the actual number of threads in the kernel is overwhelming and not terribly useful.  But you are right, if we wanted to delay getting threads on private stops, but make UpdateThreadListIfNeeded force a fetch, then we would have to add some API to allow us to fetch only the "on core" threads along the code path that handles internal stops.
> 
> In any case, in order to support having internal stops not fetch all threads - which we are currently doing with OS Plugins - we need to be able to persist the stateful data lldb holds for that thread across stops where we don't realize the Thread object.  That is the sole ambition of this set of patches.

We've agreed that we would like to avoid gathering the thread data for internal stops that don't need it. Avoiding creating (updating) lldb_private::Thread objects, and having the not-(yet)-realized threads live as a tid_t, is one way to achieve this goal, but I am not sure it is the right one. I am not really against the idea -- since threads can come and go between stops pretty much arbitrarily (even without os plugins), it may make sense to have everything hold onto them as simple handles and force a fetch through some Process api when one needs to access them; that might simplify some things (for one, the lifetime of Threads could become much stricter).
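To make the handle idea concrete, here is roughly what I'm picturing (hypothetical names, not the actual lldb classes): thread plans and similar components keep only the tid and re-resolve it through the process whenever they need the real object.

```cpp
// Sketch only, not lldb's real Process/Thread/ThreadPlan interfaces.
#include <cstdint>
#include <memory>
#include <unordered_map>

using tid_t = uint64_t;

struct Thread {
  tid_t tid;
  // registers, stop info, ...
};

class Process {
public:
  // Realize (or look up) the Thread for this tid; may return nullptr if the
  // thread is not present at this stop. Only this one thread would need to
  // be fetched from the OS plugin / stub.
  Thread *FindThreadByID(tid_t tid) {
    auto it = m_threads.find(tid);
    return it == m_threads.end() ? nullptr : it->second.get();
  }

private:
  std::unordered_map<tid_t, std::unique_ptr<Thread>> m_threads;
};

class ThreadPlan {
public:
  ThreadPlan(Process &process, tid_t tid) : m_process(process), m_tid(tid) {}

  bool ShouldStop() {
    Thread *thread = m_process.FindThreadByID(m_tid); // re-resolve every stop
    if (!thread)
      return false; // thread not realized (or gone) at this stop
    // ... inspect *thread ...
    return true;
  }

private:
  Process &m_process;
  tid_t m_tid;
};
```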

I can certainly believe that changing thread plans to use a tid_t instead of a Thread& is simpler than making Thread object management lazy, but I am not sure that is a fair comparison. Thread plans are only one component that juggles Thread objects. There are many other pieces of code which assume that the lifetime of a Thread object matches that of the entity it describes. SBThread is one such thing. If SBThread keeps referring to the actual thread as a pointer, then it will lose track of it when the actual thread goes away and reappears. The same goes for ValueObjects and SBValues -- if lldb_private::Thread is meant to be transient, then those should also be updated to use a tid_t (*), but then the changes required to do that become much larger, and it's not clear to me whether that is really worth it, since this is just kind of reinventing `weak_ptr<Thread>` (but with the downside mentioned in (*)).
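For comparison, the `weak_ptr<Thread>` version of the same plan (again just a sketch, not what the patch does): the process owns the Thread through a shared_ptr, and holders keep a weak_ptr that expires exactly when the object is destroyed, so a recycled tid can never silently alias a dead thread.

```cpp
#include <memory>

struct Thread { /* ... */ };

class ThreadPlan {
public:
  explicit ThreadPlan(std::weak_ptr<Thread> thread)
      : m_thread(std::move(thread)) {}

  bool ShouldStop() {
    if (std::shared_ptr<Thread> thread = m_thread.lock()) {
      // ... the Thread object is still alive, inspect it ...
      return true;
    }
    return false; // Thread destroyed; the plan can be discarded
  }

private:
  std::weak_ptr<Thread> m_thread;
};
```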

(*) I don't believe tid_t is actually a good way to keep track of the identity of a thread across time. It does uniquely identify a thread at any given point in time, but thread ids can be recycled (sometimes very quickly -- running the lldb test suite just once causes Linux to wrap its pid space), which means you can't guarantee it will always refer to the same physical thread. If we're going to be using numbers to identify threads, I think it should be some other kind of identifier -- but that is sort of what a `Thread*` already is.
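If we did want a plain value, something like a debugger-assigned generation stamped next to the OS tid would sidestep the recycling problem (a made-up scheme, not something lldb implements):

```cpp
#include <cstdint>
#include <tuple>

struct ThreadHandle {
  uint64_t tid;        // OS-level id; the kernel may recycle it
  uint64_t generation; // assigned by the debugger when it first sees the
                       // thread, never reused

  friend bool operator==(const ThreadHandle &a, const ThreadHandle &b) {
    return std::tie(a.tid, a.generation) == std::tie(b.tid, b.generation);
  }
};
```

A stale handle then compares unequal to the handle of a new thread that happens to reuse the same tid, so the staleness can be detected instead of silently rebinding.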


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75711/new/

https://reviews.llvm.org/D75711




