[Lldb-commits] [PATCH] D75711: [NFC] Have ThreadPlans hold onto the Process & TID, rather than the Thread

Jim Ingham via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Mon Mar 9 10:46:54 PDT 2020


jingham added a comment.

In D75711#1912230 <https://reviews.llvm.org/D75711#1912230>, @labath wrote:

> Thanks for the detailed response. I'll try to be equally understandable.
>
> In D75711#1910155 <https://reviews.llvm.org/D75711#1910155>, @jingham wrote:
>
> > In D75711#1909055 <https://reviews.llvm.org/D75711#1909055>, @labath wrote:
> >
> > > I am somewhat worried about the direction this is taking. Maybe vanishing threads are fine for os-level debugging (because there the "real" threads are the cpu cores, and the os threads are somewhat imaginary), but I definitely wouldn't want to do it for regular debugging. If my process has 1000 threads (not completely uncommon) then I don't think my debugger would be doing me a service by showing me just the three it thinks are interesting and completely disregarding the others. Having a mode (it could even be default) which only highlights the interesting ones -- yes, that would definitely be useful. Making sure it doesn't touch the other threads when it doesn't need to -- absolutely. But I don't think it should pretend the other threads don't exist at all, because I may still have a reason to inspect those threads (or just to know that they're there). In fact, I wouldn't be surprised if this happened to said kernel folks too, and they ended up adding a custom command which enumerates all "threads".
> >
> >
> > First off, I also think that hiding some threads from users, or forcing people to say "thread list --force" or something like that to see them all would not be a user-friendly design.  I'm not suggesting imposing that on general debugging.  But if an OS plugin, for instance, decides it is too expensive to do this except as an explicit gesture, we need to support that decision.
> >
> > But lldb stops many times during the course of a step where it doesn't return control to the user - all the stepping into code with no debug info and back out again, stepping across dynamic linker stubs, or stepping through ObjC message dispatch dances we do involve many private stops per public stop...  If a thread has not stopped "for a reason" at one of those stops, it takes no part in the "should stop" and "will resume" negotiations, and reconstructing it for that stop really does serve no purpose.  There's no way that the user will get a chance to see it.
> >
> > So if we could avoid reading them in, that might make stepping go faster in the presence of lots of threads.  BTW, with the accelerated stop packets where we get all the threads and their stop reasons on stop, I'm not sure this would really save all that much time.  But talking to less capable stubs, it really can make a difference.  Greg did a bunch of work to make sure stepping stopped as few times as possible because the Facebook gdb-remote stub didn't support these accelerated packets and getting all the thread info on each private stop was making stepping go really slowly.  I didn't understand the motivation at first because for debugserver the changes didn't really make any difference.
> >
> > Also, I don't think these two desires (not fetching threads when we don't need them, e.g. on private stops, and showing them all to the user whenever they ask) are contradictory.  After all, we still have the gate of UpdateThreadListIfNeeded.  All the commands that touch the thread list go through this.  So even if we don't gather all the threads when we stop, we already have a mechanism in place whereby any command that actually touches the thread list can fetch them on demand.
>
>
> Yes, I agree with everything above. If lldb does not need information about all threads in order to service an internal stop, then it shouldn't read it in, and the os plugin (or the remote stub) should not need to provide it. In fact I would even go so far as to say that we shouldn't need to read in this information for a public stop until a user performs an operation (e.g. `thread list`) that requires information about all threads.
>
> > 
> > 
> >> What was actually the slow part here? Was it the os plugin reading the thread lists from the kernel memory? Or the fact that lldb is slow when it is confronted with them?
> >> 
> >> If it is the latter, then maybe we should fix that. For example, I think I remember lldb likes to check the PC of every thread before it does any kind of a resume -- I am not convinced that is really needed.
> >> 
> >> If fetching the threads is slow, then perhaps we could have an api, where the os threads are produced incrementally, and we ensure that lldb does not query for os threads needlessly during the hot paths (e.g. internal stops for servicing thread plans).
> > 
> > The thing that is slow is gathering all the threads.  On Darwin, there's a kernel activation per user space thread and then a whole bunch of kernel threads.  On a busy system, that ends up being a whole lot of threads.  Producing them through the OS plugin interface involves a bunch of memory reads to fetch the thread data, and following pointers around to get the register state for the threads, etc.
> > 
> > Note also, the way the OS Plugins currently work, when the process stops the OS Plugin just gets asked to produce the threads backing the "on core" threads.  That's all UpdateThreadListIfNeeded will fetch, and there's no lldb way to get the OS Plugin to fetch all the threads it can produce.  The only way to do that with the Darwin kernel is to run other commands in the set of Python commands provided by the kernel team that either fetch a non-current thread by ID or force fetching all the threads.
> > 
> > So the current state when working with OS Plugins violates your principle given above,
>
> IIUC, this principle is only violated for this particular os plugin, which violates the plugin contract and does not return all threads through the `get_thread_info` interface. I don't know whether this contract is spelled out anywhere, but it seems like at least the code in lldb behaves that way.
>
> So this is technically not "our" fault. The plugin is doing something unsupported and gets weird behavior as a result. Now, I am not saying we should throw this plugin (maybe the only actually used os plugin) overboard. However, I was hoping that the process of supporting it would involve some improvements to the os plugin interface (so that it can do what it wants in a supported way), rather than declaring support for whatever it is the plugin is doing, even though we know it to be problematic (like, not being able to tell when we can throw thread plans away).


I think when presenting an OS's threads through the funnel of a monitor stub that only knows about cores and not about the OS's concept of threads, you will very likely run into the problem where reconstructing all the threads at every stop is expensive and not particularly desirable.  Two of the three instances of OS plugins that I know about do it this way.  The other is for wrapping a cooperative multi-threading library, where fetching all the threads was pretty easy and there weren't too many.  But if getting all the threads really is slow, then for the OS plugin to be useful we need to allow it to do partial reporting.

Anyway, I don't know that we need anything more added to the OS interface.  There's a "make a thread for a TID" interface, which we can use to reap the persistent part of the thread, for instance when we make a public stop.  If that's fast enough (I haven't gotten to that point with the xnu plugin) we can use that.  We can add an API to query whether the plugin returns all threads, and gate our behavior on that rather than on "Has OS Plugin".  But that's about all I can think of.  I also have to support all the OS plugins in all the xnu dSYMs that are currently in the wild, so whatever we do has to work without requiring added behaviors.
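To sketch what I mean by that gate (purely illustrative; none of these names exist in lldb today), the decision about whether it is safe to throw away per-TID state would key off a capability the plugin reports, rather than off the mere presence of a plugin:

    // Hypothetical sketch: "OSPluginCapabilities", "reports_all_threads" and
    // "CanPrunePerTidState" are made up for illustration; they are not lldb API.
    struct OSPluginCapabilities {
      // True when the plugin's thread list is exhaustive; false when it only
      // reports the threads backing the "on core" threads at a stop.  False is
      // the conservative default for plugins that never implement the query.
      bool reports_all_threads = false;
    };

    // Gate the "can we prune per-TID state now?" decision on the reported
    // capability, not on whether an OS plugin happens to be loaded at all.
    inline bool CanPrunePerTidState(const OSPluginCapabilities *caps) {
      return caps == nullptr || caps->reports_all_threads;
    }

Plugins that never answer the query just get the conservative default (don't prune), so the OS plugins already in the wild would keep working without added behaviors.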

> That said, I don't really use or care about the os plugins, so if using it results in leaking thread plans, then whatever. The thing I want to understand is what kind of requirements does this place on the rest of lldb. Which leads me to my next comment..
> 
>> but I think it fits the kernel folks' workflow, since the actual number of threads in the kernel is overwhelming and not terribly useful.  But you are right, if we wanted to delay getting threads on private stops, but make UpdateThreadListIfNeeded force a fetch, then we would have to add some API to allow us to fetch only the "on core" threads along the code path that handles internal stops.
>> 
>> In any case, in order to support having internal stops not fetch all threads - which we are currently doing with OS Plugins - we need to be able to persist the stateful data lldb holds for that thread across stops where we don't realize the Thread object.  That is the sole ambition of this set of patches.
> 
> We've agreed that we would like to avoid gathering the thread data for internal stops that don't need it. Avoiding creating (updating) lldb_private::Thread objects, and having the not-(yet)-realized threads live as a tid_t, is one way to achieve this goal, but I am not sure if it is the right one. I am not really against the idea -- since threads can come and go between stops pretty much arbitrarily (even without os plugins), having everything hold onto them as simple handles and forcing a fetch through some Process api when one needs to access them may simplify some things (for one, the lifetime of Threads could become much stricter).
> 
> I can certainly believe that changing thread plans to use a tid_t instead of a Thread& is simpler than making Thread object management lazy, but I am not sure that is a fair comparison. Thread plans are only a single component which juggles Thread objects. There are many other pieces of code, which assume that the lifetime of a Thread object matches that of the entity it describes. SBThread is one such thing. If SBThread keeps referring to the actual thread as a pointer, then it will lose track of it when the actual thread goes away and reappears. Same goes for ValueObjects and SBValues -- if lldb_private::Thread is meant to be transient, then those should also be updated to use a tid_t(*), but then the changes required to do that become much larger, and it's not clear to me whether that is really worth it, since this is just kind of reinventing `weak_ptr<Thread>` (but with the downside mentioned in (*)).

Everything that holds onto its state using an ExecutionContextRef already holds onto the m_tid as the real entity, and the Thread is only a cache (see ExecutionContextRef::GetThreadSP).  That probably hasn't been pushed hard, and I've only written a few tests with actually vanishing threads so far.  But if it doesn't work properly, then that's a bug to fix, not a design to change.
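For anyone following along, the pattern is roughly this (a simplified sketch with a made-up ThreadHandle name; the real logic lives in ExecutionContextRef::GetThreadSP and differs in detail):

    #include "lldb/Target/Process.h"
    #include "lldb/Target/Thread.h"
    #include "lldb/Target/ThreadList.h"

    // The TID is the real identity; the Thread object is only a cache that
    // gets re-resolved whenever the cached object has gone stale.
    class ThreadHandle {
    public:
      ThreadHandle(lldb::ProcessSP process_sp, lldb::tid_t tid)
          : m_process_wp(process_sp), m_tid(tid) {}

      lldb::ThreadSP GetThreadSP() {
        lldb::ThreadSP thread_sp = m_thread_wp.lock();
        if (thread_sp && thread_sp->IsValid())
          return thread_sp;
        // Cache miss: look the TID up again in the current thread list.
        if (lldb::ProcessSP process_sp = m_process_wp.lock()) {
          thread_sp = process_sp->GetThreadList().FindThreadByID(m_tid);
          m_thread_wp = thread_sp; // may be empty if the TID is gone
        }
        return thread_sp;
      }

    private:
      lldb::ProcessWP m_process_wp;
      lldb::tid_t m_tid;          // the persistent identity
      lldb::ThreadWP m_thread_wp; // only a cache
    };

So as long as clients go through this kind of accessor rather than stashing a Thread *, a thread that wasn't realized at some private stop just resolves again at the next public stop (or comes back empty if it is really gone).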

ValueObjects (and therefore SBValues) use an ExecutionContextRef in their UpdatePoint; they don't hold onto a thread.  SBThreads also hold ExecutionContextRefs, not ThreadWPs, to get back to their actual thread.  In general, anything that wants to know "where am I in the state space of the process" is supposed to use an ExecutionContextRef to store its state, not individual target/process/thread/etc. pointers.  So I'm not sure this is so far off, but we would have to write some more tests with threads coming and going to make sure this is used consistently.

> (*) I don't believe tid_t is actually a good way to keep track of the identity of a thread across time. It does uniquely identify a thread at any given point in time, but thread ids can be recycled (sometimes very quickly -- running the lldb test suite just once causes Linux to wrap its pid space), which means you can't guarantee it will always refer to the same physical thread. If we're going to be using numbers to identify threads I think it should be some other kind of an identifier -- but that is sort of what a `Thread*` already is.

Not sure I understand this.  If I run the process and a real thread exits and another one is created with the same TID, all before stopping, then when I go to merge the thread lists on stop I'm going to think this different thread is the same thread because it has the same TID.  I don't see how using Thread pointers rather than TIDs helps with this.  I still have to trust the TID I got from the monitor.  We are always going to have a problem merging up Threads when TIDs can be reused.  I don't see how our Thread *'s, which don't represent anything in the target and thus don't help with reasoning about persistence, help this at all.

For instance, on macOS, in user space the TID we use is the "globally unique thread ID", so it never recurs.  The xnu OS plugin has a unique identifier as well.  Similarly, if this becomes a problem on Linux, IIRC you observe thread creation and deletion, so you can do pruning at that point.  But this reasoning is all about TIDs, not about Thread *'s.  So I still think that TIDs are a better identifier for our representation of threads than the Thread * that we happen to wrap them in.  Note also, when OS Plugins are around, the mapping from Thread * to TID is not quite so straightforward, and one of the things I like about moving the persistent parts of the Thread (which for now are only the ThreadPlanStack and the TID) out of the whole thread production logic is that it gets us out of knowing anything about this representation, and in the end we can ask the question we really care about: "Is the TID still here?"
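Concretely, the shape I'm after is something like this (a simplified sketch; the names and details here are illustrative, not the actual patch):

    #include <map>

    #include "lldb/Target/Process.h"
    #include "lldb/Target/ThreadList.h"

    // Stand-in for the persistent per-thread bits (for now, really just the
    // ThreadPlanStack); the point is that it is keyed by tid_t, not Thread *.
    struct PersistentThreadState {
      // lldb_private::ThreadPlanStack plans;  // elided for the sketch
    };

    class PersistentThreadStateMap {
    public:
      PersistentThreadState &FindOrCreate(lldb::tid_t tid) { return m_state[tid]; }

      // At a public stop, when the thread list can be trusted, ask the question
      // we actually care about ("is the TID still here?") and drop the state
      // for the TIDs that are not.
      void Prune(lldb_private::Process &process) {
        for (auto it = m_state.begin(); it != m_state.end();) {
          if (!process.GetThreadList().FindThreadByID(it->first))
            it = m_state.erase(it);
          else
            ++it;
        }
      }

    private:
      std::map<lldb::tid_t, PersistentThreadState> m_state;
    };

Nothing in there cares how the Thread objects wrapping these TIDs get produced, which is exactly the separation from the thread production logic I'm describing above.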


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75711/new/

https://reviews.llvm.org/D75711




