[Lldb-commits] [PATCH] D75711: [NFC] Have ThreadPlans hold onto the Process & TID, rather than the Thread

Jim Ingham via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Tue Mar 10 10:53:11 PDT 2020


jingham added a comment.

In D75711#1914001 <https://reviews.llvm.org/D75711#1914001>, @labath wrote:

> In D75711#1912902 <https://reviews.llvm.org/D75711#1912902>, @jingham wrote:
>
> > In D75711#1912230 <https://reviews.llvm.org/D75711#1912230>, @labath wrote:
> >
> > > So this is technically not "our" fault. The plugin is doing something unsupported and gets weird behavior as a result. Now, I am not saying we should throw this plugin (maybe the only actually used os plugin) overboard. However, I was hoping that the process of supporting it would involve some improvements to the os plugin interface (so that it can do what it wants in a supported way), rather than declaring support for whatever it is the plugin is doing, even though we know it to be problematic (like, not being able to tell when we can throw thread plans away).
> >
> >
> > I think when presenting an OS's threads through the funnel of a monitor stub that only knows about cores and not about the OS's concept of threads, you will very likely run into the problem where reconstructing all the threads at every stop is expensive and not particularly desirable.  Two of the three instances of OS plugins that I know about do it this way.  The other is for wrapping a cooperative multi-threading library, where fetching all the threads was pretty easy and there weren't too many.  But, for instance, if getting all the threads really is slow, then for the OS plugin to be useful we need to allow it to do partial reporting.
> >
> > Anyway, I don't know that we need anything more added to the OS interface.  There's a "make a thread for a TID" interface, which we can use to reap the persistent part of the thread, for instance when we make a public stop.  If that's fast enough (I haven't gotten to that point with the xnu plugin) we can use that.  We can add an API to query whether the plugin returns all threads, and gate our behavior on that rather than on "Has OS Plugin".  But that's about all I can think of.
>
>
> Yes, that's sort of what I was expecting. Another idea I had was to have a way to avoid reading the register context initially, and only have it be fetched on demand.
>
> > I also have to support all the OS plugins in all the xnu dSYM's that are currently in the wild, too.  So whatever we do has to work without requiring added behaviors.
>
> Yep, anything we do shouldn't break existing plugins, but I think it's fair to require some modification to an existing plugin in order to enable a new feature (such as stepping across thread reschedules).
>
> > 
> > 
> >> That said, I don't really use or care about the os plugins, so if using it results in leaking thread plans, then whatever. The thing I want to understand is what kind of requirements does this place on the rest of lldb. Which leads me to my next comment..
> >> 
> >>> but I think it fits the kernel folks' workflow, since the actual number of threads in the kernel is overwhelming and not terribly useful.  But you are right, if we wanted to delay getting threads on private stops, but make UpdateThreadListIfNeeded force a fetch, then we would have to add some API to allow us to fetch only the "on core" threads along the code path that handles internal stops.
> >>> 
> >>> In any case, in order to support having internal stops not fetch all threads - which we are currently doing with OS Plugins - we need to be able to persist the stateful data lldb holds for that thread across stops where we don't realize the Thread object.  That is the sole ambition of this set of patches.
> >> 
> >> We've agreed that we would like to avoid gathering the thread data for internal stops that don't need it. Avoiding creating (updating) lldb_private::Thread objects, and having the not-(yet)-realized threads live as a tid_t, is one way to achieve this goal, but I am not sure if it is the right one. I am not really against the idea -- since threads can come and go between stops pretty much arbitrarily (even without os plugins), having everything hold onto them as simple handles, and forcing a fetch through some Process API when one needs to access them, may simplify some things (for one, the lifetime of Threads could become much stricter).
> >> 
> >> I can certainly believe that changing thread plans to use a tid_t instead of a Thread& is simpler than making Thread object management lazy, but I am not sure that is a fair comparison. Thread plans are only a single component which juggles Thread objects. There are many other pieces of code, which assume that the lifetime of a Thread object matches that of the entity it describes. SBThread is one such thing. If SBThread keeps referring to the actual thread as a pointer, then it will lose track of it when the actual thread goes away and reappears. Same goes for ValueObjects and SBValues -- if lldb_private::Thread is meant to be transient, then those should also be updated to use a tid_t(*), but then the changes required to do that become much larger, and it's not clear to me whether that is really worth it, since this is just kind of reinventing `weak_ptr<Thread>` (but with the downside mentioned in (*)).
> > 
> > Everything that holds onto its state using an ExecutionContextRef already holds onto the m_tid as the real entity, and the Thread is only a cache (see ExecutionContextRef::GetThreadSP).  That probably hasn't been pushed hard, and I've only written a few tests with actually vanishing threads so far.  But if it doesn't work properly then that's a bug to fix not a design to change.
> > 
> > ValueObjects (and therefore SBValues) use an ExecutionContextRef in their UpdatePoint, they don't hold onto a thread.  SBThread's also hold ExecutionContextRef's not ThreadWP's to get back to their actual thread.  In general, anything that wants to know "where am I in the state space of the process" is supposed to use an ExecutionContextRef to store its state not individual target/process/thread/etc.  So I'm not sure this is so far off, but we would have to write some more tests with threads coming and going to make sure this is used consistently.
>
> This sounds very convincing to me. I was under the impression that this is kind of inventing something new, but since there already is a notion of floating Thread objects, I don't have a problem with that. That said, I can't really say how well the existing ExecutionContextRef mechanism works in practice, as I hadn't even noticed that it re-fetches threads.


To be fair, I think in the current state of things this code is probably seldom used, since we track threads accurately and you would very seldom have a TID the process knows about that doesn't have a backing thread by the time you get to the level where people use ExecutionContextRef's.  So it's possible this doesn't work perfectly when threads do vanish - though I haven't seen that in any of my hand-testing.  But it does show that when we were thinking about how to specify threads we made the same choice we did with frames: identify frames by PC/CFA and threads by TID, rather than by the native lldb objects.
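
For reference, the lookup in question is ExecutionContextRef::GetThreadSP().  Roughly - this is a simplified sketch of the shape of that code, not a verbatim copy - the TID is the source of truth and the cached weak pointer just gets re-primed from the process when it goes stale:

  lldb::ThreadSP ExecutionContextRef::GetThreadSP() const {
    // Try the cached weak pointer first.
    lldb::ThreadSP thread_sp = m_thread_wp.lock();
    if (m_tid != LLDB_INVALID_THREAD_ID &&
        (!thread_sp || !thread_sp->IsValid())) {
      // The cached Thread is gone or no longer valid; re-fetch it from
      // the process by TID and re-prime the cache.
      // (The cache members are mutable in the sketch's assumption, so
      // this works from a const method.)
      if (lldb::ProcessSP process_sp = GetProcessSP())
        thread_sp = process_sp->GetThreadList().FindThreadByID(m_tid);
      m_thread_wp = thread_sp;
    }
    return thread_sp;
  }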

> 
> 
>> 
>> 
>>> (*) I don't believe tid_t is actually a good way to keep track of the identity of a thread across time. It does uniquely identify a thread at any given point in time, but thread ids can be recycled (sometimes very quickly -- running the lldb test suite just once causes linux to wrap its pid space), which means you can't guarantee it will always refer to the same physical thread. If we're going to be using numbers to identify threads I think it should be some other kind of an identifier -- but that is sort of what a `Thread*` already is.
>> 
>> Not sure I understand this.  If I run the process and a real thread exits and another one is created with the same TID, all before stopping, then when I go to merge the thread lists on stop I'm going to think this different thread was the same thread because it had the same TID.  I don't see how using Thread pointers rather than TID's helps with this.  I still have to trust the TID I got from the monitor.  We are always going to have a problem merging up Threads when TID's can be reused.  I don't see how our Thread *'s - which don't represent anything in the target, and thus don't help with reasoning about persistence - help this at all.
> 
> a Thread* helps precisely because it "doesn't represent anything in the target". Since tid_ts are supposed to match the OS's notion of a thread identifier, and those are not unique across time on all systems, we'd need to come up with some surrogate identifier to be able to tell threads apart. A pointer is one kind of unique identifier.
> 
> However, you are right that this alone would not help us maintain accurate thread identity, because the tid could come and go without us noticing (though it would help if we do observe the thread disappearing). Further changes would be needed to make that work (and it's not clear whether they are actually worth it). I was mainly trying to ensure that any new concept we come up with does not get in the way if that becomes necessary. But as we are not inventing a new concept here, this point is moot.

Yes, I think we're okay there.

Even if we don't notice that a TID has changed its underlying Thread object, the thread plans all get asked "Are you stale?" as part of the should-stop mechanism.  For instance, a "step-out" plan knows that it is stepping out to some Frame, and if that frame isn't on the stack it ditches itself.  Similarly, step plans expect to see the current stepping frame.  So if we did get it wrong and attached the thread plans from one underlying thread to another one that adopted its TID, they would all get discarded once they were applied to a thread where they didn't make sense.
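
To make that concrete, here is the flavor of such a check - a much-simplified sketch along the lines of ThreadPlanStepOut::IsPlanStale(); the member and variable names are illustrative rather than the exact ones.  Since with this patch the plan resolves its thread from the process by TID, the check naturally runs against whatever thread currently owns that TID:

  // Simplified sketch -- the real staleness check in ThreadPlanStepOut
  // handles more cases (e.g. landing right at the return breakpoint).
  bool ThreadPlanStepOut::IsPlanStale() {
    // GetThread() resolves the stored TID through the process, so this
    // runs against whatever thread currently owns that TID.
    lldb::StackFrameSP frame_sp = GetThread().GetStackFrameAtIndex(0);
    if (!frame_sp)
      return true;
    // m_step_out_to_id (illustrative name) is the StackID of the frame
    // we expect to return to.
    StackID current_id = frame_sp->GetStackID();
    if (current_id == m_step_out_to_id || current_id < m_step_out_to_id)
      return false;   // still at or younger than the step-out target frame
    // The target frame is gone or we've gone past it: this plan no longer
    // makes sense for the thread it is attached to.
    return true;
  }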


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75711/new/

https://reviews.llvm.org/D75711




