[Lldb-commits] [lldb] [lldb] Multithreading lldb-server works on Windows now; fixed gdb port mapping (PR #100670)

Mon Jul 29 03:08:43 PDT 2024

labath wrote:

> @labath
> 
> > I wanted to change this code to use threads for a long time, but the idea of using a per-thread virtual file system absolutely terrifies me. That only works is all of the accesses go through the file system (no direct syscalls) and if we're very strict about which code executes on which thread. Right now, I don't think we're either, and I don't see how we could ever achieve it. (That's not exactly true for the code in the lldb-server, but we also don't really have enforcable rules for which code can be run in the lldb-server, and the code this is changing affects all of lldb).
> 
> We need the virtual working directory only for few requests via the platform protocol. It is necessary to just resolve the path. Note I have added only 2 updates for that. All the rest code already uses lldb::FileSystem correctly. 

I believe that the code that's necessary to run `lldb-server platform` uses the filesystem correctly. I trust you've made sure of that by running the test suite. However, I don't think you can claim that for every usage of the FileSystem class in everywhere in lldb, because for most of those you can't even tell whether they'd want to use a "thread-local" filesystem or a global one.

I certainly can't do it, and that's kind of my point: I think this is a bad abstraction. Outside of a very limited use case, it's impossible to reason about this and/or prove that introducing a thread-local cwd is safe and correct. We're often optimizing code by moving it from one thread to another, or farming it out to a thread pool, and we also have code that can run on several threads in different contexts. All of these things could cause "correct" code  to suddenly stop working because a completely unrelated code has been changed to run on a different thread.

If this is only really used for a "only for few requests via the platform protocol", then why not make the CWD a property of the platform object? (Either through a virtual filesystem, or just by having it as a string, and resolving things explicitly)

> Once a process (`lldb-server gdbserver` or test app) is started, its process uses its own working directory.

Ack.

> 
> > (Also, I guess this is the reason for the ETXTBSY workaround)
> 
> I don't see the connection. See more details in #100659 discussion.
> 
> > to replace the fork with spawning a new process (fork+execve) instead. Have you considered that?
> 
> To fix all gdb port mapping issues we need a common port mapping available in all platform connection handlers. It is possible only with the multithreading.

I think that using threads for the management of a common port resource is a very elegant solution. I did not realize you're trying to solve that, for which I apologise. That said, I think it's a stretch to say this is the only way to solve this issue. I can think at least four other potential solutions (with different tradeoffs) right now. This is my favourite:

**Don't use port mappings** (at least in the multi-process `--server` mode). AFAICT, the only reason it exists is because lldb uses an FTP-like connection that's incompatible with firewals/NATs/etc. This means that every user has to first discover this problem, then learn about the port map flags to lldb, and then to configure their firewall to let these ports through. We get questions about this regularly. Everything would be a lot simpler if everything went through a single port.

The way this would work is by letting the platform instance delegate/upgrate/convert the platform connection into a gdbserver one. The way this would work would be something like this:
1. `lldb-server platform` would advertise (say in `qSupported`) its support for this new mode.
2. Before asking the platform to launch a new gdb server, lldb would query this feature. If present, instead of the usual action (`qLaunchGDBServer`), it would create *another* platform connection, using the same port as the original one. As we're using the same port, we'd go through all the nats just like the original connection.
3. On this new connection it would send a new special packet (let's call it `qUpgradeToGdbConnection`)
4. `lldb server platform` would launch an gdbserver instance and everything else would proceed as before.

On non-darwin platform (darwin uses `debugserver`) we could optimize to avoid spawning a new process, and just call the relevant gdb-server code directly. This might be nice cause then we could use `execve` (unsupported on windows) for calling the debugserver on darwin, and all other platforms would work the same way as windows. But this is still just an optimization, and passing the socket through fork+exec/CreateProcess could still work everywhere.

> And I don't see a way to delegate the accepted incoming connection to a spawned process on Windows.

Though I'm not a windows expert, I'm pretty sure that's possible. Windows has the concept of inheritable `HANDLE`s and https://learn.microsoft.com/en-us/windows/win32/sysinfo/handle-inheritance explicitly lists sockets as one of the inheritable handle types.

> 
> > One way I can imagine this happening is if the FileSystem instance was local to a GDBRemoteCommunicationServerPlatform instance -- rather than the thread it happens to be (mostly) running on. This will require more changes to, basically, plumb the filesystem instance to every place that needs to be used from the platform object, but I think that's actually a good thing. It will give us a record of which code is prepared to deal with virtualized filesystems and which one isn't. I just don't think we can assume that all our code (present and future) will handle the per-thread filesystem situation correctly.
> 
> GDBRemoteCommunicationServerPlatform extends GDBRemoteCommunicationServerCommon, GDBRemoteCommunication, etc. The working directory may be used for own tasks (for example load/save settings) and to handle request with a relative path.

That may be the case, and if that's true, then I think the base class should have a filesystem parameter as well. Either way, we're imposing some sort of a requirement on the code around us. In one case it's "a path string needs to be resolved relative the a certain filesystem", in the other "a path string need to be resolved on a specific thread". I think the first one is better because it's more explicit -- you know which code is prepared to handle "alternate" filesystems by seeing if it accepts a filesystem argument -- and flexible -- you can pass paths across threads by making sure the filesystem travels along with it.

> > (That said, it still may be better to just spawn a new process instead. I don't think this is a particularly hot code path (I think the test suite is basically the only user of multithreaded platforms), and it will make sure we don't hit the ETXTBSY issue).
> 
> How will the spawned process help? I think the only way to fix ETXTBSY issue is to copy the executable to the target and launch it from the same process. It seems MacOS uses system's gdbserver instead of `lldb-server gdbserver`. Please correct me if I'm wrong. So ETXTBSY issue is the design issue. Probably it is necessary to move vFile:open, vFile:pwrite, vFile:close to gdbserver somehow to fix ETXTBSY issue.

That would certainly help, but I don't think its necessary. I've explained what I think the problem is on the other PR.

https://github.com/llvm/llvm-project/pull/100670