[Lldb-commits] [PATCH] D42210: Re-enable logging on the child side of a fork() in lldb-server platform mode
Pavel Labath via lldb-commits
lldb-commits at lists.llvm.org
Tue Feb 6 02:19:06 PST 2018
Right I see. Thanks for the explanation.
I feel this is a good time to bring up again my idea for implementing
gdb-server connections in a port-forwarder-friendly way. It's
described at <http://lists.llvm.org/pipermail/lldb-dev/2017-November/012973.html>,
so I am not going to repeat the details here, but the crux of it is
that it would allow us to connect to gdb-server without requiring the
running of any special port-forwarding programs (*), which is an issue
that periodically comes up as a problem for someone. I don't know if
you're happy with the way you are currently doing the port forwarding,
but just in case you're not, I want to tell you that there is a way to
avoid it. :)
(*) In some situations (which probably includes your iphone scenario
as well) you will still need to forward *one* port for the initial
platform connection, but that can be simpler than needing to forward
dozens of ports to enable fast parallel testing. Also, in a lot of
scenarios (like a typical NAT setup) you wouldn't need to forward any
ports at all.
On 5 February 2018 at 19:39, Jason Molenda <jmolenda at apple.com> wrote:
> Yeah I saw the TCPSocket::Accept differences on top of tree after sending the email - unfortunately I have to do my work on this old branch, and I can't build our top of tree sources for the iphone target right now. It's not a great situation!
> The min/max ports are needed for running the testsuite on a device, we run a program to relay a specific list of ports between the mac and the phone so we have to depend on that.
> I'll see if I can get the testsuite running on my local mac via lldb-server platform on top of tree and see what changes are needed there; hopefully I can repro the same problems I'm hitting on this old branch, if they're still applicable.
>> On Feb 5, 2018, at 2:56 AM, Pavel Labath <labath at google.com> wrote:
>> Hi Jason,
>> Thanks for the heads up. I look forward to getting rid of fork()
>> there, but there is one thing that's not clear to me. You say that
>> TCPSocket::Accept() calls CloseListenSockets(). I don't see
>> anything like that in the current code, and I know for a fact that we
>> are able to handle multiple (concurrent) connections, as that is how
>> we run tests on android right now. Could this be some problem with
>> the branch that you were working on?
>> In general, if you don't need to use the --min-port/--max-port
>> arguments (that mode is indeed broken when forking), then everything
>> should already be set-up for the parallel remote testing. In fact,
>> unless you have a hard requirement to use --min/max-port then I would
>> advise against it. I think it can cause you a lot of problems down the
>> line. The issue with vending out ports like this is that the
>> "PortCoordinator" does not have control over the whole system, and it
>> cannot prevent some other process from claiming one of the ports that
>> it "owns". So, you may need to implement some retry logic to handle
>> the case when debugserver cannot bind to the port it was assigned.
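(To make the retry idea concrete: a hedged sketch of the kind of fallback logic Pavel is describing. The function name and the candidate-list interface are made up for illustration; this is not code from the patch.)

```cpp
// Sketch of retry logic for port assignment: try each candidate port in
// turn, and fall back to the next one when bind() fails (e.g. because
// some other process already claimed the port, yielding EADDRINUSE).
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Returns an fd bound to the first free port in `candidates` (writing the
// chosen candidate to *chosen), or -1 if every candidate is taken.
int BindFirstFreePort(const std::vector<uint16_t> &candidates,
                      uint16_t *chosen) {
  for (uint16_t port : candidates) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (sockaddr *)&addr, sizeof(addr)) == 0) {
      *chosen = port;
      return fd;
    }
    close(fd); // bind failed (likely EADDRINUSE): try the next candidate
  }
  return -1;
}
```

The point of the sketch is the weakness Pavel names: because the coordinator doesn't own the whole system, every assignment needs this kind of fallback, which is why binding to port zero (below in the thread) is simpler.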
>> I think that a much more reliable and simpler solution would be to
>> have debugserver ask the OS to assign it an unused port (by binding to
>> port zero), and then report that port back to lldb-platform (via the
>> --named-pipe arg -- this is how we use llgs right now). Besides fixing
>> the port allocation problem, this will also ensure that by the time
>> the platform reports the port back to the client, debugserver is
>> already primed and waiting for a connection. I'm not sure if
>> debugserver right now supports the --named-pipe argument, but it
>> should be fairly trivial to add it if needed.
>> Of course, if you need the min/max-port mode of operation then
>> that's fine, but if this were up to me, I'd try hard to find a way to
>> avoid it. :)
>> On 3 February 2018 at 04:12, Jason Molenda <jmolenda at apple.com> wrote:
>>> Hi Pavel, I closed this Phabricator review; I've been working on this on-and-off and I think a different approach would be better.
>>> I am working up a patch where I change tools/lldb-server/lldb-platform.cpp from using a fork() model to using std::threads. The main thread listens for incoming connections, and when one is received, it starts up a new std::thread to handle that connection. I have a PortCoordinator singleton object that manages the ports we can use for communications. Newly created threads request ports from it (and free them when the thread is finished) so that it is possible to run multiple tests at the same time.
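(A hypothetical sketch of the PortCoordinator Jason describes: a singleton handing out ports from a fixed range to connection threads, which return them when finished. The class shape, method names, and port range here are guesses for illustration, not code from the patch.)

```cpp
// Guess at a PortCoordinator: a thread-safe singleton that vends unused
// ports from a fixed range and takes them back when a connection thread
// finishes.  Names and range are illustrative.
#include <cstdint>
#include <mutex>
#include <optional>
#include <set>

class PortCoordinator {
public:
  static PortCoordinator &Get() {
    static PortCoordinator instance(20000, 20010); // example range
    return instance;
  }
  // Hand out the lowest port not currently in use, or nullopt if exhausted.
  std::optional<uint16_t> Acquire() {
    std::lock_guard<std::mutex> guard(m_mutex);
    for (uint16_t p = m_min; p <= m_max; ++p)
      if (m_in_use.insert(p).second)
        return p;
    return std::nullopt;
  }
  // Return a port to the pool when its connection thread is done.
  void Release(uint16_t port) {
    std::lock_guard<std::mutex> guard(m_mutex);
    m_in_use.erase(port);
  }

private:
  PortCoordinator(uint16_t min, uint16_t max) : m_min(min), m_max(max) {}
  std::mutex m_mutex;
  std::set<uint16_t> m_in_use;
  uint16_t m_min, m_max;
};
```

Because the threads share one address space, a single coordinator like this can know which ports are handed out, which is exactly what the fork() model loses (each child gets a stale copy of the list).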
>>> The current code fork()'s when it receives a connection, and gives the child process a copy of all the ports that are available. Because it's in a separate process, if we were to receive a second connection and fork off a new process, that process would see the same list of ports as available. When debugserver is being used, this is a problem: when the lldb-server platform thread/process gets a qLaunchGDBServer packet, we run debugserver saying "hey debugserver, listen for a connection on port N" and then tell the remote lldb process "there's a debugserver waiting to hear from you on port N", so lldb-server can't test whether port N is in use or not.
>>> (there was also a problem in GDBRemoteCommunication::StartDebugserverProcess where it has a url like localhost:1111 and then it tries to run debugserver in --named-pipe mode even though we already have a port # to use in the url.)
>>> The final problem is that TCPSocket::Accept waits for an incoming connection on the assigned port, and when one comes in it gets a new file descriptor for that connection. It should resume listening on that port for the next connection that comes in ... but today TCPSocket::Accept calls CloseListenSockets(), so after accepting the first lldb-server platform connection, when we go to accept() the second, the listening socket has been closed and we exit immediately. To quickly recap the POSIX API (I never do network programming, so I have to remind myself of this every time), the workflow for a server like this usually looks like:
parentfd = socket(AF_INET, SOCK_STREAM, 0);
optval = 1;
setsockopt(parentfd, SOL_SOCKET, SO_REUSEADDR,
           (const void *)&optval, sizeof(int));
serveraddr.sin_family = AF_INET;
serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
serveraddr.sin_port = htons((unsigned short)portno);
bind(parentfd, (struct sockaddr *) &serveraddr, sizeof(serveraddr));
listen(parentfd, 100); // allow up to 100 connections to queue up -- whatever
while (1) {
  childfd = accept(parentfd, (struct sockaddr *) &clientaddr, &clientlen);
  // hand childfd off to a worker, then loop back to accept() on parentfd
}
>>> and then we go back to accept()'ing on parentfd after we've spun off something to talk to childfd.
>>> I've been doing all of my work on an old branch for reasons that are boring, so some/all of this may be fixed on top of tree already! But this is what I had to do to get my branch to work correctly on a remote system.
>>> I also noticed that dotest.py won't run multiple debug sessions with a remote target. That was part of my goal, to speed these up, but it seems to run in --no-multiprocess mode currently.
>>> I'll be picking this up next week - my changes right now are a mess, and they're against this old branch that I have to work on, but I'll get them cleaned up & updated to top of tree and make up a patchset. But I wanted to give you a heads up on where I'm headed as this touches a lot of your code.
>>>> On Jan 18, 2018, at 3:44 AM, Pavel Labath via Phabricator <reviews at reviews.llvm.org> wrote:
>>>> labath added a comment.
>>>> In https://reviews.llvm.org/D42210#979608, @jasonmolenda wrote:
>>>>> Jim remembers some problem with logging across a fork()'ed process, Pavel does this ring a bell? This change might be completely bogus -- but it's very difficult to debug the child side of an lldb-server platform today.
>>>> I believe Jim is thinking of https://reviews.llvm.org/D38938. The issue is that if another thread holds the logging mutex while we fork(), the mutex will remain in a bogus state in the child process. This would mean that any operation on the Log subsystem (such as enabling logging) could hang. We hold the mutex for just a couple of instructions (just enough to copy a shared_ptr), but after some code reshuffles, we were hitting this case very often in liblldb.
>>>> Now, I don't think this can happen here as at the point where we are forking, the platform process is single-threaded. So, enabling logging here should be safe, but only accidentally. It should be possible to make this always safe, but that will need to be done with a very steady hand. My opinion is we should not bother with that (i.e., just check this in as-is) until we actually need more threads in the platform, particularly as I think the long-term direction here should be to replace the fork with a new thread for handling the connection.
>>>> As for testing, the problem is that we have no direct platform tests today -- remote tests exercise the platform by virtue of all connections going through it, but a standard check-lldb will not even run this code. I have been planning to add tests like these, but I haven't got around to it yet, and I'm not sure when that will happen. If you want to take a stab at it, I can give you some pointers.
>>>> BTW: for most debugging needs you should be able to just start the platform without the --server argument, which will disable forking and handle each connection in the main process.