<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/118032">118032</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[lldb][tests] Sockets leaks in API tests with a remote target
</td>
</tr>
<tr>
<th>Labels</th>
<td>
lldb
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
slydiman
</td>
</tr>
</table>
<pre>
We see unexpected errors in a single, seemingly random test on [lldb-remote-linux-ubuntu](https://lab.llvm.org/buildbot/#/builders/195) and [lldb-remote-linux-win](https://lab.llvm.org/staging/#/builders/197) 1-4 times per day.
Error 1: a test is reported as unresolved with the exception `failed to create a socket to the launched debug monitor after 20 tries`.
We usually get this error on the Linux host (lldb-remote-linux-ubuntu), e.g. [TestGdbRemoteMemoryAllocation.py](https://lab.llvm.org/buildbot/#/builders/195/builds/1660), [TestNonStop.py](https://lab.llvm.org/buildbot/#/builders/195/builds/1625), [TestGdbRemoteSingleStep.py](https://lab.llvm.org/buildbot/#/builders/195/builds/1614). We have also seen the same error, very rarely, on the Windows host (lldb-remote-linux-win): [TestGdbRemoteHostInfo.py](https://lab.llvm.org/staging/#/builders/197/builds/732).
Error 2: a 600-second timeout.
Usually (99%) we get this error on the Windows host (lldb-remote-linux-win) with the test [TestModuleLoadedNotifys.py](https://lab.llvm.org/staging/#/builders/197/builds/890), and less often with other tests, e.g. [TestLldbGdbServer.py](https://lab.llvm.org/staging/#/builders/197/builds/744). We have also seen the same error, very rarely, on the Linux host (lldb-remote-linux-ubuntu): [TestCancelAttach.py](https://lab.llvm.org/buildbot/#/builders/195/builds/1402).
I believe both errors have the same cause: leaking sockets.
Error 1 is raised in connect_to_debug_monitor() in gdbremote_testcase.py.
It uses a random port `12000 + random.randint(0, 3999)` to launch a new instance of `lldb-server gdbserver *:port` on the target.
Then it tries to connect to the lldb-server up to 10 times with a 0.5-second delay and terminates the lldb-server if the connection fails.
Then it tries another port, up to 20 times, with a random delay of 1-5 seconds `to avoid collisions`.
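
The retry scheme described above can be sketched roughly as follows. This is not the code from gdbremote_testcase.py: `connect_with_retries`, `launch_server`, `try_connect`, `stop_server`, `MAX_CONNECT_ATTEMPTS`, and `CONNECT_DELAY_SEC` are hypothetical names standing in for the real launch/connect/terminate steps, and the constants only mirror the numbers quoted in this report.

```python
import random
import time

MAX_ATTEMPTS = 20          # ports to try (value quoted in this report)
MAX_CONNECT_ATTEMPTS = 10  # hypothetical name: connection tries per launched server
CONNECT_DELAY_SEC = 0.5    # hypothetical name: pause between connection tries


def connect_with_retries(launch_server, try_connect, stop_server, sleep=time.sleep):
    """Hypothetical sketch: return (port, connection) or raise RuntimeError."""
    for _ in range(MAX_ATTEMPTS):
        port = 12000 + random.randint(0, 3999)
        server = launch_server(port)
        for _ in range(MAX_CONNECT_ATTEMPTS):
            conn = try_connect(port)  # assumed to return None on failure
            if conn is not None:
                return port, conn
            sleep(CONNECT_DELAY_SEC)
        # No connection was made: terminate this server, back off, pick a new port.
        stop_server(server)
        sleep(random.randint(1, 5))
    raise RuntimeError(
        "failed to create a socket to the launched debug monitor after %d tries"
        % MAX_ATTEMPTS
    )
```

Under these assumptions, every failed outer attempt costs at least 10 * 0.5 = 5 seconds of connection retries plus the 1-5 second back-off, so exhausting all 20 attempts takes at least two minutes.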
At the beginning of a test run, netstat showed 164 connections in the TIME_WAIT state between the host and the target:
24 connections to `target IP`:1234 (platform)
100 connections to `target IP`:43107 (gdbserver)
40 connections to `target IP` with a random port
and 2 connections in the state ESTABLISHED.
After 15 minutes of testing, netstat showed 641 connections in the TIME_WAIT state between the host and the target:
310 connections to `target IP`:1234 (platform)
331 connections to `target IP`:43107 (gdbserver)
and 9 connections in the state ESTABLISHED.
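
A tally like the ones above can be produced by grouping netstat output by state and remote port. This is a minimal sketch, assuming numeric netstat output in which the last two columns are the foreign address and the state (6 columns for `netstat -tn` on Linux, 4 columns for `netstat -n` on Windows); `count_connections` is a hypothetical helper, not part of the test suite.

```python
from collections import Counter


def count_connections(netstat_output, target_ip):
    """Count TCP connections to target_ip, keyed by (state, remote port)."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Keep only TCP rows with the assumed 4-column or 6-column layout.
        if len(fields) not in (4, 6) or not fields[0].lower().startswith("tcp"):
            continue
        remote, state = fields[-2], fields[-1]
        host, _, port = remote.rpartition(":")
        if host == target_ip:
            counts[(state, int(port))] += 1
    return counts
```

Summing the counts whose state is `TIME_WAIT` gives totals comparable to the 164 and 641 figures reported here.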
Both buildbots run tests in 8 threads.
Both buildbots use Python 3.12. Note that the results with Python 3.13 are worse, probably due to an incremental GC. The average build/test time with Python 3.13 is also longer.
Increasing `MAX_ATTEMPTS = 20` in connect_to_debug_monitor() may be enough to fix error 1.
But I have no idea how to fix or even debug error 2. It is very hard to reproduce.
</pre>