[lldb-dev] Problem with dotest_channels.py
Todd Fiala via lldb-dev
lldb-dev at lists.llvm.org
Tue Dec 15 18:12:32 PST 2015
Yeah I'll have a look at what it's doing.
I wouldn't expect a return to crash there, just not receive the data. I'm
guessing other parts of asyncore code might be doing invalid things with
the socket at that point. We do need to be able to handle this case,
though, on timeouts that kill the sending/inferior side.
On Tue, Dec 15, 2015 at 3:56 PM, Zachary Turner <zturner at google.com> wrote:
> I wonder if you need a flush somewhere before you invoke the cleanup
> func? Would that do it? It looks like the sending side of the connection
> is closing before the receiving side has received all its data.
>
> On Tue, Dec 15, 2015 at 3:49 PM Adrian McCarthy <amccarth at google.com>
> wrote:
>
>> With Todd's change, I was getting a Ninja crash. Zach and I replaced the
>> returns Todd added with raises, in order to propagate the exception up the
>> stack, and that avoids the Ninja crash, so I'll check that in in a moment.
>>
>> In the mean time, here's the error message we got out of it.
>>
>> 155 out of 416 test suites processed - TestBacktraceAll.py
>> INFO: received socket error when reading data from test inferior:
>> [Errno 10054] An existing connection was forcibly closed by the remote
>> host
>> error: uncaptured python exception, closing channel
>> <lldbsuite.test.dotest_channels.UnpicklingForwardingReaderChannel connected
>> 127.
>> 0.0.1:58961 at 0x2bb8878> (<class 'socket.error'>:[Errno 10054] An
>> existing connection was forcibly closed by the remote host
>> [D:\Python_for_lldb\x86\lib\asyncore.py|read|83]
>> [D:\Python_for_lldb\x86\lib\asyncore.py|handle_read_event|449]
>> [D:\src\llvm\llvm\tools\lldb\packages\Python\lldbsuite\test\dotest_channels.py|handle_read|137]
>> [D:\Python_for_lldb\x86\lib\asyncore.py|recv|387])
>> 175 out of 416 test suites processed - TestNoSuchArch.py
>>
>> On Mon, Dec 14, 2015 at 3:58 PM, Todd Fiala via lldb-dev <
>> lldb-dev at lists.llvm.org> wrote:
>>
>>> Hey Zachary,
>>>
>>> I just put in:
>>> r255581
>>>
>>> which should hopefully:
>>> (1) catch the exception you see there,
>>> (2) handle it gracefully in the common and to-be-expected case of the
>>> test inferior going down hard, and
>>> (3) print out an error if anything else unexpected is happening here.
>>>
>>> Let me know if you get any more info with it. Thanks!
>>>
>>> -Todd
>>>
>>> On Mon, Dec 14, 2015 at 2:16 PM, Todd Fiala <todd.fiala at gmail.com>
>>> wrote:
>>>
>>>> Yeah that's a messed up exception scenario that is hard to read. I'll
>>>> figure something out when I repro it. One side is closing before the other
>>>> is expecting it, but likely in a way we need to expect.
>>>>
>>>> I think it is ugly-ified because it is coming from some kind of worker
>>>> thread within async-core.
>>>>
>>>> I will get something in to help it today. The first bit may be just
>>>> catching the exception as you mentioned.
>>>>
>>>> On Mon, Dec 14, 2015 at 2:05 PM, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>> If nothing else, maybe we can print out a more useful exception
>>>>> backtrace. What kind of exception, what line, and what was the message?
>>>>> That might help give us a better idea of what's causing it.
>>>>>
>>>>> On Mon, Dec 14, 2015 at 2:03 PM Todd Fiala <todd.fiala at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Zachary!
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 14, 2015 at 1:28 PM, Zachary Turner via lldb-dev <
>>>>>> lldb-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> Hi Todd, lately I've been seeing this sporadically when running the
>>>>>>> test suite.
>>>>>>>
>>>>>>> [TestNamespaceLookup.py FAILED]
>>>>>>> Command invoked: C:\Python27_LLDB\x86\python_d.exe
>>>>>>> D:\src\llvm\tools\lldb\test\dotest.pyc -q --arch=i686 --executable
>>>>>>> D:/src/llvmbuild/ninja/bin/lldb.exe -s
>>>>>>> D:/src/llvmbuild/ninja/lldb-test-traces -u CXXFLAGS -u CFLAGS
>>>>>>> --enable-crash-dialog -C d:\src\llvmbuild\ninja_release\bin\clang.exe
>>>>>>> --results-port 55886 --inferior -p TestNamespaceLookup.py
>>>>>>> D:\src\llvm\tools\lldb\packages\Python\lldbsuite\test --event-add-entries
>>>>>>> worker_index=10:int
>>>>>>> 416 out of 416 test suites processed - TestAddDsymCommand.py
>>>>>>> error: uncaptured python exception, closing channel
>>>>>>> <lldbsuite.test.dotest_channels.UnpicklingForwardingReaderChannel connected
>>>>>>> 127.0.0.1:56008 at 0x2bdd578> (<class 'socket.error'>:[Errno 10054]
>>>>>>> An existing connection was forcibly closed by the remote host
>>>>>>> [C:\Python27_LLDB\x86\lib\asyncore.py|read|83]
>>>>>>> [C:\Python27_LLDB\x86\lib\asyncore.py|handle_read_event|449]
>>>>>>> [D:\src\llvm\tools\lldb\packages\Python\lldbsuite\test\dotest_channels.py|handle_read|133]
>>>>>>> [C:\Python27_LLDB\x86\lib\asyncore.py|recv|387])
>>>>>>>
>>>>>>> It seems to happen randomly and not always on the same test.
>>>>>>> Sometimes it doesn't happen at all. I wonder if this could be related to
>>>>>>> some of the work that's been going on recently. Are you seeing this? Any
>>>>>>> idea how to diagnose?
>>>>>>>
>>>>>>
>>>>>> Eww.
>>>>>>
>>>>>> That *looks* like one side of the connection between the inferior and
>>>>>> the test runner process choked on reading content from the test event
>>>>>> socket when the other end went down. Reading it a bit more carefully, it
>>>>>> looks like it is the event collector (which would be the parallel test
>>>>>> runner side) that was receiving when the socket went down.
>>>>>>
>>>>>> I think this means I just need to put a try block around the receiver
>>>>>> and just bail out gracefully (possibly with a message) when that happens at
>>>>>> an unexpected time. Since test inferiors can die at any time, possibly due
>>>>>> to a timeout where they are forcibly killed, we do need to handle that
>>>>>> gracefully.'
>>>>>>
>>>>>> I'll see if I can force it, replicate it, and fix it. I'll look at
>>>>>> that now (pending watching the buildbots for the other change I just put
>>>>>> in).
>>>>>>
>>>>>> And yes, this would be a code path that we use heavily with the xUnit
>>>>>> reporter, but only started getting used by you more recently when I turned
>>>>>> on the newer summary results by default. (The newer summary results use
>>>>>> the test event system, which means test inferiors are now going to be using
>>>>>> the sockets to pass back test events, where you didn't have that happening
>>>>>> before unless you used the curses or xUnit results formatter).
>>>>>>
>>>>>> I hope to have it reproduced and fixed up here quickly. I suspect
>>>>>> you may have an environment that just might make it more prevalent, but it
>>>>>> needs to be fixed.
>>>>>>
>>>>>> Hopefully back in a bit with a fix!
>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> lldb-dev mailing list
>>>>>>> lldb-dev at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -Todd
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> -Todd
>>>>
>>>
>>>
>>>
>>> --
>>> -Todd
>>>
>>> _______________________________________________
>>> lldb-dev mailing list
>>> lldb-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>
>>>
>>
--
-Todd
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20151215/81f8e1d4/attachment.html>
More information about the lldb-dev
mailing list