[lldb-dev] Too many open files

Todd Fiala via lldb-dev lldb-dev at lists.llvm.org
Tue Oct 6 09:41:39 PDT 2015


On Mon, Oct 5, 2015 at 3:58 PM, Adrian McCarthy <amccarth at google.com> wrote:

> Different tools are giving me different numbers.
>
> At the time of the error, Windbg says there are about 2000 open handles,
> most of them are Event handles, not File handles.  That's higher than I'd
> expect, but not really concerning.
>
>
Ah, that's useful.  I am using events (python threading.Event).  These
don't afford any clean up mechanisms on them, so I assume these go away
when the Python objects that hold them go away.


> Process Explorer, however, shows ~20k open handles per Python process
> running dotest.exe.  It also says that about 2000 of those are the
> process's "own handles."  I'm researching to see what that means.  I
> suspect it means that the process has about ~18k handles to objects owned
> by another process and 2k of ones that it actually owns.
>
> I found this Stack Overflow post, which suggests is may be an interaction
> with using Python subprocess in a loop and having those subprocesses work
> with files that are still open in the parent process, but I don't entirely
> understand the answer:
>
>
> http://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files
>
>
Hmm I'll read through that.


> It might be a problem with Python subprocess that's been fixed in a newer
> version.  I'm going to try upgrading from Python 2.7.9 to 2.7.10 to see if
> that makes a difference.
>
>
Okay, we're on 2.7.10 on latest OS X.  I *think* I'm using Python 2.7.6 on
Ubuntu 14.04.  Checking now... (yes, 2.7.6 on 14.04).  Ubuntu 15.10 beta 1
is using Python 2.7.10.

Seems reasonable to check that out.  Let me know what you find out!

-Todd


> On Mon, Oct 5, 2015 at 12:02 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
>
>> It's possible.  However, I was monitoring actual open files during the
>> course of the run (i.e. what the kernel thought was open for the master
>> driver process, which is the only place that makes sense to see leaks
>> accumulate) in both threading and threading-pool (on OS X), and I saw only
>> the handful of file handles that I'd expect to  be open - pipes
>> (stdout,stderr,stdin) from the main test runner to the inferior test
>> runners, the shared libraries loaded as part of the test runner, and (in my
>> case, but probably not yours for the configuration), the tcp sockets for
>> gathering the test events.  There was no growth, and I didn't see things
>> hanging around longer than I'd expect.
>>
>> The SysInternals process viewer tool is great for this kind of thing -
>> glad you're using it.  Once you find out which file handles are getting
>> leaked and where they came from, we can probably figure out which part of
>> the implementation is leaking it.  I don't *expect* it to be on our side
>> given that it's not showing up on a POSIX-y system, but maybe it really is
>> but isn't in the form of a file handle on the POSIX side.  I should have a
>> look at the memory growth...
>>
>> On Mon, Oct 5, 2015 at 11:41 AM, Adrian McCarthy <amccarth at google.com>
>> wrote:
>>
>>> I'm poking around with some SysInternals tools.  Over the course of test
>>> run, there are about 602k opens (CreateFiles) and 405k
>>> closes (CloseFiles) system-wide.
>>>
>>> I'm looking for a way to stop it once the error happens, so I can see
>>> how many files each process has open.  As it stands, the OS cleans up once
>>> the error is hit.
>>>
>>> I wonder if it's not a matter of actually leaking open file handles but
>>> that the closes are happening too late so that we cross the threshold
>>> shortly before the test runner would have shut everything down.
>>>
>>> On Mon, Oct 5, 2015 at 11:32 AM, Todd Fiala <todd.fiala at gmail.com>
>>> wrote:
>>>
>>>> On OS X, I'm also not seeing growth in the --test-runner-name
>>>> threading-pool (the one you were using on Windows).
>>>>
>>>> Perhaps you can dig into if you're experiencing some kind of file leak
>>>> on Windows.  It's possible you're hitting a platform-specific leak?  I
>>>> recall Ed Maste hitting a FreeBSD-only leak in one or more of the python
>>>> 2.7.x releases.
>>>>
>>>> On Mon, Oct 5, 2015 at 11:26 AM, Todd Fiala <todd.fiala at gmail.com>
>>>> wrote:
>>>>
>>>>> Hmm, on OS X the file handles seem to be well behaved on the
>>>>> --test-runner-name threading.  I'm not seeing any file handle growth beyond
>>>>> the file handles I expect to be open.
>>>>>
>>>>> I'll see if the threading-pool behaves differently.  (That is similar
>>>>> to threading but uses the multiprocessing.pool mechanism, at the expense of
>>>>> me not  being able to catch Ctrl-C at all).
>>>>>
>>>>> It's possible the pool is introducing some leakage at the file level.
>>>>>
>>>>> On Mon, Oct 5, 2015 at 11:20 AM, Todd Fiala <todd.fiala at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Interesting, okay..
>>>>>>
>>>>>> This does appear to be an accumulation issue.  You made it most of
>>>>>> the way through before the issue hit.  I suspect we're leaking file
>>>>>> handles.  It probably doesn't hit the per-process limit on multiprocessing
>>>>>> because the leaked files get spread across more processes.
>>>>>>
>>>>>> (All speculation but does fit the results).
>>>>>>
>>>>>> I'll see if I can look into what's there - if we've got an obvious
>>>>>> leak, I'll take care of it.
>>>>>>
>>>>>> On Mon, Oct 5, 2015 at 9:58 AM, Adrian McCarthy <amccarth at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for the ideas.
>>>>>>>
>>>>>>> With `--test-runner-name threading-pool`, I get too many open files.
>>>>>>>
>>>>>>> With `--test-runner-name multiprocessing-pool`, the suite runs fine.
>>>>>>>
>>>>>>> My machine has 40 logical cores.
>>>>>>>
>>>>>>> With `--threads=20`:  SUCCESS (and perhaps _faster_).
>>>>>>>
>>>>>>> With `--threads=30`:  SUCCESS.
>>>>>>>
>>>>>>> With `--threads=36`:  SUCCESS.
>>>>>>>
>>>>>>> With `--threads=38`:  TOO MANY OPEN FILES.
>>>>>>>
>>>>>>> So we're right at the edge.  I'll keep investigating.
>>>>>>>
>>>>>>> So it seems we're on the bleeding edge.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 2, 2015 at 5:38 PM, Todd Fiala <todd.fiala at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> (swapped out the lldb list for the newer one)
>>>>>>>>
>>>>>>>> On Fri, Oct 2, 2015 at 5:37 PM, Todd Fiala <todd.fiala at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hmm, sounds suspicious.
>>>>>>>>>
>>>>>>>>> Can you try running the tests with two options and see if you get
>>>>>>>>> different results?
>>>>>>>>>
>>>>>>>>> # should be equivalent for the default on Windows, thus should
>>>>>>>>> match your above results.  This one uses a thread per worker queue.
>>>>>>>>> --test-runner-name threading-pool
>>>>>>>>>
>>>>>>>>> # should use a different test runner.  This one uses a process per
>>>>>>>>> worker queue.
>>>>>>>>> --test-runner-name multiprocessing-pool
>>>>>>>>>
>>>>>>>>> Aside from that, it seems like the total number of open files is
>>>>>>>>> exceeding some process/system maximum, which sounds like (maybe) we're
>>>>>>>>> leaking files somewhere.  Not enough info yet to guess where that might be
>>>>>>>>> coming in from, but maybe a part of the test runner isn't closing files
>>>>>>>>> somewhere.
>>>>>>>>>
>>>>>>>>> The other thing you can try is reducing the total number of
>>>>>>>>> threads, with:
>>>>>>>>> --threads
>>>>>>>>> {some-number-lower-than-your-total-number-of-logical-cores}
>>>>>>>>>
>>>>>>>>> in the event that your machine has a mongo number of logical
>>>>>>>>> cores, and perhaps it is trying to do too much.  (In that case, the
>>>>>>>>> multiprocessing-pool runner might actually help).
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> -Todd
>>>>>>>>>
>>>>>>>>> On Fri, Oct 2, 2015 at 5:31 PM, Adrian McCarthy <
>>>>>>>>> amccarth at google.com> wrote:
>>>>>>>>>
>>>>>>>>>> When running LLDB tests on Windows, I started getting a "too many
>>>>>>>>>> open files" error from Python.  I used git bisect to narrow it down to this
>>>>>>>>>> revision:
>>>>>>>>>>
>>>>>>>>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=249182
>>>>>>>>>>
>>>>>>>>>> The error output is:
>>>>>>>>>>
>>>>>>>>>> Command invoked: D:\src\Python-2.7.9\PCbuild\python_d.exe
>>>>>>>>>> D:\src\llvm\llvm\tools\lldb\test\dotest.py -q --arch=i686 --executable
>>>>>>>>>> D:/src/llvm/build/ninja/bin/lldb.exe -s
>>>>>>>>>> D:/src/llvm/build/ninja/lldb-test-traces -u CXXFLAGS -u CFLAGS
>>>>>>>>>> --enable-crash-dialog -C D:\src\llvm\build\ninja_release\bin\clang.exe
>>>>>>>>>> --inferior -p TestRecursiveTypes.py D:\src\llvm\llvm\tools\lldb\test
>>>>>>>>>> --event-add-entries worker_index=7:int
>>>>>>>>>> 384 out of 400 test suites processed - TestRecursiveTypes.py
>>>>>>>>>>               Traceback (most recent call last):
>>>>>>>>>>   File "D:/src/llvm/llvm/tools/lldb/test/dotest.py", line 1457,
>>>>>>>>>> in <module>
>>>>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1355, in
>>>>>>>>>> main
>>>>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 968, in
>>>>>>>>>> walk_and_invoke
>>>>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1095, in
>>>>>>>>>> <lambda>
>>>>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 889, in
>>>>>>>>>> threading_test_runner_pool
>>>>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 774, in
>>>>>>>>>> map_async_run_loop
>>>>>>>>>>   File "D:\src\Python-2.7.9\Lib\multiprocessing\pool.py", line
>>>>>>>>>> 558, in get
>>>>>>>>>> OSError: [Errno 24] Too many open files
>>>>>>>>>> [77809 refs]
>>>>>>>>>> ninja: build stopped: subcommand failed.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Any clue what might have caused this or what can be done to fix
>>>>>>>>>> it?
>>>>>>>>>>
>>>>>>>>>> It's Friday afternoon, so there's no urgency from my
>>>>>>>>>> perspective.  I'll probably get back to this on Monday morning.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Adrian McCarthy
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -Todd
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -Todd
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -Todd
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -Todd
>>>>
>>>
>>>
>>
>>
>> --
>> -Todd
>>
>
>


-- 
-Todd
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20151006/722aa28d/attachment-0001.html>


More information about the lldb-dev mailing list