[lldb-dev] Too many open files

Adrian McCarthy via lldb-dev lldb-dev at lists.llvm.org
Mon Oct 5 11:41:55 PDT 2015


I'm poking around with some SysInternals tools.  Over the course of a test
run, there are about 602k opens (CreateFiles) and 405k
closes (CloseFiles) system-wide.

I'm looking for a way to pause the test run once the error happens, so I can
see how many files each process has open.  As it stands, the OS cleans up the
handles once the error is hit.
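
One way this might be done from the Python side is to catch the error before
the process exits and hold it there while an external tool inspects the
handle counts.  This is only a rough sketch: `run_and_hold_on_emfile` and both
of its arguments are hypothetical names, not anything in dotest.py or
dosep.py.

```python
import errno


def run_and_hold_on_emfile(run, on_limit_hit):
    """Call run(); if it fails with EMFILE, call on_limit_hit, then re-raise.

    Both arguments are hypothetical callables supplied by the caller.
    """
    try:
        return run()
    except OSError as err:
        if err.errno != errno.EMFILE:
            raise
        # The failing process is still alive at this point, so whatever
        # handles it holds have not been cleaned up by the OS yet.
        on_limit_hit(err)
        raise
```

The hook could be as simple as a function that blocks on a keypress, which
would leave time to look at the per-process handle counts before letting the
error propagate.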

I wonder if it's not a matter of actually leaking open file handles but
that the closes are happening too late so that we cross the threshold
shortly before the test runner would have shut everything down.
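
To illustrate the difference between the two explanations, here is a toy
model (made-up event sequences, not measurements from the test suite).  The
total number of opens and closes is the same in both cases; only the timing
of the closes changes, and with it the peak number of handles open at once.

```python
def peak_open(events):
    """Return the highest number of simultaneously open handles.

    events is a time-ordered sequence of +1 (open) and -1 (close).
    """
    current = peak = 0
    for delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Three workers, each opening two files: same totals, different close timing.
prompt_closes = [+1, +1, -1, -1] * 3   # each worker closes before the next starts
late_closes = [+1, +1] * 3 + [-1] * 6  # nothing is closed until the very end

print(peak_open(prompt_closes))  # 2
print(peak_open(late_closes))    # 6
```

Nothing is leaked in either sequence, but the second one would still hit a
limit of, say, 5 open files.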

On Mon, Oct 5, 2015 at 11:32 AM, Todd Fiala <todd.fiala at gmail.com> wrote:

> On OS X, I'm also not seeing file handle growth with --test-runner-name
> threading-pool (the one you were using on Windows).
>
> Perhaps you can dig into whether you're experiencing some kind of file leak
> on Windows.  It's possible you're hitting a platform-specific leak?  I recall
> Ed Maste hitting a FreeBSD-only leak in one or more of the python 2.7.x
> releases.
>
> On Mon, Oct 5, 2015 at 11:26 AM, Todd Fiala <todd.fiala at gmail.com> wrote:
>
>> Hmm, on OS X the file handles seem to be well behaved with
>> --test-runner-name threading.  I'm not seeing any file handle growth beyond
>> the file handles I expect to be open.
>>
>> I'll see if the threading-pool behaves differently.  (That is similar to
>> threading but uses the multiprocessing.pool mechanism, at the expense of me
>> not being able to catch Ctrl-C at all).
>>
>> It's possible the pool is introducing some leakage at the file level.
>>
>> On Mon, Oct 5, 2015 at 11:20 AM, Todd Fiala <todd.fiala at gmail.com> wrote:
>>
>>> Interesting, okay..
>>>
>>> This does appear to be an accumulation issue.  You made it most of the
>>> way through before the issue hit.  I suspect we're leaking file handles.
>>> It probably doesn't hit the per-process limit on multiprocessing because
>>> the leaked files get spread across more processes.
>>>
>>> (All speculation but does fit the results).
>>>
>>> I'll see if I can look into what's there - if we've got an obvious leak,
>>> I'll take care of it.
>>>
>>> On Mon, Oct 5, 2015 at 9:58 AM, Adrian McCarthy <amccarth at google.com>
>>> wrote:
>>>
>>>> Thanks for the ideas.
>>>>
>>>> With `--test-runner-name threading-pool`, I get too many open files.
>>>>
>>>> With `--test-runner-name multiprocessing-pool`, the suite runs fine.
>>>>
>>>> My machine has 40 logical cores.
>>>>
>>>> With `--threads=20`:  SUCCESS (and perhaps _faster_).
>>>>
>>>> With `--threads=30`:  SUCCESS.
>>>>
>>>> With `--threads=36`:  SUCCESS.
>>>>
>>>> With `--threads=38`:  TOO MANY OPEN FILES.
>>>>
>>>> So it seems we're right at the edge.  I'll keep investigating.
>>>>
>>>>
>>>> On Fri, Oct 2, 2015 at 5:38 PM, Todd Fiala <todd.fiala at gmail.com>
>>>> wrote:
>>>>
>>>>> (swapped out the lldb list for the newer one)
>>>>>
>>>>> On Fri, Oct 2, 2015 at 5:37 PM, Todd Fiala <todd.fiala at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hmm, sounds suspicious.
>>>>>>
>>>>>> Can you try running the tests with two options and see if you get
>>>>>> different results?
>>>>>>
>>>>>> # should be equivalent to the default on Windows, thus should match
>>>>>> your above results.  This one uses a thread per worker queue.
>>>>>> --test-runner-name threading-pool
>>>>>>
>>>>>> # should use a different test runner.  This one uses a process per
>>>>>> worker queue.
>>>>>> --test-runner-name multiprocessing-pool
>>>>>>
>>>>>> Aside from that, it seems like the total number of open files is
>>>>>> exceeding some process/system maximum, which sounds like (maybe) we're
>>>>>> leaking files somewhere.  Not enough info yet to guess where that might be
>>>>>> coming in from, but maybe a part of the test runner isn't closing files
>>>>>> somewhere.
>>>>>>
>>>>>> The other thing you can try is reducing the total number of threads,
>>>>>> with:
>>>>>> --threads {some-number-lower-than-your-total-number-of-logical-cores}
>>>>>>
>>>>>> in the event that your machine has a huge number of logical cores,
>>>>>> and perhaps it is trying to do too much.  (In that case, the
>>>>>> multiprocessing-pool runner might actually help).
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>> On Fri, Oct 2, 2015 at 5:31 PM, Adrian McCarthy <amccarth at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> When running LLDB tests on Windows, I started getting a "too many
>>>>>>> open files" error from Python.  I used git bisect to narrow it down to this
>>>>>>> revision:
>>>>>>>
>>>>>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=249182
>>>>>>>
>>>>>>> The error output is:
>>>>>>>
>>>>>>> Command invoked: D:\src\Python-2.7.9\PCbuild\python_d.exe
>>>>>>> D:\src\llvm\llvm\tools\lldb\test\dotest.py -q --arch=i686 --executable
>>>>>>> D:/src/llvm/build/ninja/bin/lldb.exe -s
>>>>>>> D:/src/llvm/build/ninja/lldb-test-traces -u CXXFLAGS -u CFLAGS
>>>>>>> --enable-crash-dialog -C D:\src\llvm\build\ninja_release\bin\clang.exe
>>>>>>> --inferior -p TestRecursiveTypes.py D:\src\llvm\llvm\tools\lldb\test
>>>>>>> --event-add-entries worker_index=7:int
>>>>>>> 384 out of 400 test suites processed - TestRecursiveTypes.py
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "D:/src/llvm/llvm/tools/lldb/test/dotest.py", line 1457, in <module>
>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1355, in main
>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 968, in walk_and_invoke
>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1095, in <lambda>
>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 889, in threading_test_runner_pool
>>>>>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 774, in map_async_run_loop
>>>>>>>   File "D:\src\Python-2.7.9\Lib\multiprocessing\pool.py", line 558, in get
>>>>>>> OSError: [Errno 24] Too many open files
>>>>>>> [77809 refs]
>>>>>>> ninja: build stopped: subcommand failed.
>>>>>>>
>>>>>>>
>>>>>>> Any clue what might have caused this or what can be done to fix it?
>>>>>>>
>>>>>>> It's Friday afternoon, so there's no urgency from my perspective.
>>>>>>> I'll probably get back to this on Monday morning.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Adrian McCarthy