[lld] r194545 - [PECOFF] Fix use-after-return.

Kostya Serebryany kcc at google.com
Fri Nov 15 01:48:17 PST 2013


On Fri, Nov 15, 2013 at 1:44 PM, Alexey Samsonov <samsonov at google.com> wrote:

>
> On Fri, Nov 15, 2013 at 8:46 AM, Sean Silva <silvas at purdue.edu> wrote:
>
>>
>>
>>
>> On Thu, Nov 14, 2013 at 6:30 PM, Rick Foos <rfoos at codeaurora.org> wrote:
>>
>>>  There is a problem with threads. I'll try to describe what I'm seeing.
>>>
>>> Thanks for looking at this,
>>> Rick
>>>
>>> ninja '-j 12' '-l 32' check-all
>>> Launches 200+ llvm-symbolizers and consumes 24G memory, going into swap
>>> space.
>>>
>>> It doesn't halt but does keep going with a load average of 80, 44 zombies,
>>> and in this run 10 llvm-symbolizers (highlighted) at the top.
>>>
>>> Quite a bit of the memory is released later on, and the testing
>>> continues...
>>>
>>> The last line of output stays the same. No interim test results are
>>> displayed.
>>>
>>> [189/189] Running all regression tests
>>>
>>> repeating sequence:
>>> A large number of llvm-symbolizers (200+) are launched.
>>> They run for a few minutes, and then complete. The top 10
>>> llvm-symbolizers stay resident.
>>>
>>> On average 132 kworkers are running.
>>> On average 76 llvm-symbolizers are running, but they do drop to near 0
>>> before restarting.
>>>
>>
>> This "thundering herd" of symbolizers seems really problematic.
>>
>
> It looks like we really need to fix our runtimes to only launch
> llvm-symbolizer if we want to report an error.
>

Or, better, get rid of the out-of-process symbolizer completely in favor of
the in-process one.


>
>
>> They are all likely reporting the same bug. As a quick experiment, you
>> should try the following:
>>
>> $ mv llvm-symbolizer llvm-symbolizer_REAL
>> $ printf '#!/bin/sh\nexec flock ./symbolizer.lock ./llvm-symbolizer_REAL "$@"\n' >llvm-symbolizer
>> $ chmod +x llvm-symbolizer
>>
>> That should make sure that only a single llvm-symbolizer ever runs. It
>> will completely serialize the symbolizers, but that still might be a win
>> over swapping. You can also add the `-n` option to flock to cause it to
>> fail if there is already another symbolizer running (that might be useful
>> so that the build finishes quickly, while still getting at least one
>> sanitizer error report).
>>
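
A minimal sketch of the `-n` variant described above, assuming util-linux
flock is installed. The helper name make_wrapper and its three path arguments
are hypothetical, introduced only for illustration; the generated wrapper
forwards any arguments it receives to the real binary.

```shell
#!/bin/sh
# Sketch only: write a wrapper script that allows at most one symbolizer
# to run at a time. All paths are supplied by the caller (hypothetical).
make_wrapper() {
    real=$1   # path to the renamed real binary, e.g. ./llvm-symbolizer_REAL
    lock=$2   # path to the lock file shared by all wrapper invocations
    out=$3    # where to write the wrapper script
    cat >"$out" <<EOF
#!/bin/sh
# -n: fail immediately (non-zero exit) if another instance holds the lock,
# instead of queueing up behind it.
exec flock -n "$lock" "$real" "\$@"
EOF
    chmod +x "$out"
}

# Example use (not executed here):
#   mv llvm-symbolizer llvm-symbolizer_REAL
#   make_wrapper "$PWD/llvm-symbolizer_REAL" "$PWD/symbolizer.lock" "$PWD/llvm-symbolizer"
```

Using absolute paths for the lock file and the real binary avoids depending
on the working directory of whichever test launches the wrapper.
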
>> Also, wtf is llvm-symbolizer doing that needs so much memory??? That
>> seems like the root cause of this issue...
>>
>>
>>>
>>> As time goes on, the top llvm-symbolizers go from 50% CPU to 100% CPU, and
>>> now up to 116% CPU.
>>>
>>>
>>>
>>>
>>>
>>> ---
>>>
>>> top - 15:16:28 up 16 min,  1 user,  load average: 80.91, 69.35, 38.58
>>> Tasks: 466 total,  66 running, 356 sleeping,   0 stopped,  44 zombie
>>> %Cpu(s): 28.8 us, 71.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>>> KiB Mem:  24520168 total,  1735968 used, 22784200 free,    10240 buffers
>>> KiB Swap:  1999868 total,   144028 used,  1855840 free,   116280 cached
>>>
>>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
>>> 54979 buildbot  20   0 1024g  12m   12 R    46  0.1   4:09.50 llvm-symbolizer
>>> 55000 buildbot  20   0 1024g  12m   12 R    46  0.1   4:09.02 llvm-symbolizer
>>> 54771 buildbot  20   0 97.0t  27m   48 R    44  0.1   4:10.47 llvm-symbolizer
>>> 54923 buildbot  20   0 1024g  12m   12 R    44  0.1   4:07.50 llvm-symbolizer
>>> 54769 buildbot  20   0 97.0t  27m   48 R    44  0.1   4:09.85 llvm-symbolizer
>>> 55144 buildbot  20   0 1024g  12m   12 R    44  0.1   4:07.72 llvm-symbolizer
>>> 54882 buildbot  20   0 1024g  12m   12 R    43  0.1   4:11.09 llvm-symbolizer
>>> 54975 buildbot  20   0 1024g  12m   12 R    42  0.1   4:08.50 llvm-symbolizer
>>> 54922 buildbot  20   0 1024g  12m   12 R    41  0.1   4:09.29 llvm-symbolizer
>>> 54958 buildbot  20   0 1024g  12m   12 R    39  0.1   4:07.27 llvm-symbolizer
>>>
>>
>> Why is the symbolizer using so much virtual address space? I know that
>> the sanitizers themselves need a lot for their shadow memory, but just
>> symbolizing should hardly use any...
>>
>>
>>>     1 root      20   0 26920 1500  536 S    11  0.0   0:49.61 init
>>>    10 root      20   0     0    0    0 S     2  0.0   0:11.64 rcu_sched
>>>   209 root      20   0     0    0    0 S     2  0.0   0:10.44 kworker/0:1
>>>    15 root      20   0     0    0    0 S     2  0.0   0:09.85 kworker/1:0
>>>   178 root      20   0     0    0    0 S     2  0.0   0:08.85 kworker/24:1
>>>   202 root      20   0     0    0    0 S     2  0.0   0:09.95 kworker/12:1
>>>   205 root      20   0     0    0    0 S     2  0.0   0:09.71 kworker/15:1
>>>
>>> ---- pstree
>>> systemadmin at quicbuild03:~$ pstree
>>> init-+-acpid
>>>      |-avahi-daemon---avahi-daemon
>>>      |-bluetoothd
>>>      |-buildslave-+-ninja---sh---python-+-23*[python---bash]
>>>      |            |                     |-8*[python-+-bash]
>>>      |            |                     |           `-{python}]
>>>      |            |                     |-python---bash---FileCheck-+-llvm-symb+
>>>      |            |                     |                           `-{FileChec+
>>>      |            |                     `-{python}
>>>      |            `-{buildslave}
>>>      |-buildslave---{buildslave}
>>>      |-console-kit-dae---64*[{console-kit-dae}]
>>>      |-cron
>>>      |-cups-browsed
>>>      |-cupsd
>>>      |-dbus-daemon
>>>      |-exim4
>>>      |-6*[getty]
>>>      |-irqbalance
>>>      |-13*[llvm-symbolizer-+-llvm-symbolizer]
>>>      |                     `-{llvm-symbolizer}]
>>>      |-2*[llvm-symbolizer---{llvm-symbolizer}]
>>>      |-2*[llvm-symbolizer---llvm-symbolizer]
>>>      |-45*[llvm-symbolizer]
>>>
>>
>> This is really strange. Does llvm-symbolizer double-fork or something?
>> How are these getting de-parented?
>>
>> -- Sean Silva
>>
>>
>>
>>>      |-nrpe
>>>      |-nscd---21*[{nscd}]
>>>      |-ntpd
>>>      |-polkitd---{polkitd}
>>>      |-rpc.idmapd
>>>      |-rpc.statd
>>>      |-rpcbind
>>>      |-rsyslogd---3*[{rsyslogd}]
>>>      |-sshd---sshd---sshd---bash---pstree
>>>      |-udevd---2*[udevd]
>>>      |-upstart-file-br
>>>      |-upstart-socket-
>>>      |-upstart-udev-br
>>>      `-whoopsie---{whoopsie}
>>>
>>>
>>>
>>>
>>> On 11/14/2013 04:47 PM, Sergey Matveev wrote:
>>>
>>> +kcc, samsonov (please don't remove people from CC)
>>>
>>>  You mean in the presence of threads? There's no such option because
>>> it's not supposed to interfere with the symbolizer. If it does then it's a
>>> bug, someone from our team will follow up on this tomorrow.
>>>
>>>  Sergey
>>>
>>> On Fri, Nov 15, 2013 at 2:01 AM, Rick Foos <rfoos at codeaurora.org> wrote:
>>>
>>>>  Thank you Sergey!
>>>>
>>>> AddressSanitizer running alone on a server is stable without the
>>>> symbolizer option. It is running all the tests in a reasonable amount of
>>>> time, and there are no llvm-symbolizer tasks.
>>>>
>>>> The problem is coming from the thread sanitizer, and I'm trying to prove
>>>> that now.
>>>>
>>>> If the thread sanitizer runs clean by itself on a server, then there is an
>>>> interaction between the address and thread sanitizers running at the same
>>>> time.
>>>>
>>>> Is there a similar option to disable the symbolizer in the thread sanitizer?
>>>>
>>>> Best Regards,
>>>> Rick
>>>>
>>>>
>>>> On 11/14/2013 03:51 PM, Sergey Matveev wrote:
>>>>
>>>> ASAN_OPTIONS=symbolize=false
>>>>
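
For example, a sketch of applying this option to a whole test run. It assumes
the usual colon-separated form of ASAN_OPTIONS and appends to any value that
is already set instead of overwriting it; the `ninja check-all` line is left
as a comment so the snippet itself has no side effects.

```shell
#!/bin/sh
# Sketch only: turn off symbolization for ASan-instrumented binaries
# started from this shell. Any existing ASAN_OPTIONS value is kept and
# the new option is appended after a colon.
ASAN_OPTIONS="${ASAN_OPTIONS:+$ASAN_OPTIONS:}symbolize=false"
export ASAN_OPTIONS
echo "ASAN_OPTIONS=$ASAN_OPTIONS"

# The test run would then inherit the setting, e.g.:
#   ninja check-all
```
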
>>>>
>>>> On Fri, Nov 15, 2013 at 1:14 AM, Nick Kledzik <kledzik at apple.com> wrote:
>>>>
>>>>>
>>>>>  On Nov 14, 2013, at 9:07 AM, Rick Foos <rfoos at codeaurora.org> wrote:
>>>>>
>>>>>   Status: System in swap overnight. Stopped both buildmaster and
>>>>> slave. 187 llvm-symbolizer tasks were still running. Tasks did not stop
>>>>> after
>>>>>
>>>>>  Retried this morning, no other workload, 8 llvm-symbolizer tasks
>>>>> consuming 100% on each cpu
>>>>>
>>>>>
>>>>>  Doesn’t that mean that Asan found some problems, but is stuck trying
>>>>> to symbolicate the backtraces?   Is there a way to run Asan and *not*
>>>>> symbolicate?
>>>>>
>>>>>  This also seems like a bug (infinite loop?) in llvm-symbolizer.
>>>>>
>>>>>  -Nick
>>>>>
>>>>>
>>>>>   . 7 zombie tasks.
>>>>>
>>>>>  So not quite ready this morning. If anyone knows of an
>>>>> llvm-sanitizer issue like this it would help.
>>>>>
>>>>>   *From:* llvm-commits-bounces at cs.uiuc.edu *On Behalf Of* Rick Foos
>>>>> *Sent:* Wednesday, November 13, 2013 1:42 PM
>>>>> *To:* Sergey Matveev; Shankar Easwaran
>>>>> *Cc:* llvm-commits at cs.uiuc.edu; Galina Kistanova
>>>>> *Subject:* Re: [lld] r194545 - [PECOFF] Fix use-after-return.
>>>>>
>>>>>  Sorry for the delay,
>>>>>
>>>>> Our problem with running the sanitizers is that the load average
>>>>> under Ninja reached 146, and shortly afterwards the system crashed,
>>>>> which required calling someone to power cycle the box...
>>>>>
>>>>> The address sanitizer by itself leaves a load average of 40. This means
>>>>> the OS is over 100% utilization and is thrashing a bit. Load average
>>>>> doesn't say what exactly is thrashing.
>>>>>
>>>>> Ninja supports make's -j and -l options. The -l option, maximum load
>>>>> average, is the key.
>>>>>
>>>>> The load average should be less than the total number of cores
>>>>> (hyperthreads too) before Ninja launches another task.
>>>>>
>>>>> A load average at or below 100% should technically benefit
>>>>> performance and maximize throughput. However, I will be happy if I don't
>>>>> have to call someone to power cycle the server :)
>>>>>
>>>>> So the maximum load average of a 16 core machine with hyperthreads is
>>>>> 32 (keeping it simple). This needs to be passed to all make and Ninja
>>>>> build steps on that slave to maximize throughput.
>>>>>
>>>>> For now, I'm looking at a minimal patch to include jobs and a new
>>>>> loadaverage variable for the sanitizers.
>>>>>
>>>>> Longer term, all buildslaves should define a maximum loadaverage, and
>>>>> all make/ninja steps should pass the -j and -l options.
>>>>>
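
The load limit described above could be derived on each machine instead of
hard-coded. The following is a minimal sketch, assuming `getconf
_NPROCESSORS_ONLN` reports the number of logical CPUs (hyperthreads included);
the variable names are hypothetical, and the job count of 12 is simply the
value from the report earlier in this thread, not a recommendation.

```shell
#!/bin/sh
# Sketch only: build a ninja command line whose -l limit equals the
# number of online logical CPUs, so new jobs are not started while the
# load average is above that number.
cores=$(getconf _NPROCESSORS_ONLN)
jobs=12   # taken from the report in this thread; adjust per machine

ninja_cmd="ninja -j $jobs -l $cores check-all"

# Print rather than run, so the snippet works without a build tree.
echo "$ninja_cmd"
```

On the 16 core machine with hyperthreads mentioned above, this would yield
`-l 32`.
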
>>>>> Best Regards,
>>>>> Rick
>>>>>
>>>>> On 11/13/2013 11:21 AM, Sergey Matveev wrote:
>>>>>
>>>>>  +kcc
>>>>>
>>>>>
>>>>>  On Wed, Nov 13, 2013 at 6:41 AM, Shankar Easwaran <
>>>>> shankare at codeaurora.org> wrote:
>>>>> Sorry for another indirection. Rick Foos is working on it. I think
>>>>> there is some good news here :)
>>>>>
>>>>> CCed Rick, and adding Galina and Dmitri.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Shankar Easwaran
>>>>>
>>>>>
>>>>> On 11/12/2013 8:37 PM, Rui Ueyama wrote:
>>>>>
>>>>> Shankar tried to set it up recently.
>>>>>
>>>>>
>>>>> On Tue, Nov 12, 2013 at 6:31 PM, Sean Silva <silvas at purdue.edu> wrote:
>>>>>
>>>>> Sanitizers?
>>>>>
>>>>> There have been a couple of these sorts of bugs recently... we really
>>>>> ought to have some sanitizer bots...
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>>
>>>>> On Tue, Nov 12, 2013 at 9:21 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>
>>>>> Author: ruiu
>>>>> Date: Tue Nov 12 20:21:51 2013
>>>>> New Revision: 194545
>>>>>
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=194545&view=rev
>>>>> Log:
>>>>> [PECOFF] Fix use-after-return.
>>>>>
>>>>> Modified:
>>>>>      lld/trunk/lib/Driver/WinLinkDriver.cpp
>>>>>
>>>>> Modified: lld/trunk/lib/Driver/WinLinkDriver.cpp
>>>>> URL:
>>>>>
>>>>> http://llvm.org/viewvc/llvm-project/lld/trunk/lib/Driver/WinLinkDriver.cpp?rev=194545&r1=194544&r2=194545&view=diff
>>>>>
>>>>>
>>>>> ==============================================================================
>>>>> --- lld/trunk/lib/Driver/WinLinkDriver.cpp (original)
>>>>> +++ lld/trunk/lib/Driver/WinLinkDriver.cpp Tue Nov 12 20:21:51 2013
>>>>> @@ -842,7 +842,7 @@ WinLinkDriver::parse(int argc, const cha
>>>>>
>>>>>       case OPT_INPUT:
>>>>>         inputElements.push_back(std::unique_ptr<InputElement>(
>>>>> -          new PECOFFFileNode(ctx, inputArg->getValue())));
>>>>> +          new PECOFFFileNode(ctx, ctx.allocateString(inputArg->getValue()))));
>>>>>         break;
>>>>>
>>>>>   #define DEFINE_BOOLEAN_FLAG(name, setter)       \
>>>>> @@ -892,9 +892,11 @@ WinLinkDriver::parse(int argc, const cha
>>>>>     // start with a hypen or a slash. This is not compatible with link.exe
>>>>>     // but useful for us to test lld on Unix.
>>>>>     if (llvm::opt::Arg *dashdash = parsedArgs->getLastArg(OPT_DASH_DASH)) {
>>>>> -    for (const StringRef value : dashdash->getValues())
>>>>> -      inputElements.push_back(
>>>>> -          std::unique_ptr<InputElement>(new PECOFFFileNode(ctx, value)));
>>>>> +    for (const StringRef value : dashdash->getValues()) {
>>>>> +      std::unique_ptr<InputElement> elem(
>>>>> +          new PECOFFFileNode(ctx, ctx.allocateString(value)));
>>>>> +      inputElements.push_back(std::move(elem));
>>>>> +    }
>>>>>     }
>>>>>
>>>>>     // Add the libraries specified by /defaultlib unless they are already added
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>> hosted by the Linux Foundation
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>>
>>>>> Rick Foos
>>>>>
>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Rick Foos
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
>>>>
>>>>
>>>
>>>
>>> --
>>> Rick Foos
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
>>>
>>>
>>>
>>
>>
>
>
> --
> Alexey Samsonov, MSK
>
>