[lld] r194545 - [PECOFF] Fix use-after-return.

Sean Silva silvas at purdue.edu
Thu Nov 14 20:46:22 PST 2013


On Thu, Nov 14, 2013 at 6:30 PM, Rick Foos <rfoos at codeaurora.org> wrote:

>  There is a problem with threads. I'll try to describe what I'm seeing.
>
> Thanks for looking at this,
> Rick
>
> ninja '-j 12' '-l 32' check-all
> Launches 200+ llvm-symbolizers and consumes 24G of memory, going into swap
> space.
>
> It doesn't halt, but keeps going with a load average of 80, 44 zombies,
> and on this run, 10 llvm-symbolizers at the top.
>
> Quite a bit of the memory is released later on, and the testing
> continues...
>
> The last line of stdio stays the same. No interim test results are
> displayed.
>
> [189/189] Running all regression tests
>
> repeating sequence:
> A large number of llvm-symbolizers (200+) are launched.
> They run for a few minutes and then complete. The top 10 llvm-symbolizers
> stay resident.
>
> On average 132 kworkers are running.
> On average 76 llvm-symbolizers are running, but they do drop to near 0
> before restarting.
>

This "thundering herd" of symbolizers seems really problematic. They are
all likely reporting the same bug. As a quick experiment, you should try
the following:

$ mv llvm-symbolizer llvm-symbolizer_REAL
$ echo 'exec flock ./symbolizer.lock ./llvm-symbolizer_REAL' > llvm-symbolizer
$ chmod +x llvm-symbolizer

That should make sure that only a single llvm-symbolizer ever runs. It will
completely serialize the symbolizers, but that still might be a win over
swapping. You can also add the `-n` option to flock to cause it to fail if
there is already another symbolizer running (that might be useful so that
the build finishes quickly, while still getting at least one sanitizer
error report).
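To see concretely what `-n` buys you, here is a small self-contained sketch (assuming util-linux `flock` and a POSIX shell; the lock file path is made up): one process holds the lock while a second, non-blocking attempt fails immediately instead of queueing behind it.

```shell
#!/bin/sh
# One process holds the lock; a second "flock -n" attempt fails fast.
lock=$(mktemp)

( flock 9; sleep 2 ) 9>"$lock" &   # background holder keeps the lock ~2s
sleep 0.5                          # give the holder time to acquire it

if flock -n "$lock" true; then     # non-blocking attempt while held
    result="acquired"
else
    result="lock busy"             # -n returns failure instead of waiting
fi
echo "$result"
wait
rm -f "$lock"
```

Without `-n` (as in the wrapper above), the second attempt would simply block until the holder exits, which is what serializes the symbolizers.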

Also, wtf is llvm-symbolizer doing that needs so much memory??? That seems
like the root cause of this issue...


>
> As time goes on, the top llvm-symbolizers go from 50% CPU to 100% CPU, and
> now up to 116% CPU.
>
>
>
>
>
> ---
>
> top - 15:16:28 up 16 min,  1 user,  load average: 80.91, 69.35, 38.58
> Tasks: 466 total,  66 running, 356 sleeping,   0 stopped,  44 zombie
> %Cpu(s): 28.8 us, 71.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,
> 0.0 st
> KiB Mem:  24520168 total,  1735968 used, 22784200 free,    10240 buffers
> KiB Swap:  1999868 total,   144028 used,  1855840 free,   116280 cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> 54979 buildbot  20   0 1024g  12m   12 R    46  0.1   4:09.50 llvm-symbolizer
> 55000 buildbot  20   0 1024g  12m   12 R    46  0.1   4:09.02 llvm-symbolizer
> 54771 buildbot  20   0 97.0t  27m   48 R    44  0.1   4:10.47 llvm-symbolizer
> 54923 buildbot  20   0 1024g  12m   12 R    44  0.1   4:07.50 llvm-symbolizer
> 54769 buildbot  20   0 97.0t  27m   48 R    44  0.1   4:09.85 llvm-symbolizer
> 55144 buildbot  20   0 1024g  12m   12 R    44  0.1   4:07.72 llvm-symbolizer
> 54882 buildbot  20   0 1024g  12m   12 R    43  0.1   4:11.09 llvm-symbolizer
> 54975 buildbot  20   0 1024g  12m   12 R    42  0.1   4:08.50 llvm-symbolizer
> 54922 buildbot  20   0 1024g  12m   12 R    41  0.1   4:09.29 llvm-symbolizer
> 54958 buildbot  20   0 1024g  12m   12 R    39  0.1   4:07.27 llvm-symbolizer
>

Why is the symbolizer using so much virtual address space? I know that the
sanitizers themselves need a lot for their shadow memory, but just
symbolizing should hardly use any...


>     1 root      20   0 26920 1500  536 S    11  0.0   0:49.61 init
>    10 root      20   0     0    0    0 S     2  0.0   0:11.64 rcu_sched
>   209 root      20   0     0    0    0 S     2  0.0   0:10.44 kworker/0:1
>    15 root      20   0     0    0    0 S     2  0.0   0:09.85 kworker/1:0
>   178 root      20   0     0    0    0 S     2  0.0   0:08.85 kworker/24:1
>   202 root      20   0     0    0    0 S     2  0.0   0:09.95 kworker/12:1
>   205 root      20   0     0    0    0 S     2  0.0   0:09.71 kworker/15:1
>
> ---- pstree
> systemadmin at quicbuild03:~$ pstree
> init-+-acpid
>      |-avahi-daemon---avahi-daemon
>      |-bluetoothd
>      |-buildslave-+-ninja---sh---python-+-23*[python---bash]
>      |            |                     |-8*[python-+-bash]
>      |            |                     |           `-{python}]
>      |            |                     |-python---bash---FileCheck-+-llvm-symb+
>      |            |                     |                           `-{FileChec+
>      |            |                     `-{python}
>      |            `-{buildslave}
>      |-buildslave---{buildslave}
>      |-console-kit-dae---64*[{console-kit-dae}]
>      |-cron
>      |-cups-browsed
>      |-cupsd
>      |-dbus-daemon
>      |-exim4
>      |-6*[getty]
>      |-irqbalance
>      |-13*[llvm-symbolizer-+-llvm-symbolizer]
>      |                     `-{llvm-symbolizer}]
>      |-2*[llvm-symbolizer---{llvm-symbolizer}]
>      |-2*[llvm-symbolizer---llvm-symbolizer]
>      |-45*[llvm-symbolizer]
>

This is really strange. Does llvm-symbolizer double-fork or something? How
are these getting de-parented?

-- Sean Silva



>      |-nrpe
>      |-nscd---21*[{nscd}]
>      |-ntpd
>      |-polkitd---{polkitd}
>      |-rpc.idmapd
>      |-rpc.statd
>      |-rpcbind
>      |-rsyslogd---3*[{rsyslogd}]
>      |-sshd---sshd---sshd---bash---pstree
>      |-udevd---2*[udevd]
>      |-upstart-file-br
>      |-upstart-socket-
>      |-upstart-udev-br
>      `-whoopsie---{whoopsie}
>
>
>
>
> On 11/14/2013 04:47 PM, Sergey Matveev wrote:
>
> +kcc, samsonov (please don't remove people from CC)
>
>  You mean in the presence of threads? There's no such option because it's
> not supposed to interfere with the symbolizer. If it does then it's a bug,
> someone from our team will follow up on this tomorrow.
>
>  Sergey
>
> On Fri, Nov 15, 2013 at 2:01 AM, Rick Foos <rfoos at codeaurora.org> wrote:
>
>>  Thank you Sergey!
>>
>> Address Sanitizer running alone on a server is stable without the
>> symbolizer option. It runs all the tests in a reasonable amount of
>> time, and there are no llvm-symbolizer tasks.
>>
>> The problem is coming from the thread sanitizer, and I'm trying to prove
>> that now.
>>
>> If the thread sanitizer runs clean by itself on a server, then there is an
>> interaction when both address and thread sanitizers run at the same time.
>>
>> Is there a similar feature to disable the symbolizer for the thread
>> sanitizer?
>>
>> Best Regards,
>> Rick
>>
>>
>> On 11/14/2013 03:51 PM, Sergey Matveev wrote:
>>
>> ASAN_OPTIONS=symbolize=false
>>
>>
>> On Fri, Nov 15, 2013 at 1:14 AM, Nick Kledzik <kledzik at apple.com> wrote:
>>
>>>
>>>  On Nov 14, 2013, at 9:07 AM, Rick Foos <rfoos at codeaurora.org> wrote:
>>>
>>>   Status: System in swap overnight. Stopped both buildmaster and slave.
>>> 187 llvm-symbolizer tasks were still running. Tasks did not stop after
>>>
>>>  Retried this morning, no other workload, 8 llvm-symbolizer tasks
>>> consuming 100% on each cpu
>>>
>>>
>>>  Doesn’t that mean that Asan found some problems, but is stuck trying
>>> to symbolicate the backtraces?   Is there a way to run Asan and *not*
>>> symbolicate?
>>>
>>>  This also seems like a bug (infinite loop?) in llvm-symbolizer.
>>>
>>>  -Nick
>>>
>>>
>>>   . 7 zombie tasks.
>>>
>>>  So not quite ready this morning. If anyone knows of an llvm-sanitizer
>>> issue like this it would help.
>>>
>>>   *From:* llvm-commits-bounces at cs.uiuc.edu
>>> [mailto:llvm-commits-bounces at cs.uiuc.edu] *On Behalf Of *Rick Foos
>>> *Sent:* Wednesday, November 13, 2013 1:42 PM
>>> *To:* Sergey Matveev; Shankar Easwaran
>>> *Cc:* llvm-commits at cs.uiuc.edu; Galina Kistanova
>>> *Subject:* Re: [lld] r194545 - [PECOFF] Fix use-after-return.
>>>
>>>  Sorry for the delay,
>>>
>>> Our problem with running the sanitizers is that the load average running
>>> under Ninja reached 146, and a short time later the system crashed,
>>> requiring calling someone to power cycle the box...
>>>
>>> The address sanitizer by itself leaves a load average of 40. This means
>>> the OS is over 100% utilization and is thrashing a bit. Load average
>>> doesn't say what exactly is thrashing.
>>>
>>> Ninja supports make's -j and -l options. The -l option, a maximum load
>>> average, is the key: the load average should be less than the total
>>> number of cores (counting hyperthreads) before Ninja launches another
>>> task.
>>>
>>> A load average at or below 100% should technically benefit performance
>>> and maximize throughput. However, I will be happy if I don't have to
>>> call someone to power cycle the server :)
>>>
>>> So the maximum load average of a 16-core machine with hyperthreads is 32
>>> (keeping it simple). This needs to be passed to all make and Ninja build
>>> steps on that slave to maximize throughput.
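The capping described above is just Ninja's standard flags; a minimal sketch (values illustrative, derived from the core count with coreutils `nproc`):

```shell
#!/bin/sh
# Cap Ninja's parallelism and load: -j limits concurrent jobs, -l stops
# launching new jobs while the load average exceeds the given value.
cap=$(nproc)                     # counts hyperthreads, e.g. 32 here
cmd="ninja -j $cap -l $cap check-all"
echo "$cmd"                      # the invocation a buildslave would run
```

On the 16-core/32-thread machine above this prints `ninja -j 32 -l 32 check-all`.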
>>>
>>> For now, I'm looking at a minimal patch to include jobs and a new
>>> loadaverage variable for the sanitizers.
>>>
>>> Longer term, all buildslaves should define maximum loadaverage, and all
>>> make/ninja steps should pass -j, and -l options.
>>>
>>> Best Regards,
>>> Rick
>>>
>>> On 11/13/2013 11:21 AM, Sergey Matveev wrote:
>>>
>>>  +kcc
>>>
>>>
>>>  On Wed, Nov 13, 2013 at 6:41 AM, Shankar Easwaran <
>>> shankare at codeaurora.org> wrote:
>>> Sorry for another indirection. Rick Foos is working on it. I think there
>>> is some good news here :)
>>>
>>> CCed Rick + adding Galina, Dmitri.
>>>
>>> Thanks
>>>
>>> Shankar Easwaran
>>>
>>>
>>> On 11/12/2013 8:37 PM, Rui Ueyama wrote:
>>>
>>> Shankar tried to set it up recently.
>>>
>>>
>>> On Tue, Nov 12, 2013 at 6:31 PM, Sean Silva <silvas at purdue.edu> wrote:
>>>
>>> Sanitizers?
>>>
>>> There have been a couple of these sorts of bugs recently... we really
>>> ought to have some sanitizer bots...
>>>
>>> -- Sean Silva
>>>
>>>
>>> On Tue, Nov 12, 2013 at 9:21 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>> Author: ruiu
>>> Date: Tue Nov 12 20:21:51 2013
>>> New Revision: 194545
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=194545&view=rev
>>> Log:
>>> [PECOFF] Fix use-after-return.
>>>
>>> Modified:
>>>      lld/trunk/lib/Driver/WinLinkDriver.cpp
>>>
>>> Modified: lld/trunk/lib/Driver/WinLinkDriver.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/lld/trunk/lib/Driver/WinLinkDriver.cpp?rev=194545&r1=194544&r2=194545&view=diff
>>>
>>>
>>> ==============================================================================
>>> --- lld/trunk/lib/Driver/WinLinkDriver.cpp (original)
>>> +++ lld/trunk/lib/Driver/WinLinkDriver.cpp Tue Nov 12 20:21:51 2013
>>> @@ -842,7 +842,7 @@ WinLinkDriver::parse(int argc, const cha
>>>
>>>       case OPT_INPUT:
>>>         inputElements.push_back(std::unique_ptr<InputElement>(
>>> -          new PECOFFFileNode(ctx, inputArg->getValue())));
>>> +          new PECOFFFileNode(ctx, ctx.allocateString(inputArg->getValue()))));
>>>         break;
>>>
>>>   #define DEFINE_BOOLEAN_FLAG(name, setter)       \
>>> @@ -892,9 +892,11 @@ WinLinkDriver::parse(int argc, const cha
>>>     // start with a hypen or a slash. This is not compatible with link.exe
>>>     // but useful for us to test lld on Unix.
>>>     if (llvm::opt::Arg *dashdash = parsedArgs->getLastArg(OPT_DASH_DASH)) {
>>> -    for (const StringRef value : dashdash->getValues())
>>> -      inputElements.push_back(
>>> -          std::unique_ptr<InputElement>(new PECOFFFileNode(ctx, value)));
>>> +    for (const StringRef value : dashdash->getValues()) {
>>> +      std::unique_ptr<InputElement> elem(
>>> +          new PECOFFFileNode(ctx, ctx.allocateString(value)));
>>> +      inputElements.push_back(std::move(elem));
>>> +    }
>>>     }
>>>
>>>     // Add the libraries specified by /defaultlib unless they are already added
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>>>  --
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>> hosted by the Linux Foundation
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>  --
>>>
>>> Rick Foos
>>>
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Rick Foos
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
>>
>>
>
>
> --
> Rick Foos
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
>
>
>
>

