[lld] r194545 - [PECOFF] Fix use-after-return.
Rick Foos
rfoos at codeaurora.org
Wed Nov 13 17:44:24 PST 2013
On 11/13/2013 07:12 PM, Sean Silva wrote:
>
>
>
> On Wed, Nov 13, 2013 at 7:40 PM, Rick Foos <rfoos at codeaurora.org> wrote:
>
> On 11/13/2013 06:19 PM, Sean Silva wrote:
>>
>>
>>
>> On Wed, Nov 13, 2013 at 2:41 PM, Rick Foos <rfoos at codeaurora.org> wrote:
>>
>> Sorry for the delay,
>>
>> Our problem with running the sanitizers is that the load
>> average under Ninja reached 146, and a short time later the
>> system crashed, requiring a call to someone to power-cycle
>> the box...
>>
>>
>> I'm curious what is causing so much load? Our tests are mostly
>> single-threaded, so if only #cores jobs are spawned (or
>> #cores + 2, which is what ninja uses when #cores > 2), there
>> should only be #cores + 2 jobs running simultaneously (certainly
>> not 146, which is 146/32 ~ 4.5 times the core count). Is lit
>> spawning too many jobs?
>>
> A bare ninja command in the test step, so no -j or -l control.
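For what it's worth, ninja's default when nothing is passed is roughly
#cores + 2, as Sean says above. A rough Python rendering of that rule
(an approximation for illustration, not ninja's actual code):

import multiprocessing

def guess_ninja_parallelism():
    """Approximate ninja's default -j: #cores + 2 once there are more than 2 cores."""
    cores = multiprocessing.cpu_count()
    if cores > 2:
        return cores + 2
    # Very small machines still get a little overlap.
    return cores + 1

print(guess_ninja_parallelism())   # e.g. 34 on the 16-core/32-thread slave

Whether the extra load comes from ninja itself or from lit's own
workers on top of that is the open question above.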
>
>> Does the machine have enough RAM?
>>
> 24 GB RAM. 40 MB L2.
>
>>
>>
>> The address sanitizer by itself leaves a load average of 40.
>> This means the OS is over 100% utilization and is thrashing a
>> bit, though the load average doesn't say what exactly is thrashing.
>>
>> Ninja supports make's -j and -l options. The -l option, a
>> maximum load average, is the key.
>>
>> With -l set to the total number of cores (hyperthreads too),
>> Ninja will not launch another task while the load average is
>> above that limit.
>>
>> A load average at or below 100% should technically benefit
>> performance and maximize throughput. However, I will be happy
>> as long as I don't have to call someone to power-cycle the
>> server :)
>>
>>
>> I don't think that's quite how it works. As long as you have
>> enough RAM, the only performance loss from having a bunch of
>> jobs waiting is context-switching overhead, and that can be
>> minimized either by lowering the preempt timer rate (what is
>> called HZ in Linux; 100, which is common for servers doing batch
>> jobs, dilutes the overhead to basically nothing) or, if you are
>> running a recent kernel, by arranging things to run tickless,
>> in which case there is essentially no overhead. If load is less
>> than #cores, then you don't have a job running on every core,
>> which means some cores are essentially idle and you are losing
>> performance. The other killer is jobs blocking on disk IO *with
>> no other jobs to schedule in the meantime*; generally you have
>> to keep load above 100% to avoid that problem.
>>
>> -- Sean Silva
> ninja --help
> usage: ninja [options] [targets...]
> ...
> -j N run N jobs in parallel [default=10]
> -l N do not start new jobs if the load average is greater than N
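So the throttled invocation amounts to something like the following.
The values are illustrative for the 32-thread slave discussed here,
and the check target name is a placeholder, not any bot's actual
target:

import multiprocessing
import subprocess

threads = multiprocessing.cpu_count()   # 32 on the 16-core hyperthreaded slave

cmd = [
    "ninja",
    "-j", str(threads + 2),   # cap concurrent jobs
    "-l", str(threads),       # do not start new jobs above this load average
    "check-all",              # placeholder target; each bot has its own check target
]
subprocess.check_call(cmd)

With -l in place, ninja still runs up to -j jobs, but backs off from
starting new ones whenever the load average is above the limit.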
>
> As far as what load average means:
> http://serverfault.com/questions/251947/what-does-load-average-mean
> http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
>
> Everything seems to say 100% load is when the load average
> equals the number of processors.
>
>
> This term "load" is only vaguely related to the colloquial meaning, so
> "100% load" should not be understood as "perfect" or "maximum". It's
> literally just the time-averaged number of jobs available to run. The
> bridge analogy in the second link is fairly accurate. Notice that even
> if you are at >100% load, the bridge is still being used at full
> capacity (as many cars as possible are crossing the bridge
> simultaneously). If load is >100%, then that might impact the
> *latency* for getting to a particular job (in the analogy: how long it
> takes for a particular car to get across the bridge *including the
> waiting time in the queue*), but for a batch operation like running
> tests that doesn't matter.
>
>
> ----
> While the Ninja build step seemed OK (-j10 and all), the test
> step seemed to be the problem.
>
> Ninja continuously launched the address measurement tasks with no
> limits.
>
>
> What "address measurement"?
>
Asan-x86_64-xxx tasks.
Load average control failed... somebody launched 85 Python tasks, so I
took the nice off of ninja.
17:28:57 up 2 days, 7:47, 1 user, load average: 81.63, 73.15, 51.22
Better now:
17:35:25 up 2 days, 7:54, 1 user, load average: 33.09, 48.52, 48.32
Going to need to run this overnight, and see if I have a good zorg setup
for this.
Asan steps are running, and the throttle seems to work now.
(Originally there were 100 of these tasks; now there are 3, with the
load average staying in the 30s.)
There are 187 clang tasks, so the system is not being lazy... it's
actually limiting the Asan jobs now, as desired.
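Those load numbers are just uptime output; the same check from a
script is only a few lines with os.getloadavg() (illustrative only,
not part of any bot config):

import multiprocessing
import os

def load_report():
    """Print the 1/5/15-minute load averages relative to hardware threads."""
    one, five, fifteen = os.getloadavg()
    threads = multiprocessing.cpu_count()
    print("load: %.2f %.2f %.2f on %d hardware threads (%.0f%% of one-minute capacity)"
          % (one, five, fifteen, threads, 100.0 * one / threads))

load_report()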
If this holds up tonight, a zorg patch in the (my) morning.
Cheers,
Rick
> -- Sean Silva
>
>
> When combined with the thread sanitizer doing the same thing,
> the load average hit 146, followed by a crash.
>
> In my testing, once -l is used the load average stays mostly
> below 32. There are some other builders going that are not
> controlled by load average. My guess is that when all builders
> are throttled by load average, utilization will be very close to
> 100% when everything is running.
>
> Ninja definitely needs this control for the sanitizers. An
> experiment with Make is in order to prove the point.
>
>
>>
>> So the maximum load average for a 16-core machine with
>> hyperthreads is 32 (keeping it simple). This needs to be
>> passed to all make and Ninja build steps on that slave to
>> maximize throughput.
>>
>> For now, I'm looking at a minimal patch to add jobs and a new
>> loadaverage variable for the sanitizer builders.
>>
>> Longer term, all buildslaves should define a maximum
>> loadaverage, and all make/ninja steps should pass the -j and
>> -l options.
>>
>> Best Regards,
>> Rick
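The zorg change I have in mind is roughly the following sketch; the
helper and parameter names (jobs, loadaverage) are placeholders, not
the actual buildbot factory code:

import multiprocessing

def throttled_ninja_command(target, jobs=None, loadaverage=None):
    """Build a ninja command line throttled with -j and -l.

    jobs/loadaverage are hypothetical builder knobs; when unset they
    default to values derived from the slave's hardware thread count.
    """
    threads = multiprocessing.cpu_count()
    if jobs is None:
        jobs = threads + 2
    if loadaverage is None:
        loadaverage = threads
    return ["ninja", "-j", str(jobs), "-l", str(loadaverage), target]

On the 16-core/32-thread slave this yields ninja -j 34 -l 32 <target>,
which matches the throttling that kept the load average in the 30s
above.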
>>
>>
>> On 11/13/2013 11:21 AM, Sergey Matveev wrote:
>>> +kcc
>>>
>>>
>>> On Wed, Nov 13, 2013 at 6:41 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
>>>
>>> Sorry for another indirection. Rick Foos is working on
>>> it. I think there is some good news here :)
>>>
>>> CCed Rick and added Galina and Dmitri.
>>>
>>> Thanks
>>>
>>> Shankar Easwaran
>>>
>>>
>>> On 11/12/2013 8:37 PM, Rui Ueyama wrote:
>>>
>>> Shankar tried to set it up recently.
>>>
>>>
>>> On Tue, Nov 12, 2013 at 6:31 PM, Sean Silva <silvas at purdue.edu> wrote:
>>>
>>> Sanitizers?
>>>
>>> There have been a couple of these sorts of bugs
>>> recently... we really
>>> ought to have some sanitizer bots...
>>>
>>> -- Sean Silva
>>>
>>>
>>> On Tue, Nov 12, 2013 at 9:21 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>> Author: ruiu
>>> Date: Tue Nov 12 20:21:51 2013
>>> New Revision: 194545
>>>
>>> URL:
>>> http://llvm.org/viewvc/llvm-project?rev=194545&view=rev
>>> Log:
>>> [PECOFF] Fix use-after-return.
>>>
>>> Modified:
>>>     lld/trunk/lib/Driver/WinLinkDriver.cpp
>>>
>>> Modified: lld/trunk/lib/Driver/WinLinkDriver.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/lld/trunk/lib/Driver/WinLinkDriver.cpp?rev=194545&r1=194544&r2=194545&view=diff
>>> ==============================================================================
>>> --- lld/trunk/lib/Driver/WinLinkDriver.cpp (original)
>>> +++ lld/trunk/lib/Driver/WinLinkDriver.cpp Tue Nov 12 20:21:51 2013
>>> @@ -842,7 +842,7 @@ WinLinkDriver::parse(int argc, const cha
>>>
>>>      case OPT_INPUT:
>>>        inputElements.push_back(std::unique_ptr<InputElement>(
>>> -          new PECOFFFileNode(ctx, inputArg->getValue())));
>>> +          new PECOFFFileNode(ctx, ctx.allocateString(inputArg->getValue()))));
>>>        break;
>>>
>>>  #define DEFINE_BOOLEAN_FLAG(name, setter)                                \
>>> @@ -892,9 +892,11 @@ WinLinkDriver::parse(int argc, const cha
>>>    // start with a hypen or a slash. This is not compatible with link.exe
>>>    // but useful for us to test lld on Unix.
>>>    if (llvm::opt::Arg *dashdash = parsedArgs->getLastArg(OPT_DASH_DASH)) {
>>> -    for (const StringRef value : dashdash->getValues())
>>> -      inputElements.push_back(
>>> -          std::unique_ptr<InputElement>(new PECOFFFileNode(ctx, value)));
>>> +    for (const StringRef value : dashdash->getValues()) {
>>> +      std::unique_ptr<InputElement> elem(
>>> +          new PECOFFFileNode(ctx, ctx.allocateString(value)));
>>> +      inputElements.push_back(std::move(elem));
>>> +    }
>>>    }
>>>
>>>    // Add the libraries specified by /defaultlib unless they are already added
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>
>
>
>
--
Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation