[lld] r194545 - [PECOFF] Fix use-after-return.
Rick Foos
rfoos at codeaurora.org
Wed Nov 13 17:44:24 PST 2013
On 11/13/2013 07:12 PM, Sean Silva wrote:
>
>
>
> On Wed, Nov 13, 2013 at 7:40 PM, Rick Foos <rfoos at codeaurora.org> wrote:
>
> On 11/13/2013 06:19 PM, Sean Silva wrote:
>>
>>
>>
>> On Wed, Nov 13, 2013 at 2:41 PM, Rick Foos <rfoos at codeaurora.org> wrote:
>>
>> Sorry for the delay,
>>
>> Our problem with running the sanitizers is that the load
>> average under Ninja reached 146, and a short time later the
>> system crashed, requiring a call to someone to power-cycle
>> the box...
>>
>>
>> I'm curious what is causing so much load? Our tests are mostly
>> single-threaded, so if only #cores jobs are spawned (or
>> #cores + 2, which is what ninja uses when #cores > 2), there
>> should only be #cores + 2 jobs running simultaneously (certainly
>> not 146, which is 146/32 ~ 4.5 times the core count). Is lit
>> spawning too many jobs?
>>
> A bare ninja command in the test step, so no -j or -l control.
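For what it's worth, ninja's default when nothing is passed is roughly
#cores + 2, as Sean says above. A rough Python rendering of that rule
(an approximation for illustration, not ninja's actual code):

import multiprocessing

def guess_ninja_parallelism():
    """Approximate ninja's default -j: #cores + 2 once there are more than 2 cores."""
    cores = multiprocessing.cpu_count()
    if cores > 2:
        return cores + 2
    # Very small machines still get a little overlap.
    return cores + 1

print(guess_ninja_parallelism())   # e.g. 34 on the 16-core/32-thread slave

Whether the extra load comes from ninja itself or from lit's own
workers on top of that is the open question above.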
>
>> Does the machine have enough RAM?
>>
> 24 GB RAM. 40 MB L2.
>
>>
>>
>> The address sanitizer by itself leaves a load average of 40.
>> This means the OS is over 100% utilization and is thrashing a
>> bit, though the load average doesn't say what exactly is thrashing.
>>
>> Ninja supports make's -j and -l options. The -l option, a
>> maximum load average, is the key.
>>
>> With -l set to the total number of cores (hyperthreads too),
>> Ninja will not launch another task while the load average is
>> above that limit.
>>
>> A load average at or below 100% should technically benefit
>> performance and maximize throughput. However, I will be happy
>> as long as I don't have to call someone to power-cycle the
>> server :)
>>
>>
>> I don't think that's quite how it works. As long as you have
>> enough RAM, the only performance loss from having a bunch of
>> jobs waiting is context-switching overhead, and that can be
>> minimized either by lowering the preempt timer rate (what is
>> called HZ in Linux; 100, which is common for servers doing batch
>> jobs, dilutes the overhead to basically nothing) or, if you are
>> running a recent kernel, by arranging things to run tickless,
>> in which case there is essentially no overhead. If load is less
>> than #cores, then you don't have a job running on every core,
>> which means some cores are essentially idle and you are losing
>> performance. The other killer is jobs blocking on disk IO *with
>> no other jobs to schedule in the meantime*; generally you have
>> to keep load above 100% to avoid that problem.
>>
>> -- Sean Silva
> ninja --help
> usage: ninja [options] [targets...]
> ...
> -j N run N jobs in parallel [default=10]
> -l N do not start new jobs if the load average is greater than N
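So the throttled invocation amounts to something like the following.
The values are illustrative for the 32-thread slave discussed here,
and the check target name is a placeholder, not any bot's actual
target:

import multiprocessing
import subprocess

threads = multiprocessing.cpu_count()   # 32 on the 16-core hyperthreaded slave

cmd = [
    "ninja",
    "-j", str(threads + 2),   # cap concurrent jobs
    "-l", str(threads),       # do not start new jobs above this load average
    "check-all",              # placeholder target; each bot has its own check target
]
subprocess.check_call(cmd)

With -l in place, ninja still runs up to -j jobs, but backs off from
starting new ones whenever the load average is above the limit.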
>
> As far as what load average means:
> http://serverfault.com/questions/251947/what-does-load-average-mean
> http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
>
> Everything seems to say 100% load is when the load average
> equals the number of processors.
>
>
> This term "load" is only vaguely related to the colloquial meaning, so
> "100% load" should not be understood as "perfect" or "maximum". It's
> literally just the time-averaged number of jobs available to run. The
> bridge analogy in the second link is fairly accurate. Notice that even
> if you are at >100% load, the bridge is still being used at full
> capacity (as many cars as possible are crossing the bridge
> simultaneously). If load is >100%, then that might impact the
> *latency* for getting to a particular job (in the analogy: how long it
> takes for a particular car to get across the bridge *including the
> waiting time in the queue*), but for a batch operation like running
> tests that doesn't matter.
>
>
> ----
> While the Ninja build step seemed OK (-j10 and all), the test
> step seemed to be the problem.
>
> Ninja continuously launched the address measurement tasks with no
> limits.
>
>
> What "address measurement"?
>
Asan-x86_64-xxx tasks.
Load average control failed... somebody launched 85 Python tasks, so I
took the nice off of ninja.
17:28:57 up 2 days, 7:47, 1 user, load average: 81.63, 73.15, 51.22
Better now:
17:35:25 up 2 days, 7:54, 1 user, load average: 33.09, 48.52, 48.32
Going to need to run this overnight, and see if I have a good zorg setup
for this.
Asan steps are running, and the throttle seems to work now.
(Originally there were 100 of these tasks; now there are 3, with the
load average staying in the 30s.)
There are 187 clang tasks, so the system is not being lazy... it's
actually limiting the Asan jobs now, as desired.
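Those load numbers are just uptime output; the same check from a
script is only a few lines with os.getloadavg() (illustrative only,
not part of any bot config):

import multiprocessing
import os

def load_report():
    """Print the 1/5/15-minute load averages relative to hardware threads."""
    one, five, fifteen = os.getloadavg()
    threads = multiprocessing.cpu_count()
    print("load: %.2f %.2f %.2f on %d hardware threads (%.0f%% of one-minute capacity)"
          % (one, five, fifteen, threads, 100.0 * one / threads))

load_report()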
If this holds up tonight, a zorg patch in the (my) morning.
Cheers,
Rick
> -- Sean Silva
>
>
> When combined with the thread sanitizer doing the same thing,
> the load average hit 146, followed by a crash.
>
> In my testing, once -l is used the load average stays mostly
> below 32. There are some other builders going that are not
> controlled by load average. My guess is that when all builders
> are throttled by load average, utilization will be very close to
> 100% when everything is running.
>
> Ninja definitely needs this control for the sanitizers. An
> experiment with Make is in order to prove the point.
>
>
>>
>> So the maximum load average for a 16-core machine with
>> hyperthreads is 32 (keeping it simple). This needs to be
>> passed to all make and Ninja build steps on that slave to
>> maximize throughput.
>>
>> For now, I'm looking at a minimal patch to add jobs and a new
>> loadaverage variable for the sanitizer builders.
>>
>> Longer term, all buildslaves should define a maximum
>> loadaverage, and all make/ninja steps should pass the -j and
>> -l options.
>>
>> Best Regards,
>> Rick
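The zorg change I have in mind is roughly the following sketch; the
helper and parameter names (jobs, loadaverage) are placeholders, not
the actual buildbot factory code:

import multiprocessing

def throttled_ninja_command(target, jobs=None, loadaverage=None):
    """Build a ninja command line throttled with -j and -l.

    jobs/loadaverage are hypothetical builder knobs; when unset they
    default to values derived from the slave's hardware thread count.
    """
    threads = multiprocessing.cpu_count()
    if jobs is None:
        jobs = threads + 2
    if loadaverage is None:
        loadaverage = threads
    return ["ninja", "-j", str(jobs), "-l", str(loadaverage), target]

On the 16-core/32-thread slave this yields ninja -j 34 -l 32 <target>,
which matches the throttling that kept the load average in the 30s
above.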
>>
>>
>> On 11/13/2013 11:21 AM, Sergey Matveev wrote:
>>> +kcc
>>>
>>>
>>> On Wed, Nov 13, 2013 at 6:41 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
>>>
>>> Sorry for another indirection. Rick Foos is working on
>>> it. I think there is some good news here :)
>>>
>>> CCed Rick and added Galina and Dmitri.
>>>
>>> Thanks
>>>
>>> Shankar Easwaran
>>>
>>>
>>> On 11/12/2013 8:37 PM, Rui Ueyama wrote:
>>>
>>> Shankar tried to set it up recently.
>>>
>>>
>>> On Tue, Nov 12, 2013 at 6:31 PM, Sean Silva <silvas at purdue.edu> wrote:
>>>
>>> Sanitizers?
>>>
>>> There have been a couple of these sorts of bugs
>>> recently... we really
>>> ought to have some sanitizer bots...
>>>
>>> -- Sean Silva
>>>
>>>
>>> On Tue, Nov 12, 2013 at 9:21 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>> Author: ruiu
>>> Date: Tue Nov 12 20:21:51 2013
>>> New Revision: 194545
>>>
>>> URL:
>>> http://llvm.org/viewvc/llvm-project?rev=194545&view=rev
>>> Log:
>>> [PECOFF] Fix use-after-return.
>>>
>>> Modified:
>>>     lld/trunk/lib/Driver/WinLinkDriver.cpp
>>>
>>> Modified: lld/trunk/lib/Driver/WinLinkDriver.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/lld/trunk/lib/Driver/WinLinkDriver.cpp?rev=194545&r1=194544&r2=194545&view=diff
>>> ==============================================================================
>>> --- lld/trunk/lib/Driver/WinLinkDriver.cpp (original)
>>> +++ lld/trunk/lib/Driver/WinLinkDriver.cpp Tue Nov 12 20:21:51 2013
>>> @@ -842,7 +842,7 @@ WinLinkDriver::parse(int argc, const cha
>>>
>>>      case OPT_INPUT:
>>>        inputElements.push_back(std::unique_ptr<InputElement>(
>>> -          new PECOFFFileNode(ctx, inputArg->getValue())));
>>> +          new PECOFFFileNode(ctx, ctx.allocateString(inputArg->getValue()))));
>>>        break;
>>>
>>>  #define DEFINE_BOOLEAN_FLAG(name, setter)                                \
>>> @@ -892,9 +892,11 @@ WinLinkDriver::parse(int argc, const cha
>>>    // start with a hypen or a slash. This is not compatible with link.exe
>>>    // but useful for us to test lld on Unix.
>>>    if (llvm::opt::Arg *dashdash = parsedArgs->getLastArg(OPT_DASH_DASH)) {
>>> -    for (const StringRef value : dashdash->getValues())
>>> -      inputElements.push_back(
>>> -          std::unique_ptr<InputElement>(new PECOFFFileNode(ctx, value)));
>>> +    for (const StringRef value : dashdash->getValues()) {
>>> +      std::unique_ptr<InputElement> elem(
>>> +          new PECOFFFileNode(ctx, ctx.allocateString(value)));
>>> +      inputElements.push_back(std::move(elem));
>>> +    }
>>>    }
>>>
>>>    // Add the libraries specified by /defaultlib unless they are already added
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>
>
>
>
--
Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation