<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Nov 13, 2013 at 7:40 PM, Rick Foos <span dir="ltr"><<a href="mailto:rfoos@codeaurora.org" target="_blank">rfoos@codeaurora.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><div class="im">
    <div>On 11/13/2013 06:19 PM, Sean Silva
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <br>
          <div class="gmail_quote">On Wed, Nov 13, 2013 at 2:41 PM, Rick
            Foos <span dir="ltr"><<a href="mailto:rfoos@codeaurora.org" target="_blank">rfoos@codeaurora.org</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div>Sorry for the delay, <br>
                  <br>
                  Our problem with running the sanitizers is that the
                  load average under Ninja reached 146, and shortly
                  afterwards the system crashed, which meant calling
                  someone to power cycle the box...<br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>I'm curious what is causing so much load. Our tests
              are mostly single-threaded, so if only #cores jobs are
              spawned (or #cores + 2, which is what ninja uses when
              #cores > 2), there should be at most #cores + 2 jobs
              running simultaneously (certainly not a load of 146/32,
              roughly 4.5 per hardware thread). Is lit spawning too
              many jobs?</div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    The test step runs a bare ninja command, so there is no -j or -l control.<div class="im"><br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>Does the machine have enough RAM?</div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    24 GB RAM, 40 MB L2.<div class="im"><br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div> <br>
                  The address sanitizer by itself leaves a load average
                  of 40. This means the OS is over 100% utilization and
                  is thrashing a bit. The load average doesn't say what
                  exactly is thrashing.<br>
                  <br>
                  Ninja supports make's -j and -l options. The -l option,
                  which sets the maximum load average, is the key. <br>
                  <br>
                  The load average should be less than the total number
                  of cores (hyperthreads too) before Ninja launches
                  another task. <br>
                  <br>
                  A load average at or below 100% should technically
                  benefit performance and maximize throughput.
                  However, I will be happy if I don't have to call
                  someone to power cycle the server :)<br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>I don't think that's quite how it works. As long as you
              have enough RAM, the only performance loss from having a
              bunch of jobs waiting is context switching overhead. That
              can be minimized by lowering the preempt timer rate (what
              is called HZ in Linux; 100, which is common for servers
              doing batch jobs, dilutes the overhead to basically
              nothing), or, if you are running a recent kernel, by
              arranging things to run tickless, in which case there is
              essentially no overhead. If load is less than #cores, then
              you don't have a job running on every core, which means
              that some cores are essentially idle and you are losing
              performance. The other killer is jobs blocking on disk IO
              *with no other jobs to be scheduled in the meantime*;
              generally you have to keep load above 100% to avoid that
              problem.</div>
            <div><br>
            </div>
            <div>-- Sean Silva<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    ninja --help<br>
    usage: ninja [options] [targets...]<br>
    ...<br>
      -j N     run N jobs in parallel [default=10]<br>
      -l N     do not start new jobs if the load average is greater than N<br>
    <br>
    As far as what load average means:<br>
    <a href="http://serverfault.com/questions/251947/what-does-load-average-mean" target="_blank">http://serverfault.com/questions/251947/what-does-load-average-mean</a><br>
<a href="http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages" target="_blank">http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages</a><br>
    <br>
    Everything seems to say 100% load is when load average = number of
    processors.<br></div></blockquote><div><br></div><div>The term "load" is only vaguely related to the colloquial meaning, so "100% load" should not be understood as "perfect" or "maximum". It's literally just the time-averaged number of jobs available to run. The bridge analogy in the second link is fairly accurate. Notice that even if you are above 100% load, the bridge is still being used at full capacity (as many cars as possible are crossing the bridge simultaneously). If load is above 100%, that might impact the *latency* for getting to a particular job (in the analogy: how long it takes for a particular car to get across the bridge *including the waiting time in the queue*), but for a batch operation like running tests that doesn't matter.</div>
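<div><br></div><div>To make the numbers concrete, here is a rough sketch in Python. It is only an illustration, not anything taken from the buildbot configuration: the function names are made up, and using #cores + 2 for -j and #cores for -l is an assumption rather than a recommendation. The only ninja options it relies on are the -j and -l flags shown in the help output quoted above.</div>
<pre>
```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of logical cores."""
    return load_average / cores


def current_load_per_core():
    """Read the 1-minute load average (Unix only) and normalize it."""
    one_minute = os.getloadavg()[0]
    return load_per_core(one_minute, os.cpu_count() or 1)


def ninja_command(cores, targets=()):
    """Build a ninja argument list that caps job count and load average.

    Assumption for illustration: -j is cores + 2 and -l is cores.
    """
    return ["ninja", "-j", str(cores + 2), "-l", str(cores), *targets]
```
</pre>
<div>For the machine discussed here, load_per_core(146, 32) gives 4.5625, and ninja_command(32) gives the argument list for "ninja -j 34 -l 32".</div>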
<div>  </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
    <br>
    ----<br>
    While the Ninja build step seemed OK, -j10 and all, the test section
    seemed to be the problem.<br>
    <br>
    Ninja continuously launched the address measurement tasks with no
    limits.<br></div></blockquote><div><br></div><div>What "address measurement"?</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
    <br>
    When combined with a thread sanitizer doing the same thing, the
    load average reached 146, followed by a crash.<br>
    <br>
    In my testing with -l in use, the load average is mostly below
    32. Some other builders are also running on the machine, and they
    are not controlled by load average. My guess is that when all
    builders are throttled by load average, the machine will be very
    close to 100% utilization when everything is running.<br>
    <br>
    Ninja for sure needs this control in the sanitizers. An experiment
    with Make is in order to prove the point.<div><div class="h5"><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div> <br>
                  So the maximum load average of a 16-core machine with
                  hyperthreads is 32 (keeping it simple). This needs to
                  be passed to all make and Ninja build steps on that
                  slave to maximize throughput.<br>
                  <br>
                  For now, I'm looking at a minimal patch to include
                  jobs and a new load average variable for the
                  sanitizers. <br>
                  <br>
                  Longer term, all buildslaves should define a maximum
                  load average, and all make/ninja steps should pass the
                  -j and -l options.<br>
                  <br>
                  Best Regards,<br>
                  Rick
                  <div>
                    <div><br>
                      <br>
                      On 11/13/2013 11:21 AM, Sergey Matveev wrote:<br>
                    </div>
                  </div>
                </div>
                <div>
                  <div>
                    <blockquote type="cite">
                      <div dir="ltr">+kcc</div>
                      <div class="gmail_extra"><br>
                        <br>
                        <div class="gmail_quote">On Wed, Nov 13, 2013 at
                          6:41 AM, Shankar Easwaran <span dir="ltr"><<a href="mailto:shankare@codeaurora.org" target="_blank">shankare@codeaurora.org</a>></span>
                          wrote:<br>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Sorry
                            for another indirection. Rick Foos is
                            working on it. I think there is some good
                            news here :)<br>
                            <br>
                            CCed Rick + adding Galina, Dmitri.<br>
                            <br>
                            Thanks<br>
                            <br>
                            Shankar Easwaran
                            <div>
                              <div><br>
                                <br>
                                On 11/12/2013 8:37 PM, Rui Ueyama wrote:<br>
                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                  Shankar tried to set it up recently.<br>
                                  <br>
                                  <br>
                                  On Tue, Nov 12, 2013 at 6:31 PM, Sean
                                  Silva <<a href="mailto:silvas@purdue.edu" target="_blank">silvas@purdue.edu</a>>

                                  wrote:<br>
                                  <br>
                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                    Sanitizers?<br>
                                    <br>
                                    There have been a couple of these
                                    sorts of bugs recently... we really<br>
                                    ought to have some sanitizer bots...<br>
                                    <br>
                                    -- Sean Silva<br>
                                    <br>
                                    <br>
                                    On Tue, Nov 12, 2013 at 9:21 PM, Rui
                                    Ueyama <<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>>

                                    wrote:<br>
                                    <br>
                                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                      Author: ruiu<br>
                                      Date: Tue Nov 12 20:21:51 2013<br>
                                      New Revision: 194545<br>
                                      <br>
                                      URL: <a href="http://llvm.org/viewvc/llvm-project?rev=194545&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=194545&view=rev</a><br>
                                      Log:<br>
                                      [PECOFF] Fix use-after-return.<br>
                                      <br>
                                      Modified:<br>
                                         
                                       lld/trunk/lib/Driver/WinLinkDriver.cpp<br>
                                      <br>
                                      Modified:
                                      lld/trunk/lib/Driver/WinLinkDriver.cpp<br>
                                      URL:<br>
                                      <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/lib/Driver/WinLinkDriver.cpp?rev=194545&r1=194544&r2=194545&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/lib/Driver/WinLinkDriver.cpp?rev=194545&r1=194544&r2=194545&view=diff</a><br>

                                      <br>
==============================================================================<br>
                                      ---
                                      lld/trunk/lib/Driver/WinLinkDriver.cpp
                                      (original)<br>
                                      +++
                                      lld/trunk/lib/Driver/WinLinkDriver.cpp
                                      Tue Nov 12 20:21:51 2013<br>
                                      @@ -842,7 +842,7 @@
                                      WinLinkDriver::parse(int argc,
                                      const cha<br>
                                      <br>
                                            case OPT_INPUT:<br>
                                             
                                      inputElements.push_back(std::unique_ptr<InputElement>(<br>
                                      -          new PECOFFFileNode(ctx,
                                      inputArg->getValue())));<br>
                                      +          new PECOFFFileNode(ctx,<br>
ctx.allocateString(inputArg->getValue()))));<br>
                                              break;<br>
                                      <br>
                                        #define
                                      DEFINE_BOOLEAN_FLAG(name, setter)
                                            \<br>
                                      @@ -892,9 +892,11 @@
                                      WinLinkDriver::parse(int argc,
                                      const cha<br>
                                          // start with a hypen or a
                                      slash. This is not compatible with
                                      link.exe<br>
                                          // but useful for us to test
                                      lld on Unix.<br>
                                          if (llvm::opt::Arg *dashdash =
                                      parsedArgs->getLastArg(OPT_DASH_DASH))
                                      {<br>
                                      -    for (const StringRef value :
                                      dashdash->getValues())<br>
                                      -      inputElements.push_back(<br>
                                      -        
                                       std::unique_ptr<InputElement>(new
                                      PECOFFFileNode(ctx, value)));<br>
                                      +    for (const StringRef value :
                                      dashdash->getValues()) {<br>
                                      +    
                                       std::unique_ptr<InputElement>
                                      elem(<br>
                                      +          new PECOFFFileNode(ctx,
                                      ctx.allocateString(value)));<br>
                                      +    
                                       inputElements.push_back(std::move(elem));<br>
                                      +    }<br>
                                          }<br>
                                      <br>
                                          // Add the libraries specified
                                      by /defaultlib unless they are
                                      already<br>
                                      added<br>
                                      <br>
                                      <br>
_______________________________________________<br>
                                      llvm-commits mailing list<br>
                                      <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>
                                      <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
                                      <br>
                                    </blockquote>
                                    <br>
                                  </blockquote>
                                </blockquote>
                                <br>
                                <br>
                              </div>
                            </div>
                            <span><font color="#888888"> -- <br>
                                Qualcomm Innovation Center, Inc. is a
                                member of Code Aurora Forum, hosted by
                                the Linux Foundation</font></span>
                            <div>
                              <div><br>
                                <br>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                      <br>
                    </blockquote>
                    <br>
                    <br>
                  </div>
                </div>
                <span><font color="#888888">
                    <pre cols="72">-- 
Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation</pre>
                  </font></span></div>
              <br>
              <br>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
    <br>
    <pre cols="72">-- 
Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation</pre>
  </div></div></div>

</blockquote></div><br></div></div>