<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    This sounds interesting.  I've got a couple of questions about the

    cache fragmentation tool and the working set measurement tool.<br>

    <br>

    <div class="moz-cite-prefix">On 4/17/2016 4:46 PM, Derek Bruening

      via llvm-dev wrote:<br>

    </div>

    <blockquote

cite="mid:CAO1ikSYQeqaVjiGuQ-mzfVsnKBKKd5cKvKMOwj1U92ho+ucCGg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>TL;DR: We plan to build a suite of compiler-based dynamic

          instrumentation tools for analyzing targeted performance

          problems.  These tools will all live under a new

          "EfficiencySanitizer" (or "esan") sanitizer umbrella, as they

          will share significant portions of their implementations.</div>

        <div><br>

        </div>

        <div>====================</div>

        <div>Motivation</div>

        <div>====================</div>

        <div><br>

        </div>

        <div>Our goal is to build a suite of dynamic instrumentation

          tools for analyzing particular performance problems that are

          difficult to evaluate using other profiling methods.  Modern

          hardware performance counters provide insight into where time

          is spent and when micro-architectural events such as cache

          misses are occurring, but they are of limited effectiveness

          for contextual analysis: it is not easy to answer *why* a

          cache miss occurred.</div>

        <div><br>

        </div>

        <div>Examples of tools that we have planned include: identifying

          wasted or redundant computation, identifying cache

          fragmentation, and measuring working sets.  See more details

          on these below.</div>

        <div><br>

        </div>

        <div>====================</div>

        <div>Approach</div>

        <div>====================</div>

        <div><br>

        </div>

        <div>We believe that tools with overhead beyond about 5x are

          simply too heavyweight to easily apply to large,

          industrial-sized applications running real-world workloads. 

          Our goal is for our tools to gather useful information with

          overhead less than 5x, and ideally closer to 3x, to facilitate

          deployment.  We would rather trade off accuracy and build a

          less-accurate tool below our overhead ceiling than to build a

          high-accuracy but slow tool.  We hope to hit a sweet spot of

          tools that gather trace-based contextual information not

          feasible with pure sampling yet are still practical to deploy.</div>

        <div><br>

        </div>

        <div>In a similar vein, we would prefer a targeted tool that

          analyzes one particular aspect of performance with low

          overhead to a more general tool that can answer more

          questions but has high overhead.</div>

        <div><br>

        </div>

        <div>Dynamic binary instrumentation is one option for these

          types of tools, but typically compiler-based instrumentation

          provides better performance, and we intend to focus only on

          analyzing applications for which source code is available. 

          Studying instruction cache behavior with compiler

          instrumentation can be challenging, however, so we plan to at

          least initially focus on data performance.</div>

        <div><br>

        </div>

        <div>Many of our planned tools target specific performance

          issues with data accesses.  They employ the technique of

          *shadow memory* to store metadata about application data

          references, using the compiler to instrument loads and stores

          with code to update the shadow memory.  A companion runtime

          library intercepts libc calls if necessary to update shadow

          memory on non-application data references.  The runtime

          library also intercepts heap allocations and other key events

          in order to perform its analyses.  This is all very similar to

          how existing sanitizers such as AddressSanitizer,

          ThreadSanitizer, MemorySanitizer, etc. operate today.</div>
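        <div><br>
        </div>
        <div>As a rough illustration of the shadow memory idea, the
          sketch below keeps one metadata cell per application cache
          line and updates it on every instrumented access.  Everything
          in it is an assumption made for illustration: the 64-byte line
          size, the dictionary used as shadow storage (existing
          sanitizers compute the shadow location directly from the
          application address), and the names ShadowMemory and
          on_access.</div>
<pre>
```python
CACHE_LINE_BYTES = 64  # assumed line size; not specified in the proposal


class ShadowMemory:
    """Toy shadow memory: one access counter per application cache line."""

    def __init__(self):
        self.cells = {}  # cache line index to access count

    def on_access(self, addr, size):
        # The compiler would insert a call like this before each load/store.
        # Update every line overlapped by the bytes addr .. addr + size - 1.
        first = addr // CACHE_LINE_BYTES
        last = (addr + size - 1) // CACHE_LINE_BYTES
        for line in range(first, last + 1):
            self.cells[line] = self.cells.get(line, 0) + 1


shadow = ShadowMemory()
shadow.on_access(0x1000, 8)  # 8-byte access contained in one line
shadow.on_access(0x103C, 8)  # 8-byte access straddling two lines
print(shadow.cells)
```
</pre>
        <div>A per-line counter is only one possible kind of metadata;
          each tool would store whatever its analysis needs in the
          shadow cell.</div>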

        <div><br>

        </div>

        <div>====================</div>

        <div>Example Tools</div>

        <div>====================</div>

        <div><br>

        </div>

        <div>We have several initial tools that we plan to build.  These

          are not necessarily novel ideas on their own: some of these

          have already been explored in academia.  The idea is to create

          practical, low-overhead, robust, and publicly available

          versions of these tools.</div>

        <div><br>

        </div>

        <div>*Cache fragmentation*: this tool gathers data structure

          field hotness information, looking for data layout

          optimization opportunities by grouping hot fields together to

          avoid data cache fragmentation.  Future enhancements may add

          field affinity information if it can be computed with low

          enough overhead.</div>

      </div>

    </blockquote>

    I can vaguely imagine how this data would be acquired, but

    I'm more interested in what analysis is provided by the tool, and

    how this information would be presented to a user.  Would it be a

    flat list of classes, sorted by number of accesses, with each field

    annotated by number of accesses?  Or is there some other kind of

    presentation planned?  Maybe some kind of weighting for classes with

    frequent cache misses?<br>
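    <br>
    To make the "flat list" option concrete, here is a minimal sketch of
    how such a report could be assembled from per-field access counts. 
    It is not the tool's actual output format: the input shape (one
    (class, field) pair per access), the function name hotness_report,
    and the sample classes Node and Config are all hypothetical.<br>
<pre>
```python
from collections import Counter


def hotness_report(accesses):
    """Build a flat report from (class_name, field_name) access pairs.

    Returns a list of (class_name, class_total, fields), hottest class
    first, where fields is a list of (field_name, count), hottest first.
    """
    per_field = Counter(accesses)
    per_class = Counter()
    for (cls, _field), n in per_field.items():
        per_class[cls] += n

    report = []
    for cls, total in sorted(per_class.items(), key=lambda kv: -kv[1]):
        fields = sorted(
            ((f, n) for (c, f), n in per_field.items() if c == cls),
            key=lambda fn: -fn[1],
        )
        report.append((cls, total, fields))
    return report


# Hypothetical access trace, as the instrumentation might record it.
trace = [("Node", "next"), ("Node", "next"), ("Node", "key"),
         ("Node", "next"), ("Config", "verbose")]
report = hotness_report(trace)
for cls, total, fields in report:
    print(cls, total, fields)
```
</pre>
    Weighting by cache misses instead of raw accesses would only change
    what is counted per pair, not the shape of the report.<br>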

    <br>

    <blockquote

cite="mid:CAO1ikSYQeqaVjiGuQ-mzfVsnKBKKd5cKvKMOwj1U92ho+ucCGg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>*Working set measurement*: this tool measures the data

          working set size of an application at each snapshot during

          execution.  It can help to understand phased behavior and can

          provide basic direction for further effort by the

          developer: e.g., knowing whether the working set is close to

          fitting in current L3 caches or is many times larger can help

          determine where to spend effort.</div>

      </div>

    </blockquote>

    I think my questions here are basically the reverse of my prior

    questions.  I can imagine the presentation (a graph with time on

    the X axis, working set measurement on the Y axis, with some markers

    highlighting key execution points).  I'm not sure how the data

    collection works though, or even really what is being measured.  Are

    you planning on counting the number of data bytes / data cache lines

    used during each time period?  For the purposes of this tool, when

    is data brought into the working set and when is data evicted from

    the working set?<br>
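    <br>
    As one possible reading, sketched under my own assumptions rather
    than anything stated in the proposal: the working set of an interval
    is the set of distinct cache lines touched during that interval, and
    everything is "evicted" at each snapshot boundary.  The 64-byte line
    size, the fixed number of accesses per snapshot, and the name
    working_set_sizes are all hypothetical.<br>
<pre>
```python
CACHE_LINE_BYTES = 64  # assumed line size


def working_set_sizes(accesses, snapshot_every):
    """Return bytes of distinct cache lines touched in each interval.

    accesses: iterable of byte addresses, in program order.
    snapshot_every: number of accesses between snapshots.
    """
    sizes = []
    touched = set()
    for n, addr in enumerate(accesses, start=1):
        touched.add(addr // CACHE_LINE_BYTES)
        if n % snapshot_every == 0:
            sizes.append(len(touched) * CACHE_LINE_BYTES)
            touched.clear()  # start the next interval with an empty set
    if touched:  # account for a trailing partial interval
        sizes.append(len(touched) * CACHE_LINE_BYTES)
    return sizes


addrs = [0, 8, 64, 128, 0, 4, 8, 12]
sizes = working_set_sizes(addrs, 4)
print(sizes)
```
</pre>
    Under this definition the first four accesses touch three lines and
    the last four touch one, which is the kind of phase change the graph
    would show.<br>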

    <br>

    <blockquote

cite="mid:CAO1ikSYQeqaVjiGuQ-mzfVsnKBKKd5cKvKMOwj1U92ho+ucCGg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>*Dead store detection*: this tool identifies dead stores

          (write-after-write patterns with no intervening read) as well

          as redundant stores (writes of the same value already in

          memory).  Xref the Deadspy paper from CGO 2012.</div>
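        <div><br>
        </div>
        <div>The two store categories can be sketched over a simple
          access trace as follows.  This is an illustrative model only,
          not the DeadSpy algorithm or the planned implementation: it
          tracks whole addresses rather than individual bytes, and the
          trace format and the name classify_stores are assumptions.</div>
<pre>
```python
def classify_stores(trace):
    """Classify stores in a trace of (op, addr, value) tuples.

    op is "load" or "store".  Returns (dead, redundant): lists of trace
    indices of dead stores and of redundant stores.
    """
    memory = {}         # addr to value currently stored there
    pending_store = {}  # addr to index of the last store not yet read
    dead, redundant = [], []
    for i, (op, addr, value) in enumerate(trace):
        if op == "load":
            pending_store.pop(addr, None)  # the last store was read
            continue
        if addr in memory and memory[addr] == value:
            redundant.append(i)  # writes the value already in memory
        if addr in pending_store:
            dead.append(pending_store[addr])  # overwritten with no read
        memory[addr] = value
        pending_store[addr] = i
    return dead, redundant


trace = [
    ("store", 0x10, 1),     # index 0: dead, overwritten by index 1
    ("store", 0x10, 2),     # index 1
    ("load", 0x10, None),   # index 2
    ("store", 0x10, 2),     # index 3: redundant, same value as memory
    ("store", 0x20, 7),     # index 4
]
dead, redundant = classify_stores(trace)
print(dead, redundant)
```
</pre>
        <div>Stores that are never read before the trace ends are left
          unclassified in this sketch.</div>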

        <div><br>

        </div>

        <div>*Single-reference*: this tool identifies data cache lines

          brought in but only read once.  These could be candidates for

          non-temporal loads.</div>

        <div><br>

        </div>

        <div>====================</div>

        <div>EfficiencySanitizer</div>

        <div>====================</div>

        <div><br>

        </div>

        <div>We are proposing the name EfficiencySanitizer, or "esan"

          for short, to refer to this suite of dynamic instrumentation

          tools for improving program efficiency.  As we have a number

          of different tools that share quite a bit of their

          implementation, we plan to consider them sub-tools under the

          EfficiencySanitizer umbrella, rather than adding a whole bunch

          of separate instrumentation and runtime library components.</div>

        <div><br>

        </div>

        <div>While these tools are not addressing correctness issues

          like other sanitizers, they will be sharing a lot of the

          existing sanitizer runtime library support code.  Furthermore,

          users are already familiar with the sanitizer brand, and it

          seems better to extend that concept rather than add some new

          term.</div>

        <div><br>

        </div>

      </div>

      <br>


    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Employee of Qualcomm Innovation Center, Inc.

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

</pre>

  </body>

</html>