[LLVMdev] Automating Diagnostic Instrumentation
Reid Spencer
reid at x10sys.com
Thu Apr 8 02:42:02 PDT 2004
Vikram,
Thanks for the salient feedback. This sounds _very_ interesting. I'm
reading Joel's thesis in my "spare" time. At 75 pages it might take some
time. I'll respond in more detail when I have a better understanding of
what Joel has done and what support is already in LLVM.
Reid.
On Wed, 2004-04-07 at 10:01, Vikram S. Adve wrote:
> Reid,
>
> Adding this kind of instrumentation pass would be very valuable in
> LLVM. We don't have any such thing at present, and there could be
> multiple uses for it.
>
> Joel Stanley did an MS thesis last year that could complement this kind
> of pass nicely. Joel's thesis was on dynamic performance
> instrumentation guided by explicit queries within the application (I
> will forward you a copy). Think about it as two things:
>
> (1) A simple performance query language that allows queries to be
> embedded within an application (e.g., "how many L2 cache misses does
> the loop nest labeled X incur, in those instances when a moving average
> cost for function Y is more than 3x the long-term average, i.e., when
> function Y has been unusually slow"). The language allows the user to
> define arbitrary "metrics," to specify routines that can be used to
> measure those metrics, and then to query those metrics for arbitrary
> points or intervals within the application. A number of common
> computational performance metrics and their measurement routines are
> predefined, e.g., elapsed user/total/system time, L1/L2 cache misses,
> TLB misses. Many more predefined ones can be added, including OS,
> networking, and other kinds of metrics.
>
> The query language is actually implemented as a simple API. Joel wrote
> an LLVM pass that recognizes calls to this API, and replaces them with
> initial calls to his runtime system.
>
> (2) A sophisticated runtime system that dynamically inserts and removes
> calls to instrumentation routines that actually do the measurement.
> This is driven by the requirements of the actual queries, e.g., for the
> example query above, you would insert instrumentation for L2 cache
> misses around loop nest X.
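
The condition in the example query above (a moving average cost for
function Y exceeding 3x its long-term average) can be sketched as follows.
This is only an illustration in plain Python, not the API from Joel's
thesis; the class name SlowdownTrigger and its parameters (window, factor)
are hypothetical. It also assumes the long-term average is taken over all
samples seen so far, including the most recent ones.

```python
from collections import deque


class SlowdownTrigger:
    """Hypothetical helper: true while a function is 'unusually slow'."""

    def __init__(self, window=4, factor=3.0):
        self.recent = deque(maxlen=window)  # last `window` cost samples
        self.total = 0.0                    # sum of every sample seen
        self.count = 0                      # number of samples seen
        self.factor = factor                # e.g. 3.0 for "more than 3x"

    def record(self, cost):
        """Record one measured cost (e.g. elapsed time) for the function."""
        self.recent.append(cost)
        self.total += cost
        self.count += 1

    def active(self):
        """Return True when the moving average exceeds factor * long-term average."""
        if not self.recent:
            return False
        moving = sum(self.recent) / len(self.recent)
        long_term = self.total / self.count
        return moving > self.factor * long_term
```

A runtime along these lines would only enable the more expensive
measurement (L2 cache misses around loop nest X, in the example) while
active() returns True.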
>
> Your automatic pass could potentially use Joel's runtime support to do
> the actual work of inserting and removing instrumentation -- the pass
> would only have to insert the appropriate queries in our query
> "language" API.
>
> Caveat: the runtime instrumentation library has only been lightly
> tested and isn't robust yet.
>
> --Vikram
> http://www.cs.uiuc.edu/~vadve
> http://llvm.cs.uiuc.edu/
>
> On Apr 7, 2004, at 11:32 AM, Reid Spencer wrote:
>
> > Dear List,
> >
> > I have some questions about some passes for LLVM that I'm thinking
> > about, but first I need to give a little background ...
> >
> > One of the things I do in my "day job" (large IT system performance
> > analysis) is to figure out where to place diagnostic instrumentation
> > within an application. The instrumentation I'm talking about here is
> > the kind that is inserted into application code to capture things like
> > wall-clock latency, cpu time, number of i/o requests, etc. For
> > accurate diagnosis, it is often necessary to collect that kind of data
> > about every occurrence of some event or event pair (like entry/exit of
> > a function). However, this kind of instrumentation can have very high
> > overhead if used liberally. When an application is under heavy load, it
> > is critical to select the instrumentation points correctly so as to
> > minimize overhead and maximize the utility of the information provided.
> >
> > Fortunately, there is a way to do this automatically. By constructing a
> > static call graph of the application, it is possible to discover the
> > "fan-out" points. Fan-out points are functions in the call chain that
> > are called from very few places (typically one) but directly or
> > indirectly call many functions. That is, some functions in an
> > application are at the top of the call chain (e.g. "main", which
> > implies all the processing of the program) while others are at the
> > bottom (like "strlen", which implies no further calls and is a leaf
> > node in the call chain). In between these two extremes (root and leaf),
> > the call chain will fan-in (like strlen) and fan-out (like main).
> > Architecturally, we can speak of the fan-out points being the places in
> > the application code where a module boundary is being crossed. By
> > instrumenting these fan-out points we can be assured that we're
> > instrumenting things that (a) have architectural significance (i.e.
> > good quality of information) and (b) imply enough processing that the
> > overhead of instrumentation is negligible.
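
The fan-out selection described above can be sketched on a toy call graph.
This is plain Python rather than an LLVM pass, and everything in it is an
assumption made for illustration: the graph, the function names, and the
thresholds max_callers and min_reach (the post only says "very few places"
and "many functions").

```python
from collections import defaultdict

# Hypothetical call graph: caller -> set of functions it calls directly.
CALL_GRAPH = {
    "main": {"read_config", "serve"},
    "read_config": {"open_file", "parse_line", "strlen"},
    "serve": {"handle_request"},
    "handle_request": {"parse_line", "query_db", "format_reply"},
    "query_db": {"strlen"},
    "format_reply": {"strlen"},
    "parse_line": {"strlen"},
    "open_file": set(),
    "strlen": set(),
}


def reachable(graph, start):
    """Return every function called directly or indirectly from start."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(graph.get(fn, ()))
    return seen


def fan_out_points(graph, max_callers=1, min_reach=3):
    """Functions with few distinct callers that reach many other functions."""
    callers = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)
    points = []
    for fn in graph:
        reach = reachable(graph, fn)
        reach.discard(fn)  # do not count a recursive function as its own callee
        if len(callers[fn]) <= max_callers and len(reach) >= min_reach:
            points.append(fn)
    return sorted(points)
```

On this toy graph, fan_out_points(CALL_GRAPH) selects handle_request, main,
read_config and serve, while strlen (many callers, no callees) and
parse_line (two callers, one callee) are left uninstrumented.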
> >
> > With the above kind of instrumentation in mind, I consider such
> > auto-instrumentation as "just another pass" in LLVM. That is, I believe
> > LLVM makes it pretty easy to do the call graph analysis, find the
> > "fan-out" points, and insert the instrumentation code. So, given that
> > it's relatively easy to do this, I have the following questions for the
> > list:
> >
> > (1) Would others find this kind of diagnostic instrumentation useful?
> >
> > (2) Would it be useful to turn the call graph data into a pretty picture
> > via graphviz (both statically and dynamically)?
> >
> > (3) How much of this auto-instrumentation pass is already written in
> > existing passes?
> >
> > (4) Are there other ways to achieve the same ends?
> >
> > (5) Can someone give me some tips on how to implement this? It would
> > be my first LLVM pass.
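
For question (2), a minimal sketch of turning call graph data into Graphviz
input might look like the following. It assumes the call graph is already
available as a plain mapping from caller to callees; the function name
to_dot and that data layout are hypothetical, and only the DOT text format
itself is Graphviz's.

```python
def to_dot(graph, name="callgraph"):
    """Render a caller -> callees mapping as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for caller in sorted(graph):
        for callee in sorted(graph[caller]):
            # One directed edge per direct call.
            lines.append(f'  "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be written to a file and rendered with the Graphviz
"dot" tool.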
> >
> > On a side note, there are source language constructs for which I'm
> > wanting to aggregate the data captured by the instrumentation. Call
> > chains of functions are great but sometimes you can't see the forest
> > for the trees because the information is too dense. What is useful is
> > to aggregate the performance data at higher levels of abstraction. For
> > example, in a multi-threaded, object-oriented program you might want to
> > see aggregation at the process, thread, class, object instance, and
> > method levels. In order to instrument for these kinds of aggregated
> > views, I need to be able to provide some kind of "context" (process id,
> > thread id, class, etc.) in which a data point is captured. This
> > implies that I need the instrumentation pass to understand some things
> > about the source-level language, and possibly capture information about
> > the environment the instrumented application will run in.
> > Unfortunately, that means that the pass would either become language
> > specific or environment specific because I would have to ensure that
> > the source language compiler added the necessary constructs to the
> > generated code to provide the contextual information needed by the
> > pass. This raises a couple more questions:
> >
> > (1) Is there a general mechanism to communicate information between
> > source language compiler and LLVM pass? If there isn't, should there
> > be? In the case I describe above it would be *highly* useful (IMO) to
> > have the source language compiler provide source level information for
> > a language independent pass to use later.
> >
> > (2) Is there any existing mechanism in LLVM for providing (1) directly?
> > What I'm thinking of is some kind of API that the source language
> > compiler can use to add additional information that any subsequent pass
> > might need to use.
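
The "context" idea above can be illustrated with a small sketch in which
each captured data point carries a context record and is then aggregated
at a chosen level. The field names (thread, class, method), the sample
values, and the aggregate function are all hypothetical; how such context
would actually reach an LLVM pass is exactly the open question being asked.

```python
from collections import defaultdict

# Hypothetical data points: (context, measured latency).
SAMPLES = [
    ({"thread": 1, "class": "Parser", "method": "parse"}, 2.0),
    ({"thread": 1, "class": "Db", "method": "query"}, 3.0),
    ({"thread": 2, "class": "Parser", "method": "parse"}, 0.5),
]


def aggregate(samples, level):
    """Sum latencies grouped by one context field, e.g. 'thread' or 'class'."""
    totals = defaultdict(float)
    for context, latency in samples:
        totals[context[level]] += latency
    return dict(totals)
```

The same samples then give a per-thread view with aggregate(SAMPLES,
"thread") and a per-class view with aggregate(SAMPLES, "class"), without
re-running the instrumented program.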
> >
> > Reid.
_______________________
Reid Spencer
President & CTO
eXtensible Systems, Inc.
rspencer at x10sys.com