[LLVMdev] Automating Diagnostic Instrumentation

Thu Apr 8 02:42:02 PDT 2004

Vikram,

Thanks for the salient feedback.  This sounds _very_ interesting. I'm
reading Joel's thesis in my "spare" time. At 75 pages it might take some
time. I'll respond in more detail when I have a better understanding of
what Joel has done and what support is already in LLVM.

Reid.

On Wed, 2004-04-07 at 10:01, Vikram S. Adve wrote:
> Reid,
> 
> Adding this kind of instrumentation pass would be very valuable in 
> LLVM.  We don't have any such thing at present, and there could be 
> multiple uses for it.
> 
> Joel Stanley did an MS thesis last year that could complement this kind 
> of pass nicely.  Joel's thesis was on dynamic performance 
> instrumentation guided by explicit queries within the application (I 
> will forward you a copy).  Think about it as two things:
> 
> (1) A simple performance query language that allows queries to be 
> embedded within an application (e.g., "how many L2 cache misses does 
> the loop nest labeled X incur, in those instances when a moving average 
> cost for function Y is more than 3x its long-term average, i.e., the 
> function Y has been unusually slow").  The language allows the user to 
> define arbitrary "metrics," to specify routines that can be used to 
> measure those metrics, and then to query those metrics for arbitrary 
> points or intervals within the application.  A number of common 
> computational performance metrics and their measurement routines are 
> predefined, e.g., elapsed user/total/system time, L1/L2 cache misses, 
> TLB misses.  Many more predefined ones can be added, including OS, 
> networking, and other kinds of metrics.
> 
> The query language is actually implemented as a simple API.  Joel wrote 
> an LLVM pass that recognizes calls to this API, and replaces them with 
> initial calls to his runtime system.
> 
> (2) A sophisticated runtime system that dynamically inserts and removes 
> calls to instrumentation routines that actually do the measurement.  
> This is driven by the requirements of the actual queries, e.g., for the 
> example query above, you would insert instrumentation for L2 cache 
> misses around loop nest X.
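
To make points (1) and (2) above concrete, here is a rough sketch in Python. It is not Joel's API, which this thread does not show, and it replays recorded samples instead of reading hardware counters. The names SlowdownTrigger, ProbeRegistry, count_misses_while_slow and the label "loop_X" are hypothetical, and the window and factor values are assumptions picked for illustration. The trigger models the "moving average more than 3x the long-term average" condition; the registry models inserting and removing a probe at run time.

```python
from collections import deque


class SlowdownTrigger:
    """Active while the recent average cost exceeds factor * long-term average."""

    def __init__(self, window=2, factor=3.0):
        self.recent = deque(maxlen=window)  # last `window` samples only
        self.total = 0.0                    # sum of every sample seen
        self.count = 0
        self.factor = factor

    def record(self, cost):
        self.recent.append(cost)
        self.total += cost
        self.count += 1

    def active(self):
        if not self.recent:
            return False
        moving = sum(self.recent) / len(self.recent)
        long_term = self.total / self.count
        return moving > self.factor * long_term


class ProbeRegistry:
    """Probes currently attached to named program points (hypothetical)."""

    def __init__(self):
        self.probes = {}

    def insert(self, point, probe):
        self.probes[point] = probe

    def remove(self, point):
        self.probes.pop(point, None)

    def hit(self, point, value):
        # Called every time execution passes `point`; a no-op if no probe.
        probe = self.probes.get(point)
        if probe is not None:
            probe(value)


def count_misses_while_slow(y_costs, x_misses, window=2, factor=3.0):
    """Sum loop X's misses over the iterations in which Y is unusually slow."""
    trigger = SlowdownTrigger(window, factor)
    registry = ProbeRegistry()
    counted = []
    for cost, misses in zip(y_costs, x_misses):
        trigger.record(cost)
        if trigger.active():
            registry.insert("loop_X", counted.append)
        else:
            registry.remove("loop_X")
        registry.hit("loop_X", misses)
    return sum(counted)


# Replay: Y is steady, spikes twice, then recovers; loop X misses 5 each time.
y_costs = [1.0] * 20 + [20.0, 20.0] + [1.0, 1.0]
x_misses = [5] * len(y_costs)
total = count_misses_while_slow(y_costs, x_misses)
```

In this replay the probe stays attached for three iterations, not two, because the two-sample window still holds one slow sample on the iteration after the spike. A real runtime would patch code instead of consulting a dictionary on every hit, which is the whole point of doing the insertion dynamically.
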
> 
> Your automatic pass could potentially use Joel's runtime support to do 
> the actual work of inserting and removing instrumentation -- the pass 
> would only have to insert the appropriate queries in our query 
> "language" API.
> 
> Caveat: the runtime instrumentation library has only been lightly 
> tested and isn't robust yet.
> 
> --Vikram
> http://www.cs.uiuc.edu/~vadve
> http://llvm.cs.uiuc.edu/
> 
> On Apr 7, 2004, at 11:32 AM, Reid Spencer wrote:
> 
> > Dear List,
> >
> > I have some questions about some passes for LLVM that I'm thinking
> > about, but first I need to give a little background ...
> >
> > One of the things I do in my "day job" (large IT system performance
> > analysis) is to figure out where to place diagnostic instrumentation
> > within an application. The instrumentation I'm talking about here is
> > the kind that is inserted into application code to capture things like
> > wall-clock latency, CPU time, number of I/O requests, etc.  For
> > accurate diagnosis, it is often necessary to collect that kind of data
> > about every occurrence of some event or event pair (like entry/exit of
> > a function).  However, this kind of instrumentation can have very high
> > overhead if used liberally. When an application is under heavy load, it
> > is critical to select the instrumentation points correctly so as to
> > minimize overhead and maximize the utility of the information provided.
> > Fortunately, there is a way to do this automatically. By constructing a
> > static call graph of the application, it is possible to discover the
> > "fan-out" points. Fan-out points are functions in the call chain that
> > are called from very few places (typically one) but directly or
> > indirectly call many functions. That is, some functions in an
> > application are at the top of the call chain (e.g. "main", which
> > implies all the processing of the program) while others are at the
> > bottom (like "strlen", which implies no further calls and is a leaf
> > node in the call chain). In between these two extremes (root and leaf),
> > the call chain will fan in (like strlen) and fan out (like main).
> > Architecturally, we can speak of the fan-out points being the places in
> > the application code where a module boundary is being crossed.  By
> > instrumenting these fan-out points we can be assured that we're
> > instrumenting things that (a) have architectural significance (i.e.
> > good quality of information) and (b) imply enough processing that the
> > overhead of instrumentation is negligible.
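
A minimal sketch of the fan-out selection described above, written in Python over a plain dictionary instead of as an LLVM pass. The call graph, the function names in it, and the two thresholds (max_callers, min_reach) are assumptions for illustration; the text only says "very few places (typically one)" and "many functions" without giving numbers.

```python
from collections import defaultdict


def reachable(graph, start):
    """Return every function reachable from `start` through one or more calls."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(graph.get(fn, ()))
    return seen


def fan_out_points(graph, max_callers=1, min_reach=3):
    """Pick functions with few direct callers that lead to many other functions."""
    callers = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)
    points = []
    for fn in graph:
        few_callers = len(callers[fn]) <= max_callers
        wide_reach = len(reachable(graph, fn)) >= min_reach
        if few_callers and wide_reach:
            points.append(fn)
    return sorted(points)


# Hypothetical static call graph: caller -> set of direct callees.
call_graph = {
    "main": {"parse_request", "handle_request"},
    "parse_request": {"strlen", "strcpy"},
    "handle_request": {"query_db", "render", "log"},
    "query_db": {"strlen", "log"},
    "render": {"strlen", "strcpy", "log"},
    "log": {"strlen"},
    "strlen": set(),
    "strcpy": set(),
}

points = fan_out_points(call_graph)
```

With these thresholds the sketch selects main, handle_request and render. It rejects log and strlen because they have too many callers (fan-in), and parse_request and query_db because they reach too few functions. A static graph like this one cannot see calls through function pointers, so a real pass would have to treat indirect calls conservatively.
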
> >
> > With the above kind of instrumentation in mind, I consider such
> > auto-instrumentation as "just another pass" in LLVM. That is, I believe
> > LLVM makes it pretty easy to do the call graph analysis, find the
> > "fan-out" points, and insert the instrumentation code.  So, given that
> > it's relatively easy to do this, I have the following questions for the
> > list:
> >
> > (1) Would others find this kind of diagnostic instrumentation useful?
> >
> > (2) Would it be useful to turn the call graph data into a pretty picture
> > via Graphviz (both statically and dynamically)?
> >
> > (3) How much of this auto-instrumentation pass is already written in
> > existing passes?
> >
> > (4) Are there other ways to achieve the same ends?
> >
> > (5) Can someone give me some tips on how to implement this?  It would
> > be my first LLVM pass.
> >
> > On a side note, there are source language constructs for which I want
> > to aggregate the data captured by the instrumentation.  Call chains of
> > functions are great, but sometimes you can't see the forest for the
> > trees because the information is too dense. What is useful is to
> > aggregate the performance data at higher levels of abstraction. For
> > example, in a multi-threaded, object-oriented program you might want to
> > see aggregation at the process, thread, class, object instance, and
> > method levels. In order to instrument for these kinds of aggregated
> > views, I need to be able to provide some kind of "context" (process id,
> > thread id, class, etc.) in which a data point is captured.  This
> > implies that I need the instrumentation pass to understand some things
> > about the source-level language, and possibly capture information about
> > the environment the instrumented application will run in.
> > Unfortunately, that means that the pass would become either language
> > specific or environment specific, because I would have to ensure that
> > the source language compiler added the necessary constructs to the
> > generated code to provide the contextual information needed by the
> > pass.  This raises a couple more questions:
> >
> > (1) Is there a general mechanism to communicate information between
> > the source language compiler and an LLVM pass? If there isn't, should
> > there be?  In the case I describe above it would be *highly* useful
> > (IMO) to have the source language compiler provide source level
> > information for a language independent pass to use later.
> >
> > (2) Is there any existing mechanism in LLVM for providing (1) directly?
> > What I'm thinking of is some kind of API that the source language
> > compiler can use to add additional information that any subsequent pass
> > might need to use.
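
For the aggregated views in the side note above, a small Python sketch of what a "context" attached to each data point could buy. The field names (pid, tid, cls, method), the sample values, and the aggregate function are all hypothetical; the context record stands in for whatever information the source language compiler would have to pass along.

```python
from collections import defaultdict


def aggregate(samples, level):
    """Sum latency per distinct combination of the context fields in `level`."""
    totals = defaultdict(float)
    for context, latency_ms in samples:
        key = tuple(context[field] for field in level)
        totals[key] += latency_ms
    return dict(totals)


# Each data point is (context, latency in ms); the context is captured
# at the instrumentation point when the measurement is taken.
samples = [
    ({"pid": 100, "tid": 1, "cls": "Order", "method": "save"}, 4.0),
    ({"pid": 100, "tid": 2, "cls": "Order", "method": "save"}, 6.0),
    ({"pid": 100, "tid": 1, "cls": "Order", "method": "load"}, 1.5),
    ({"pid": 100, "tid": 2, "cls": "Cart", "method": "load"}, 2.5),
]

by_class = aggregate(samples, ("cls",))
by_thread = aggregate(samples, ("tid",))
by_method = aggregate(samples, ("cls", "method"))
```

The same four data points roll up by class, by thread, or by class and method without re-running the program, but only because every point carries its context. The fields pid and tid could come from the runtime environment, while cls and method are source-level facts that a language independent pass has no way to recover on its own.
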
> >
> > Reid.
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev

_______________________
Reid Spencer
President & CTO
eXtensible Systems, Inc.
rspencer at x10sys.com