[llvm] r309836 - Xray docs with description of Flight Data Recorder binary format.
Keith Wyss via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 2 10:27:20 PDT 2017
Author: kpw
Date: Wed Aug 2 10:27:20 2017
New Revision: 309836
URL: http://llvm.org/viewvc/llvm-project?rev=309836&view=rev
Log:
Xray docs with description of Flight Data Recorder binary format.
Summary:
Adding a new restructuredText file to document the trace format produced with
an FDR mode handler and read by llvm-xray toolset.
Reviewers: dberris, pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36041
Added:
llvm/trunk/docs/XRayFDRFormat.rst
Modified:
llvm/trunk/docs/XRay.rst
llvm/trunk/lib/XRay/Trace.cpp
Modified: llvm/trunk/docs/XRay.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/XRay.rst?rev=309836&r1=309835&r2=309836&view=diff
==============================================================================
--- llvm/trunk/docs/XRay.rst (original)
+++ llvm/trunk/docs/XRay.rst Wed Aug 2 10:27:20 2017
@@ -196,6 +196,9 @@ on your application, you may set the ``x
``XRAY_OPTIONS`` environment variable (while also optionally setting the
``xray_naive_log`` to ``false``).
+When the buffers are flushed to disk, the result is a binary trace format
+described by `XRay FDR format <XRayFDRFormat.html>`_
+
When FDR mode is on, it will keep writing and recycling memory buffers until
the logging implementation is finalized -- at which point it can be flushed and
re-initialised later. To do this programmatically, we follow the workflow
Added: llvm/trunk/docs/XRayFDRFormat.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/XRayFDRFormat.rst?rev=309836&view=auto
==============================================================================
--- llvm/trunk/docs/XRayFDRFormat.rst (added)
+++ llvm/trunk/docs/XRayFDRFormat.rst Wed Aug 2 10:27:20 2017
@@ -0,0 +1,394 @@
+======================================
+XRay Flight Data Recorder Trace Format
+======================================
+
+:Version: 1 as of 2017-07-20
+
+.. contents::
+ :local:
+
+
+Introduction
+============
+
+When gathering XRay traces in Flight Data Recorder mode, each thread of an
+application will claim buffers to fill with trace data, which at some point
+is finalized and flushed.
+
+A goal of the profiler is to minimize overhead, so the flushed data directly
+corresponds to the buffer.
+
+This document describes the format of a trace file.
+
+
+General
+=======
+
+Each trace file corresponds to a sequence of events in a particular thread.
+
+The file has a header followed by a sequence of discriminated record types.
+
+The endianess of byte fields matches the endianess of the platform which
+produced the trace file.
+
+
+Header Section
+==============
+
+A trace file begins with a 32 byte header.
+
++-------------------+-----------------+----------------------------------------+
+| Field | Size (bytes) | Description |
++===================+=================+========================================+
+| version | ``2`` | Anticipates versioned readers. This |
+| | | document describes the format when |
+| | | version == 1 |
++-------------------+-----------------+----------------------------------------+
+| type | ``2`` | An enumeration encoding the type of |
+| | | trace. Flight Data Recorder mode |
+| | | traces have type == 1 |
++-------------------+-----------------+----------------------------------------+
+| bitfield | ``4`` | Holds parameters that are not aligned |
+| | | to bytes. Further described below. |
++-------------------+-----------------+----------------------------------------+
+| cycle_frequency | ``8`` | The frequency in hertz of the CPU |
+| | | oscillator used to measure duration of |
+| | | events in ticks. |
++-------------------+-----------------+----------------------------------------+
+| buffer_size | ``8`` | The size in bytes of the data portion |
+| | | of the trace following the header. |
++-------------------+-----------------+----------------------------------------+
+| reserved | ``8`` | Reserved for future use. |
++-------------------+-----------------+----------------------------------------+
+
+The bitfield parameter of the file header is composed of the following fields.
+
++-------------------+----------------+-----------------------------------------+
+| Field | Size (bits) | Description |
++===================+================+=========================================+
+| constant_tsc | ``1`` | Whether the platform's timestamp |
+| | | counter used to record ticks between |
+| | | events ticks at a constant frequency |
+| | | despite CPU frequency changes. |
+| | | 0 == non-constant. 1 == constant. |
++-------------------+----------------+-----------------------------------------+
+| nonstop_tsc | ``1`` | Whether the tsc continues to count |
+| | | despite whether the CPU is in a low |
+| | | power state. 0 == stop. 1 == non-stop. |
++-------------------+----------------+-----------------------------------------+
+| reserved | ``30`` | Not meaningful. |
++-------------------+----------------+-----------------------------------------+
+
+
+Data Section
+============
+
+Following the header in a trace is a data section with size matching the
+buffer_size field in the header.
+
+The data section is a stream of elements of different types.
+
+There are a few categories of data in the sequence.
+
+- ``Function Records``: Function Records contain the timing of entry into and
+ exit from function execution. Function Records have 8 bytes each.
+
+- ``Metadata Records``: Metadata records serve many purposes. Mostly, they
+ capture information that may be too costly to record for each function, but
+ that is required to contextualize the fine-grained timings. They also are used
+ as markers for user-defined Event Data payloads. Metadata records have 16
+ bytes each.
+
+- ``Event Data``: Free form data may be associated with events that are traced
+ by the binary and encode data defined by a handler function. Event data is
+ always preceded with a marker record which indicates how large it is.
+
+- ``Function Arguments``: The arguments to some functions are included in the
+ trace. These are either pointer addresses or primitives that are read and
+ logged independently of their types in a high level language. To the tracer,
+ they are all simply numbers. Function Records that have attached arguments
+ will indicate their presence on the function entry record. We only support
+ logging contiguous function argument sequences starting with argument zero,
+ which will be the "this" pointer for member function invocations. For example,
+ we don't support logging the first and third argument.
+
+A reader of the memory format must maintain a state machine. The format makes no
+attempt to pad for alignment, and it is not seekable.
+
+
+Function Records
+----------------
+
+Function Records have an 8 byte layout. This layout encodes information to
+reconstruct a call stack of instrumented function and their durations.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bits) | Description |
++===============+==============+===============================================+
+| discriminant | ``1`` | Indicates whether a reader should read a |
+| | | Function or Metadata record. Set to ``0`` for |
+| | | Function records. |
++---------------+--------------+-----------------------------------------------+
+| action | ``3`` | Specifies whether the function is being |
+| | | entered, exited, or is a non-standard entry |
+| | | or exit produced by optimizations. |
++---------------+--------------+-----------------------------------------------+
+| function_id | ``28`` | A numeric ID for the function. Resolved to a |
+| | | name via the xray instrumentation map. The |
+| | | instrumentation map is built by xray at |
+| | | compile time into an object file and pairs |
+| | | the function ids to addresses. It is used for |
+| | | patching and as a lookup into the binary's |
+| | | symbols to obtain names. |
++---------------+--------------+-----------------------------------------------+
+| tsc_delta | ``32`` | The number of ticks of the timestamp counter |
+| | | since a previous record recorded a delta or |
+| | | other TSC resetting event. |
++---------------+--------------+-----------------------------------------------+
+
+On little-endian machines, the bitfields are ordered from least significant bit
+bit to most significant bit. A reader can read an 8 bit value and apply the mask
+``0x01`` for the discriminant. Similarly, they can read 32 bits and unsigned
+shift right by ``0x04`` to obtain the function_id field.
+
+On big-endian machine, the bitfields are written in order from most significant
+bit to least significant bit. A reader would read an 8 bit value and unsigned
+shift right by 7 bits for the discriminant. The function_id field could be
+obtained by reading a 32 bit value and applying the mask ``0x0FFFFFFF``.
+
+Function action types are as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Type | Number | Description |
++===============+==============+===============================================+
+| Entry | ``0`` | Typical function entry. |
++---------------+--------------+-----------------------------------------------+
+| Exit | ``1`` | Typical function exit. |
++---------------+--------------+-----------------------------------------------+
+| Tail_Exit | ``2`` | An exit from a function due to Tail call |
+| | | optimization. |
++---------------+--------------+-----------------------------------------------+
+| Entry_Args | ``3`` | A function entry that records arguments. |
++---------------+--------------+-----------------------------------------------+
+
+Entry_Args records do not contain the arguments themselves. Instead, metadata
+records for each of the logged args follow the function record in the stream.
+
+
+Metadata Records
+----------------
+
+Interspersed throughout the buffer are 16 byte Metadata records. For typically
+instrumented binaries, they will be sparser than Function records, and they
+provide a fuller picture of the binary execution state.
+
+Metadata record layout is partially record dependent, but they share a common
+structure.
+
+The same bit field rules described for function records apply to the first byte
+of MetadataRecords. Within this byte, little endian machines use lsb to msb
+ordering and big endian machines use msb to lsb ordering.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size | Description |
++===============+==============+===============================================+
+| discriminant | ``1 bit`` | Indicates whether a reader should read a |
+| | | Function or Metadata record. Set to ``1`` for |
+| | | Metadata records. |
++---------------+--------------+-----------------------------------------------+
+| record_kind | ``7 bits`` | The type of Metadata record. |
++---------------+--------------+-----------------------------------------------+
+| data | ``15 bytes`` | A data field used differently for each record |
+| | | type. |
++---------------+--------------+-----------------------------------------------+
+
+Here is a table of the enumerated record kinds.
+
+====== ===========================
+Number Type
+------ ---------------------------
+0 NewBuffer
+1 EndOfBuffer
+2 NewCPUId
+3 TSCWrap
+4 WallTimeMarker
+5 CustomEventMarker
+6 CallArgument
+
+
+NewBuffer Records
+-----------------
+
+Each buffer begins with a NewBuffer record immediately after the header.
+It records the thread ID of the thread that the trace belongs to.
+
+Its data segment is as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| thread_Id | ``2`` | Thread ID for buffer. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``13`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+WallClockTime Records
+---------------------
+
+Following the NewBuffer record, each buffer records an absolute time as a frame
+of reference for the durations recorded by timestamp counter deltas.
+
+Its data segment is as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| seconds | ``8`` | Seconds on absolute timescale. The starting |
+| | | point is unspecified and depends on the |
+| | | implementation and platform configured by the |
+| | | tracer. |
++---------------+--------------+-----------------------------------------------+
+| microseconds | ``4`` | The microsecond component of the time. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``3`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+NewCpuId Records
+----------------
+
+Each function entry invokes a routine to determine what CPU is executing.
+Typically, this is done with readtscp, which reads the timestamp counter at the
+same time.
+
+If the tracing detects that the execution has switched CPUs or if this is the
+first instrumented entry point, the tracer will output a NewCpuId record.
+
+Its data segment is as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| cpu_id | ``2`` | CPU Id. |
++---------------+--------------+-----------------------------------------------+
+| absolute_tsc | ``8`` | The absolute value of the timestamp counter. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``5`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+TSCWrap Records
+---------------
+
+Since each function record uses a 32 bit value to represent the number of ticks
+of the timestamp counter since the last reference, it is possible for this value
+to overflow, particularly for sparsely instrumented binaries.
+
+When this delta would not fit into a 32 bit representation, a reference absolute
+timestamp counter record is written in the form of a TSCWrap record.
+
+Its data segment is as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| absolute_tsc | ``8`` | Timestamp counter value. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``7`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+CallArgument Records
+--------------------
+
+Immediately following an Entry_Args type function record, there may be one or
+more CallArgument records that contain the traced function's parameter values.
+
+The order of the CallArgument Record sequency corresponds one to one with the
+order of the function parameters.
+
+CallArgument data segment:
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| argument | ``8`` | Numeric argument (may be pointer address). |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``7`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+CustomEventMarker Records
+-------------------------
+
+XRay provides the feature of logging custom events. This may be leveraged to
+record tracing info for RPCs or similarly trace data that is application
+specific.
+
+Custom Events themselves are an unstructured (application defined) segment of
+memory with arbitrary size within the buffer. They are preceded by
+CustomEventMarkers to indicate their presence and size.
+
+CustomEventMarker data segment:
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| event_size | ``4`` | Size of preceded event. |
++---------------+--------------+-----------------------------------------------+
+| absolute_tsc | ``8`` | A timestamp counter of the event. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``3`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+EndOfBuffer Records
+-------------------
+
+An EndOfBuffer record type indicates that there is no more trace data in this
+buffer. The reader is expected to seek past the remaining buffer_size expressed
+before the start of buffer and look for either another header or EOF.
+
+
+Format Grammar and Invariants
+=============================
+
+Not all sequences of Metadata records and Function records are valid data. A
+sequence should be parsed as a state machine. The expectations for a valid
+format can be expressed as a context free grammar.
+
+This is an attempt to explain the format with statements in EBNF format.
+
+- Format := Header ThreadBuffer* EOF
+
+- ThreadBuffer := NewBuffer WallClockTime NewCPUId BodySequence* End
+
+- BodySequence := NewCPUId | TSCWrap | Function | CustomEvent
+
+- Function := (Function_Entry_Args CallArgument*) | Function_Other_Type
+
+- CustomEvent := CustomEventMarker CustomEventUnstructuredMemory
+
+- End := EndOfBuffer RemainingBufferSizeToSkip
+
+
+Function Record Order
+---------------------
+
+There are a few clarifications that may help understand what is expected of
+Function records.
+
+- Functions with an Exit are expected to have a corresponding Entry or
+ Entry_Args function record precede them in the trace.
+
+- Tail_Exit Function records record the Function ID of the function whose return
+ address the program counter will take. In other words, the final function that
+ would be popped off of the call stack if tail call optimization was not used.
+
+- Not all functions marked for instrumentation are necessarily in the trace. The
+ tracer uses heuristics to preserve the trace for non-trivial functions.
+
+- Not every entry must have a traced Exit or Tail Exit. The buffer may run out
+ of space or the program may request for the tracer to finalize toreturn the
+ buffer before an instrumented function exits.
Modified: llvm/trunk/lib/XRay/Trace.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/XRay/Trace.cpp?rev=309836&r1=309835&r2=309836&view=diff
==============================================================================
--- llvm/trunk/lib/XRay/Trace.cpp (original)
+++ llvm/trunk/lib/XRay/Trace.cpp Wed Aug 2 10:27:20 2017
@@ -222,8 +222,8 @@ Error processCustomEventMarker(FDRState
DataExtractor &RecordExtractor,
size_t &RecordSize) {
// We can encounter a CustomEventMarker anywhere in the log, so we can handle
- // it regardless of the expectation. However, we do se the expectation to read
- // a set number of fixed bytes, as described in the metadata.
+ // it regardless of the expectation. However, we do set the expectation to
+ // read a set number of fixed bytes, as described in the metadata.
uint32_t OffsetPtr = 1; // Read after the first byte.
uint32_t DataSize = RecordExtractor.getU32(&OffsetPtr);
uint64_t TSC = RecordExtractor.getU64(&OffsetPtr);
@@ -333,7 +333,7 @@ Error processFDRFunctionRecord(FDRState
}
Record.CPU = State.CPUId;
Record.TId = State.ThreadId;
- // Back up to read first 32 bits, including the 8 we pulled RecordType
+ // Back up to read first 32 bits, including the 4 we pulled RecordType
// and RecordKind out of. The remaining 28 are FunctionId.
uint32_t OffsetPtr = 0;
// Despite function Id being a signed int on XRayRecord,
@@ -369,8 +369,9 @@ Error processFDRFunctionRecord(FDRState
/// We expect a format complying with the grammar in the following pseudo-EBNF.
///
/// FDRLog: XRayFileHeader ThreadBuffer*
-/// XRayFileHeader: 32 bits to identify the log as FDR with machine metadata.
-/// ThreadBuffer: BufSize NewBuffer WallClockTime NewCPUId FunctionSequence EOB
+/// XRayFileHeader: 32 bytes to identify the log as FDR with machine metadata.
+/// Includes BufferSize
+/// ThreadBuffer: NewBuffer WallClockTime NewCPUId FunctionSequence EOB
/// BufSize: 8 byte unsigned integer indicating how large the buffer is.
/// NewBuffer: 16 byte metadata record with Thread Id.
/// WallClockTime: 16 byte metadata record with human readable time.
More information about the llvm-commits
mailing list