[Openmp-commits] [openmp] [offload][OpenMP] Add basic documentation for kernel record replay (PR #193699)
Kevin Sala Penades via Openmp-commits
openmp-commits at lists.llvm.org
Thu Apr 23 02:01:00 PDT 2026
https://github.com/kevinsala created https://github.com/llvm/llvm-project/pull/193699
None
From db5cad0223613979e7c8f3abfe61c76f89198471 Mon Sep 17 00:00:00 2001
From: Kevin Sala <salapenades1 at llnl.gov>
Date: Wed, 22 Apr 2026 18:50:38 -0700
Subject: [PATCH 1/2] [offload] Fix envar description in docs
---
openmp/docs/design/Runtimes.rst | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index ab9484f9ad0a2..4578ee228edab 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -1188,10 +1188,10 @@ This environment variable determines if the stack traces of allocations and
deallocations are tracked to aid in error reporting, e.g., in case of
double-free.
-OFFLOAD_TRACK_KERNEL_LAUNCH_TRACES
-""""""""""""""""""""""""""""""""""
+OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES
+""""""""""""""""""""""""""""""""""""""
-This environment variable determines how manytstack traces of kernel launches
+This environment variable determines how many stack traces of kernel launches
are tracked to aid in error reporting, e.g., what asynchronous kernel failed.
.. _libomptarget_plugin:
From 608b7a0069b3f712e9d8fafaa66ab3465c5a1d41 Mon Sep 17 00:00:00 2001
From: Kevin Sala <salapenades1 at llnl.gov>
Date: Thu, 23 Apr 2026 00:32:33 -0700
Subject: [PATCH 2/2] [offload][OpenMP] Add basic documentation for kernel
record replay
---
openmp/docs/design/Runtimes.rst | 134 ++++++++++++++++++++++++++++++++
1 file changed, 134 insertions(+)
diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index 4578ee228edab..7133d549071cf 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -1194,6 +1194,140 @@ OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES
This environment variable determines how many stack traces of kernel launches
are tracked to aid in error reporting, e.g., what asynchronous kernel failed.
+.. _libomptarget_kernel_record_replay:
+
+Kernel Record Replay
+^^^^^^^^^^^^^^^^^^^^
+
+The Kernel Record and Replay mechanism enables recording the execution of GPU
+kernels in OpenMP applications and replaying them in isolation using
+``llvm-omp-kernel-replay``, a lightweight LLVM-based tool. This tool is useful
+for extracting kernel executions from applications to analyze them
+independently, with the flexibility to modify certain runtime parameters.
+
+The mechanism consists of two phases: recording and replaying. During the
+recording phase, an OpenMP target program automatically dumps the kernel input
+and output device memory snapshots to files for each recorded kernel. It also
+generates a JSON file that describes the kernel alongside the runtime parameters
+(e.g., the number of teams and threads).
+
+To record the kernels of an OpenMP application, enable the
+:ref:`LIBOMPTARGET_RECORD` environment variable when running the program. An
+example is shown below:
+
+.. code-block:: console
+
+ $ LIBOMPTARGET_RECORD=1 LIBOMPTARGET_RECORD_REPORT=1 LIBOMPTARGET_RECORD_DIR=records ./application
+ ... application output ...
+ === Kernel Record Report ===
+ Directory: /home/records
+ Total Instances: 1
+ JSON Filename, Kernel Name, Time (ns), Occurrences:
+ 5681756204876336171_6652394454608725381.json, __omp_offloading_48_5f678667_run_event_based_simulation_l44, 63437836, 1
+ === End Kernel Record Report ===
+
+The command above creates a directory (as indicated by
+:ref:`LIBOMPTARGET_RECORD_DIR`) containing the memory snapshots and a JSON file
+for each recorded kernel. This JSON file contains the description, properties,
+and original runtime parameters of the kernel. Additionally, enabling
+:ref:`LIBOMPTARGET_RECORD_REPORT` instructs the runtime to emit a summary of the
+recorded kernel instances and their associated JSON files.
+
+To replay a particular kernel, run the ``llvm-omp-kernel-replay`` command,
+passing the path to the corresponding kernel's JSON file:
+
+.. code-block:: console
+
+ $ llvm-omp-kernel-replay --repetitions=5 records/5681756204876336171_6652394454608725381.json
+ [llvm-omp-kernel-replay] Replay time (1): 94926702 ns
+ [llvm-omp-kernel-replay] Replay time (2): 94642823 ns
+ [llvm-omp-kernel-replay] Replay time (3): 94429614 ns
+ [llvm-omp-kernel-replay] Replay time (4): 94574421 ns
+ [llvm-omp-kernel-replay] Replay time (5): 94359425 ns
+ [llvm-omp-kernel-replay] Replay done, verification skipped
+
+When replaying, you can tune the execution using the following flags, among
+others:
+
+* ``--repetitions=N``: Sets the number of repetitions of the kernel replay
+  (default 1).
+* ``--num-threads=N``: Overrides the number of threads per team.
+* ``--num-teams=N``: Overrides the number of teams.
+
+If ``--num-threads`` or ``--num-teams`` is not specified, the replay uses the
+corresponding value from the original recorded run. The replay tool issues an
+error if you specify a number of threads or teams that is incompatible with
+the limits established by the original code (e.g., exceeding bounds set by a
+``num_teams`` or ``thread_limit`` clause).
+
+The time reported by the replay tool corresponds to the host-side kernel launch
+and synchronization time. If highly precise kernel timing is required, it is
+recommended to use dedicated profiling tools in conjunction with the replay
+tool.
+
+Finally, the replay tool provides an optional verification step that checks
+whether the output device memory snapshot generated during replay matches the
+output snapshot captured during the recording phase. Because this verification
+performs a strict binary difference between the two memory snapshots, the check
+may fail for kernels operating on floating-point data due to normal variations
+in precision and operation order.
+
+The recording phase, implemented by ``libomptarget``, can be controlled via
+environment variables. The full list of environment variables and their
+definitions is provided below.
+
+* ``LIBOMPTARGET_RECORD=[TRUE/FALSE] (default FALSE)``
+* ``LIBOMPTARGET_RECORD_DIR=<Filepath>``
+* ``LIBOMPTARGET_RECORD_REPORT=[TRUE/FALSE] (default FALSE)``
+* ``LIBOMPTARGET_RECORD_MEMSIZE=<Num> (default 8*1024*1024*1024)``
+* ``LIBOMPTARGET_RECORD_DEVICE=<Num> (default 0)``
+* ``LIBOMPTARGET_RECORD_OUTPUT=[TRUE/FALSE] (default TRUE)``
+
+LIBOMPTARGET_RECORD
+"""""""""""""""""""
+
+This environment variable enables the kernel recording mechanism during the
+execution of an OpenMP program. Enabling recording may introduce significant
+overhead to the recorded program. When recording is disabled, the other
+recording environment variables described below are ignored. Recording is
+disabled by default.
+
+LIBOMPTARGET_RECORD_DIR
+"""""""""""""""""""""""
+
+This environment variable is used to specify the relative or absolute path to
+the directory where the recorded files will be stored. If omitted or empty, the
+files will be stored in the current working directory.
+
+LIBOMPTARGET_RECORD_REPORT
+""""""""""""""""""""""""""
+
+This environment variable is used to instruct the runtime to emit a summary of
+the recorded kernel instances and their associated JSON files. By default, no
+report is emitted.
+
+LIBOMPTARGET_RECORD_MEMSIZE
+"""""""""""""""""""""""""""
+
+This environment variable is used to indicate the maximum size of device virtual
+memory that will be captured in the snapshots during the recording phase. This
+value is only an upper bound; the snapshot files contain only the data that is
+actually used. Modifying this environment variable should be needed only in very
+specific cases. By default, the size is ``8*1024*1024*1024`` bytes (8 GiB).
+
+LIBOMPTARGET_RECORD_DEVICE
+""""""""""""""""""""""""""
+
+This environment variable is used to indicate the number of the device whose
+kernels should be recorded. Only the kernels executed by this device will be
+recorded. The default device is ``0``.
+
+LIBOMPTARGET_RECORD_OUTPUT
+""""""""""""""""""""""""""
+
+This environment variable is used to instruct the runtime to record the output
+device memory snapshot into a file. The default value is ``TRUE``.
+
.. _libomptarget_plugin:
LLVM/OpenMP Target Host Runtime Plugins (``libomptarget.rtl.XXXX``)
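
A note on the verification caveat in the patch above (a strict binary
comparison may fail for kernels operating on floating-point data): the effect
can be reproduced with a minimal Python sketch that does not involve the replay
tool at all. The ``snapshot`` helper and the two one-element buffers are
hypothetical stand-ins for the recorded and replayed output snapshots, and the
tolerance of ``1e-12`` is an arbitrary choice for illustration. The sketch only
assumes that the same terms may be summed in a different order between runs.

```python
import struct


def snapshot(values):
    """Pack doubles into raw bytes, standing in for a memory snapshot file."""
    return struct.pack(f"<{len(values)}d", *values)


# The same three terms summed in two different orders. Floating-point
# addition is not associative, so the two results differ in the last bit.
recorded = [(0.1 + 0.2) + 0.3]
replayed = [0.1 + (0.2 + 0.3)]

# Strict binary comparison, analogous to diffing two snapshot files.
bytes_equal = snapshot(recorded) == snapshot(replayed)

# Tolerance-based comparison (hypothetical; chosen only for this sketch).
close_enough = abs(recorded[0] - replayed[0]) <= 1e-12

print("byte-identical:", bytes_equal)
print("within tolerance:", close_enough)
```

The two buffers are numerically equivalent for most practical purposes, yet
they are not byte-identical, which is the kind of mismatch a strict binary
check reports as a failure.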