[Openmp-commits] [openmp] [offload][OpenMP] Add basic documentation for kernel record replay (PR #193699)

Kevin Sala Penades via Openmp-commits openmp-commits at lists.llvm.org
Thu Apr 23 02:11:02 PDT 2026


https://github.com/kevinsala updated https://github.com/llvm/llvm-project/pull/193699

From 513614706a9337736d77e60a9b9536459916918d Mon Sep 17 00:00:00 2001
From: Kevin Sala <salapenades1 at llnl.gov>
Date: Thu, 23 Apr 2026 00:32:33 -0700
Subject: [PATCH 1/2] [offload][OpenMP] Add basic documentation for kernel
 record replay

---
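Note (not part of the commit): the ``LIBOMPTARGET_RECORD_REPORT`` summary shown
in the documentation below is line-oriented, so it can be turned into replay
commands with a small script. The following is a minimal sketch, assuming the
report always has the layout of the single sample in this patch (a
``Directory:`` line, a header line starting with ``JSON Filename``, then one
comma-separated row per kernel) and that kernel names contain no commas. The
helper names ``parse_record_report`` and ``replay_command`` are hypothetical
and do not exist in LLVM.

```python
import shlex

# Sample report text copied from the console example in this patch.
SAMPLE_REPORT = """\
=== Kernel Record Report ===
Directory: /home/records
Total Instances: 1
JSON Filename, Kernel Name, Time (ns), Occurrences:
5681756204876336171_6652394454608725381.json, __omp_offloading_48_5f678667_run_event_based_simulation_l44, 63437836, 1
=== End Kernel Record Report ===
"""


def parse_record_report(text):
    """Return (directory, rows) parsed from a Kernel Record Report.

    Assumption: the layout matches the sample above; the real format may differ.
    """
    directory = None
    rows = []
    in_report = False
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "=== Kernel Record Report ===":
            in_report = True
        elif line == "=== End Kernel Record Report ===":
            break
        elif not in_report or not line:
            continue  # skip application output and blank lines
        elif line.startswith("Directory:"):
            directory = line.split(":", 1)[1].strip()
        elif line.startswith("JSON Filename"):
            in_table = True  # the rows that follow describe recorded kernels
        elif in_table:
            fields = [field.strip() for field in line.split(",")]
            rows.append({
                "json": fields[0],
                "kernel": fields[1],
                "time_ns": int(fields[2]),
                "occurrences": int(fields[3]),
            })
    return directory, rows


def replay_command(directory, row, repetitions=5):
    """Build a replay command line using only the --repetitions flag."""
    path = f"{directory}/{row['json']}"
    return f"llvm-omp-kernel-replay --repetitions={repetitions} {shlex.quote(path)}"


directory, rows = parse_record_report(SAMPLE_REPORT)
for row in rows:
    print(replay_command(directory, row))
```

Building the command as a string keeps the sketch free of side effects; running
it is left to the caller.
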
 openmp/docs/design/Runtimes.rst | 134 ++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
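
Note (not part of the commit): the documentation below warns that the strict
binary comparison used for verification may fail for floating-point kernels.
The sketch below illustrates why, using plain host-side Python rather than the
replay tool: summing the same three values in a different order changes the
last bit of the result, so the packed bytes differ even though the values agree
to within rounding. The ``snapshot`` helper is hypothetical and only stands in
for a device memory snapshot. The tolerance check is shown for contrast and is
not something the replay tool is documented to do.

```python
import math
import struct


def snapshot(values):
    """Pack doubles into raw little-endian bytes (stand-in for a memory snapshot)."""
    return struct.pack(f"<{len(values)}d", *values)


# The same three terms, summed in two different orders, as can happen when a
# parallel reduction is scheduled differently between the recorded and the
# replayed run.
recorded = (0.1 + 0.2) + 0.3
replayed = 0.1 + (0.2 + 0.3)

# Strict byte-for-byte comparison, analogous to a binary diff of snapshots.
bitwise_equal = snapshot([recorded]) == snapshot([replayed])

# Tolerance-based comparison, shown only for contrast.
close_enough = math.isclose(recorded, replayed, rel_tol=1e-12)

print("bitwise equal:", bitwise_equal)
print("close enough: ", close_enough)
```
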

diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index ab9484f9ad0a2..0af041f6fc60e 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -1194,6 +1194,140 @@ OFFLOAD_TRACK_KERNEL_LAUNCH_TRACES
 This environment variable determines how manytstack traces of kernel launches
 are tracked to aid in error reporting, e.g., what asynchronous kernel failed.
 
+.. _libomptarget_kernel_record_replay:
+
+Kernel Record Replay
+^^^^^^^^^^^^^^^^^^^^
+
+The Kernel Record and Replay mechanism enables recording the execution of GPU
+kernels in OpenMP applications and replaying them in isolation using
+``llvm-omp-kernel-replay``, a lightweight LLVM-based tool. This tool is useful
+for extracting kernel executions from applications to analyze them
+independently, with the flexibility to modify certain runtime parameters.
+
+The mechanism consists of two phases: recording and replaying. During the
+recording phase, an OpenMP target program automatically dumps the kernel input
+and output device memory snapshots to files for each recorded kernel. It also
+generates a JSON file that describes the kernel alongside the runtime parameters
+(e.g., the number of teams and threads).
+
+To record the kernels of an OpenMP application, enable the
+:ref:`LIBOMPTARGET_RECORD` environment variable when running the program. An
+example is shown below:
+
+.. code-block:: console
+
+    $ LIBOMPTARGET_RECORD=1 LIBOMPTARGET_RECORD_REPORT=1 LIBOMPTARGET_RECORD_DIR=records ./application
+    ... application output ...
+    === Kernel Record Report ===
+    Directory: /home/records
+    Total Instances: 1
+    JSON Filename, Kernel Name, Time (ns), Occurrences:
+    5681756204876336171_6652394454608725381.json, __omp_offloading_48_5f678667_run_event_based_simulation_l44, 63437836, 1
+    === End Kernel Record Report ===
+
+The command above creates a directory (as indicated by
+:ref:`LIBOMPTARGET_RECORD_DIR`) containing the memory snapshots and a JSON file
+for each recorded kernel. This JSON file contains the description, properties,
+and original runtime parameters of the kernel. Additionally, enabling
+:ref:`LIBOMPTARGET_RECORD_REPORT` instructs the runtime to emit a summary of the
+recorded kernel instances and their associated JSON files.
+
+To replay a particular kernel, run the ``llvm-omp-kernel-replay`` command,
+passing the path to the corresponding kernel's JSON file:
+
+.. code-block:: console
+
+    $ llvm-omp-kernel-replay --repetitions=5 records/5681756204876336171_6652394454608725381.json
+    [llvm-omp-kernel-replay] Replay time (1): 94926702 ns
+    [llvm-omp-kernel-replay] Replay time (2): 94642823 ns
+    [llvm-omp-kernel-replay] Replay time (3): 94429614 ns
+    [llvm-omp-kernel-replay] Replay time (4): 94574421 ns
+    [llvm-omp-kernel-replay] Replay time (5): 94359425 ns
+    [llvm-omp-kernel-replay] Replay done, verification skipped
+
+When replaying, you can tune the execution using the following flags, among
+others:
+
+* ``--repetitions=N``: Sets the number of repetitions of the kernel replay
+  (default 1).
+* ``--num-threads=N``: Overrides the number of threads per team.
+* ``--num-teams=N``: Overrides the number of teams.
+
+If ``--num-threads`` or ``--num-teams`` is not specified, the replay uses the
+corresponding value from the original recorded run. The replay tool reports an
+error if you specify a number of threads or teams that is incompatible with the
+limits established by the original code (e.g., exceeding bounds set by a
+``num_teams`` or ``thread_limit`` clause).
+
+The time reported by the replay tool corresponds to the host-side kernel launch
+and synchronization time. If highly precise kernel timing is required, it is
+recommended to use dedicated profiling tools in conjunction with the replay
+tool.
+
+Finally, the replay tool provides an optional verification step that checks
+whether the output device memory snapshot generated during replay matches the
+output snapshot captured during the recording phase. Because this verification
+performs a strict binary difference between the two memory snapshots, the check
+may fail for kernels operating on floating-point data due to normal variations
+in precision and operation order.
+
+The recording phase, implemented by ``libomptarget``, can be controlled via
+environment variables. The full list of these variables and their definitions
+is provided below.
+
+* ``LIBOMPTARGET_RECORD=[TRUE/FALSE] (default FALSE)``
+* ``LIBOMPTARGET_RECORD_DIR=<Filepath>``
+* ``LIBOMPTARGET_RECORD_REPORT=[TRUE/FALSE] (default FALSE)``
+* ``LIBOMPTARGET_RECORD_MEMSIZE=<Num> (default 8*1024*1024*1024)``
+* ``LIBOMPTARGET_RECORD_DEVICE=<Num> (default 0)``
+* ``LIBOMPTARGET_RECORD_OUTPUT=[TRUE/FALSE] (default TRUE)``
+
+LIBOMPTARGET_RECORD
+"""""""""""""""""""
+
+This environment variable is used to enable the kernel recording mechanism in
+the execution of an OpenMP program. Enabling recording may introduce significant
+overhead to the recorded program. When the recording is disabled, the following
+recording environment variables are not considered. The recording is disabled by
+default.
+
+LIBOMPTARGET_RECORD_DIR
+"""""""""""""""""""""""
+
+This environment variable is used to specify the relative or absolute path to
+the directory where the recorded files will be stored. If omitted or empty, the
+files will be stored in current working directory.
+
+LIBOMPTARGET_RECORD_REPORT
+""""""""""""""""""""""""""
+
+This environment variable is used to instruct the runtime to emit a summary of
+the recorded kernel instances and their associated JSON files. By default, no
+report is emitted.
+
+LIBOMPTARGET_RECORD_MEMSIZE
+"""""""""""""""""""""""""""
+
+This environment variable is used to indicate the maximum size of device virtual
+memory that will be captured in the snapshots during the recording phase. This
+value only indicates the maximum size; the snapshot files contain only the data
+actually used. Changing this variable should only be needed in very specific
+cases. By default, the size is ``8*1024*1024*1024`` bytes (8 GiB).
+
+LIBOMPTARGET_RECORD_DEVICE
+""""""""""""""""""""""""""
+
+This environment variable is used to indicate the number of the device whose
+kernels should be recorded. Only the kernels executed by this device will be
+recorded. The default device is ``0``.
+
+LIBOMPTARGET_RECORD_OUTPUT
+""""""""""""""""""""""""""
+
+This environment variable is used to instruct the runtime to record the output
+device memory snapshot into a file. The default value is ``TRUE``.
+
 .. _libomptarget_plugin:
 
 LLVM/OpenMP Target Host Runtime Plugins (``libomptarget.rtl.XXXX``)

From 3adb63b9acf9730f9bcee51104cc12bfeaa71251 Mon Sep 17 00:00:00 2001
From: Kevin Sala <salapenades1 at llnl.gov>
Date: Thu, 23 Apr 2026 02:09:26 -0700
Subject: [PATCH 2/2] Add fixes

---
 openmp/docs/design/Runtimes.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index 0af041f6fc60e..b234e8e85b4f1 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -1283,6 +1283,8 @@ is provided below.
 * ``LIBOMPTARGET_RECORD_DEVICE=<Num> (default 0)``
 * ``LIBOMPTARGET_RECORD_OUTPUT=[TRUE/FALSE] (default TRUE)``
 
+.. _libomptarget_record:
+
 LIBOMPTARGET_RECORD
 """""""""""""""""""
 
@@ -1292,6 +1294,8 @@ overhead to the recorded program. When the recording is disabled, the following
 recording environment variables are not considered. The recording is disabled by
 default.
 
+.. _libomptarget_record_dir:
+
 LIBOMPTARGET_RECORD_DIR
 """""""""""""""""""""""
 
@@ -1299,6 +1303,8 @@ This environment variable is used to specify the relative or absolute path to
 the directory where the recorded files will be stored. If omitted or empty, the
 files will be stored in current working directory.
 
+.. _libomptarget_record_report:
+
 LIBOMPTARGET_RECORD_REPORT
 """"""""""""""""""""""""""
 


