[Mlir-commits] [mlir] 6ecbed8 - [MLIR][NVVM] Add definition for nvvm dialect (#156807)
llvmlistbot at llvm.org
Fri Sep 5 09:23:04 PDT 2025
Author: Guray Ozen
Date: 2025-09-05T18:23:00+02:00
New Revision: 6ecbed8b3d6507f5a4cd64eb23439a19965102cf
URL: https://github.com/llvm/llvm-project/commit/6ecbed8b3d6507f5a4cd64eb23439a19965102cf
DIFF: https://github.com/llvm/llvm-project/commit/6ecbed8b3d6507f5a4cd64eb23439a19965102cf.diff
LOG: [MLIR][NVVM] Add definition for nvvm dialect (#156807)
Added:
Modified:
mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
Removed:
################################################################################
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 9d93b4efe7a5b..0a07578c337c1 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -35,6 +35,50 @@ def LLVM_PointerSharedCluster : LLVM_PointerInAddressSpace<7>;
//===----------------------------------------------------------------------===//
def NVVM_Dialect : Dialect {
+ let summary = "The NVVM dialect that models NVIDIA's public ISA";
+
+ let description = [{
+ The NVVM dialect is MLIR's LLVM-IR-based, NVIDIA-specific backend dialect. It
+ models NVVM intrinsics and public ISA functionality and introduces NVIDIA
+ extensions to the MLIR/LLVM type system and address spaces (e.g., global,
+ shared, and cluster memory), enabling faithful lowering of GPU kernels to the
+ NVPTX toolchain. While an NVVM op usually maps to a single LLVM IR intrinsic,
+ the NVVM dialect uses type polymorphism and other attributes so that a single
+ NVVM op can map to different LLVM intrinsics.
+
+ **Scope and capabilities:** The dialect covers core GPU features such as
+ thread/block builtins, barriers and atomics, warp-level collectives (e.g.,
+ shuffle/vote), matrix/tensor core operations (e.g., `mma.sync`, `wgmma`),
+ tensor memory accelerator (TMA) operations, asynchronous copies (`cp.async`,
+ bulk/tensor variants) with memory barriers, cache and prefetch controls, and
+ NVVM-specific attributes and enums (e.g., FP rounding modes, memory scopes,
+ and MMA types/layouts).
+
+ **Non-goals:** NVVM is not a place for convenience or “wrapper” ops. It is
+ not intended to introduce high-level ops that expand into multiple unrelated
+ NVVM intrinsics or that lower to no intrinsic at all. Such abstractions belong
+ in higher-level dialects (e.g., `nvgpu`, `gpu`, or project-specific dialects).
+ The design intent is a thin, predictable, low-level surface with
+ near-mechanical lowering to NVVM/LLVM IR.
+
+ **Placement in the lowering pipeline:** NVVM sits below target-agnostic
+ dialects like `gpu` and NVIDIA's `nvgpu`. Typical pipelines convert
+ `gpu`/`nvgpu` ops into NVVM using `-convert-gpu-to-nvvm` and
+ `-convert-nvgpu-to-nvvm`, then translate to LLVM IR for final code
+ generation via the NVPTX backend.
+
+ **Target configuration and serialization:** NVVM provides a `#nvvm.target`
+ attribute to describe the GPU target (SM, features, and flags). In
+ conjunction with `gpu` serialization (e.g., `gpu-module-to-binary`), this
+ enables producing architecture-specific GPU binaries (such as CUBIN) from
+ nested GPU modules.
+
+ **Inline PTX:** When an intrinsic is unavailable or a performance-critical
+ sequence must be expressed directly, NVVM provides an `nvvm.inline_ptx` op to
+ embed PTX inline as a last-resort escape hatch, with explicit operands and
+ results.
+ }];
+
let name = "nvvm";
let cppNamespace = "::mlir::NVVM";
let dependentDialects = ["LLVM::LLVMDialect"];
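To make the pipeline and target-configuration paragraphs above concrete, here is a minimal sketch. The module and function names, the chosen chip, and the exact pass ordering are illustrative assumptions, not part of this commit:

  // A GPU module annotated with an NVVM target; gpu.thread_id lowers to an
  // NVVM builtin (nvvm.read.ptx.sreg.tid.x) during -convert-gpu-to-nvvm.
  gpu.module @kernels [#nvvm.target<chip = "sm_80">] {
    gpu.func @example() kernel {
      %tid = gpu.thread_id x
      gpu.return
    }
  }

  // One possible lowering invocation (real pipelines typically add further
  // conversion and finalization passes):
  //   mlir-opt kernels.mlir \
  //     -convert-nvgpu-to-nvvm -convert-gpu-to-nvvm -gpu-module-to-binary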
@@ -976,7 +1020,7 @@ def NVVM_ShflOp :
let description = [{
The `shfl.sync` Op implements data shuffle within threads of a warp.
The `thread_mask` denotes the threads participating in the Op where
- the bit position corresponds to a particular thread’s laneid.
+ the bit position corresponds to a particular thread's laneid.
The `offset` specifies a source lane or source lane offset
(depending on `kind`). The `val` is the input value to be copied from
the source. The `mask_and_clamp` contains two packed values specifying
@@ -1031,7 +1075,7 @@ def NVVM_VoteSyncOp
- `ballot`: In the ballot form, the destination result is a 32 bit integer.
In this form, the predicate from each thread in membermask is copied into
the corresponding bit position of the result, where the bit position
- corresponds to the thread’s lane id.
+ corresponds to the thread's lane id.
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-vote-sync)
}];
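As a hedged usage sketch for `nvvm.shfl.sync` (operand names are placeholders; the authoritative assembly format is whatever `NVVM_ShflOp` defines in this file):

  // Butterfly shuffle across the warp lanes selected by %mask; %clamp packs
  // the segment mask and clamp value described above.
  %r = nvvm.shfl.sync bfly %mask, %val, %offset, %clamp : f32 -> f32

  // The op is type-polymorphic: an i32 payload selects a different LLVM
  // intrinsic than the f32 form at translation time.
  %s = nvvm.shfl.sync idx %mask, %ival, %srclane, %clamp : i32 -> i32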
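Similarly, a hedged sketch for `nvvm.vote.sync` (again, names are placeholders; check the exact format against `NVVM_VoteSyncOp`):

  // Ballot form: the result carries one bit per participating lane.
  %ballot = nvvm.vote.sync ballot %mask, %pred -> i32

  // Predicate forms (all/any/uni) return a single i1.
  %any = nvvm.vote.sync any %mask, %pred -> i1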