[Mlir-commits] [mlir] [mlir][NFC] Document rationale, style for AMD dialects (PR #172703)

Wed Dec 17 09:54:24 PST 2025

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Krzysztof Drewniak (krzysz00)

<details>
<summary>Changes</summary>

This commit adds documentation to the AMDGPU and ROCDL dialects describing their purpose and codifying design guidelines that these dialects follow.

---
Full diff: https://github.com/llvm/llvm-project/pull/172703.diff


2 Files Affected:

- (modified) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td (+84) 
- (modified) mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td (+88) 


``````````diff

diff --git a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
index a0b8682965b20..ebeb203b81427 100644
--- a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+++ b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
@@ -24,6 +24,90 @@ def AMDGPU_Dialect : Dialect {
     and LLVM intrinsics. These wrappers should be used in conjunction with
     more generic dialects, such as `gpu` and `vector`, when generating LLVM IR
     that will eventually be executed on AMD hardware.
+
+    # What goes here?
+    In many cases, AMD GPU functionality can be accessed either through generic
+    operations (such as those in the `gpu`, `vector`, or `math` dialects) or through
+    the `rocdl` dialect's intrinsic wrappers. However, there are instances where
+    AMD-specific functionality benefits from a wrapper around the underlying
+    LLVM intrinsics.
+
+    In general terms, operations or types should be added to this dialect when they
+    wrap some AMD-specific functionality in a way that makes it work better with the
+    MLIR ecosystem and its types or when those builtins would be needlessly
+    complex to work with (such as if they feature magic constants at the LLVM level).
+
+    An additional set of operations that belong in this dialect are those that
+    have chipset-specific differences that can be abstracted over in a useful way.
+
+    To give some concrete examples:
+ 
+    - `amdgpu.mfma` and `amdgpu.wmma` exist in order to make a large set of
+      intrinsics more compatible with the MLIR type system (such as by allowing
+      8-bit float vectors to be passed as `vector<N x f8E4M3FN>` or
+      `vector<N x f8E5M2>` instead of as packed 32-bit integers whose element type
+      is controlled by separate operator-level constants). These operations also
+      allow the same `amdgpu.mfma` operation to be used regardless of the target
+      chip.
+    - `amdgpu.swizzle_bitmode` provides a wrapper around the `ds.swizzle` intrinsic,
+      allowing a wider range of types (such as `vector<2xf16>`) to be used natively
+      and eliminating the need to pack the and, or, and xor components using opaque
+      shifts.
+    - Operations like `amdgpu.gather_to_lds` provide `memref`-ized wrappers around
+      intrinsics that take a pointer, and are nontrivial enough to justify inclusion
+      in this dialect.
+
+
+    Note that simple intrinsics like `rocdl.sin` or `rocdl.s.barrier` should not
+    receive wrapper operations, as nothing is gained from the duplicate operation.
+    As a rule of thumb, if an operation's rewrite in AMDGPUToROCDL would be only
+    a `replaceOpWithNewOp` call, no AMDGPU dialect operation is needed.
+
+    # Design guidelines
+
+    Operations should leverage MLIR's "standard" types where possible. MLIR has
+    a more extensible type system than LLVM (especially in the area of small floats)
+    and those types should be used to create more ergonomic wrappers. In particular,
+    intrinsics that take pointers should have wrappers in this dialect that take
+    `memref` arguments and indices.
+
+    Operations should use properties or attributes in cases where the underlying
+    intrinsic uses `immarg`s (except in cases where that attribute can be represented
+    in the type system).
+
+    If it is possible to generalize the types of an operation, it should be done.
+    For example, the underlying operations for permutations and swizzles always
+    take 32-bit operands. Their AMDGPU wrappers can take any type, and will apply
+    padding and expansion to multiple instructions as needed. This makes these
+    operations easier to target because it hides the bitcasts and extracts
+    until the final lowering.
+
+    When the underlying operation uses magic constants, those should be presented
+    in a more programmer-friendly fashion, such as through enums or through
+    using separate arguments that are later combined. (For example, see the
+    design of the `amdgpu.dpp` and `amdgpu.fat_raw_buffer_cast` operations.)
+
+    If sufficiently similar functionality on multiple hardware generations can be
+    encapsulated into a single operation, it should be done. The lowering to
+    intrinsics should either throw an error when an unsupported capability is
+    used or ignore it. Which of these two failure modes is more appropriate
+    depends on the nature of the feature, but errors are a safe default choice.
+
+    # Documentation guidelines
+
+    AMDGPU dialect operations should document how any abstractions they introduce
+    translate to LLVM intrinsics or hardware operations.
+
+    While documenting the semantics of the underlying operations is not required,
+    it is preferred to provide an overview of the operation's functionality,
+    especially in cases where the documentation is widely distributed. Someone
+    looking at an AMDGPU dialect operation should be able to generally understand
+    what it does and have found the keywords they'll need for more detail.
+
+    Operation documentation should include usage examples.
+    
+    Note that this dialect uses LLVM's gfx numbers to refer to individual
+    architectures/chipsets and not product names or codenames.
   }];
 
 
diff --git a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
index 99cc6da0ec304..4e818dfb996f7 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
@@ -25,8 +25,89 @@ def ROCDL_Dialect : Dialect {
   let name = "rocdl";
   let cppNamespace = "::mlir::ROCDL";
   let dependentDialects = ["LLVM::LLVMDialect"];
+  let summary = "Dialect for wrapping LLVM AMDGPU backend intrinsics and attributes";
   let hasOperationAttrVerify = 1;
 
+  let description = [{
+    The ROCDL dialect, like the other platform-specific LLVM dialects, serves
+    as the location of wrappers around the AMD-specific intrinsics and attributes
+    in LLVM. (Why the dialect is named ROCDL and not, say, AMDGPU, is unknown.)
+
+    This dialect, like other GPU lowering targets, also contains the infrastructure
+    used by the built-in compilation/offloading framework to compile AMD-specific
+    LLVM IR into binaries.
+
+    # Dialect inclusion criteria and guidelines
+
+    The operations in this dialect are 1:1 wrappers around their corresponding
+    LLVM intrinsics. Operations that do not correspond to intrinsics should not
+    be placed in this dialect.
+
+    The definition of a ROCDL op should match its LLVM counterpart. If the
+    argument and result types are fixed, they should be specified as type
+    constraints, including by overriding the default variadic type on LLVM
+    intrinsics by doing a `let results` in the operation definition.
+
+    LLVM attributes do not need to be replicated exactly if it wouldn't be
+    easy to do so, but pure operations and ones that read/write memory should
+    be annotated as such.
+
+    While LLVM intrinsics currently don't allow constraining the values an
+    `any_type` can take, it is acceptable (but not required) to impose such
+    constraints if they are known.
+
+    When an LLVM intrinsic uses an `immarg`, this corresponds to an attribute
+    in MLIR.
+
+    Human-readable assembly formats (those that, for example, explicitly indicate
+    parameter names) may be used, and are encouraged for intrinsics that have
+    complex argument schemes and don't have any higher-level wrapper (such as
+    in the `amdgpu` dialect).
+
+    While not all existing operations follow this convention, new operations should
+    generally provide argument and result types except in cases where they are
+    clearly redundant (such as with operations like `rocdl.fmed3`, which doesn't
+    need to reiterate the single type at issue multiple times). This convention
+    enhances the readability of low-level IR and prevents programmers from needing
+    to find non-local type information.
+
+    Dialect-defined discardable attributes (any attribute starting with `rocdl.`
+    that has special handling) need to correspond to AMD-specific attributes, metadata,
+    or other entities (such as calling conventions) in LLVM, or be needed for
+    GPU compilation management. Outside of the compilation infrastructure,
+    dialect-specific enums or attributes are extremely unlikely to be needed
+    and should be avoided.
+
+    Operation documentation should specify when the operation was introduced
+    (if relevant) and include usage examples. Operations should have
+    parser/printer tests in `mlir/test/Dialect/LLVMIR/rocdl.mlir` and
+    lowering tests in `mlir/test/Target/LLVMIR/rocdl.mlir`.
+    
+    # General documentation (What does this op do?)
+
+    While rocdl ops sometimes carry their own documentation, there is no
+    expectation that such documentation will exist (or be kept up to date).
+
+    Since ROCDL operations correspond to LLVM intrinsics, the semantics and
+    behavior of these operations can be determined by investigating the
+    documentation for the corresponding intrinsic. This documentation
+    can be found in
+    - `llvm/docs/AMDGPUUsage.rst` and
+    - The comments of `llvm/include/llvm/IR/IntrinsicsAMDGPU.td`, which
+      is where details of the meaning of certain bitfields or of how an
+      intrinsic corresponds to hardware instructions are most likely to
+      be found.
+
+    Since many intrinsics are themselves minimal wrappers around hardware
+    instructions, these documentation sources often do not repeat hardware
+    documentation. If an intrinsic appears undocumented, information about
+    its behavior will often be available in published ISA descriptions
+    (sometimes known as shader programming guides).
+
+    If an operation doesn't provide usage examples, it is likely that they
+    can be found in `mlir/test/Dialect/LLVMIR/rocdl.mlir`.
+  }];
+
   let extraClassDeclaration = [{
     /// Get the name of the attribute used to annotate external kernel
     /// functions.
@@ -130,6 +211,7 @@ class ROCDL_SpecialIdRegisterOp<string mnemonic> :
   ];
 }
 
+// TODO(krzysz00): This should be a lowering pattern, not an op.
 class ROCDL_DimGetterFunctionOp<string mnemonic, string device_function,
                              int parameter, list<Trait> traits = []> :
   ROCDL_Op<mnemonic, !listconcat(traits, [Pure])>,
@@ -322,6 +404,12 @@ def ROCDL_BarrierOp : ROCDL_Op<"barrier"> {
     builder.CreateFence(llvm::AtomicOrdering::Acquire,
                         llvmContext.getOrInsertSyncScopeID("workgroup"));
   }];
+  let description = [{
+    An operation with the same expansion as HIP's `__syncthreads()`.
+
+    **DEPRECATION NOTICE**: Use `gpu.barrier`, which will expand to these
+    operations, instead.
+  }];
   let assemblyFormat = "attr-dict";
 }
 

``````````
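
For readers unfamiliar with the "opaque shifts" that the `amdgpu.swizzle_bitmode` bullet in the diff refers to, here is a rough sketch of the packing a caller would otherwise do by hand. It is not code from this PR or from the MLIR lowering. The helper names are hypothetical, and the bit layout (and-mask in bits 4:0, or-mask in bits 9:5, xor-mask in bits 14:10, bit 15 clear) is assumed from the bit-mode swizzle encoding described in AMD's ISA documents and LLVM's AMDGPU backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an MLIR or LLVM API: packs the three 5-bit masks of
// a bit-mode swizzle into the 16-bit pattern immediate.
// Assumed layout: and-mask in bits [4:0], or-mask in bits [9:5],
// xor-mask in bits [14:10], bit 15 left clear to select bit mode.
inline uint16_t packSwizzleBitmode(uint32_t andMask, uint32_t orMask,
                                   uint32_t xorMask) {
  assert(andMask < 32 && orMask < 32 && xorMask < 32 &&
         "each mask must fit in 5 bits");
  return static_cast<uint16_t>(andMask | (orMask << 5) | (xorMask << 10));
}

// Hypothetical helper: the lane that `lane` reads from under the masks above.
// The permutation is applied independently within each group of 32 lanes.
inline uint32_t swizzleSourceLane(uint32_t lane, uint32_t andMask,
                                  uint32_t orMask, uint32_t xorMask) {
  uint32_t inGroup = lane & 0x1Fu;
  uint32_t source = ((inGroup & andMask) | orMask) ^ xorMask;
  // Keep the group index and replace only the low five bits.
  return (lane & ~0x1Fu) | (source & 0x1Fu);
}
```

With an and-mask of `0x1F`, an or-mask of `0`, and an xor-mask of `1`, each lane reads from its even/odd neighbour. A wrapper op that takes the three masks as separate attributes saves callers from computing the packed constant themselves.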

</details>


https://github.com/llvm/llvm-project/pull/172703