[Mlir-commits] [mlir] [mlir][NFC] Document rationale, style for AMD dialects (PR #172703)
llvmlistbot at llvm.org
Wed Dec 17 09:54:24 PST 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-backend-amdgpu
Author: Krzysztof Drewniak (krzysz00)
<details>
<summary>Changes</summary>
This commit adds documentation to the AMDGPU and ROCDL dialects describing their purpose and codifying design guidelines that these dialects follow.
---
Full diff: https://github.com/llvm/llvm-project/pull/172703.diff
2 Files Affected:
- (modified) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td (+84)
- (modified) mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td (+88)
``````````diff
diff --git a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
index a0b8682965b20..ebeb203b81427 100644
--- a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+++ b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
@@ -24,6 +24,90 @@ def AMDGPU_Dialect : Dialect {
and LLVM intrinsics. These wrappers should be used in conjunction with
more generic dialects, such as `gpu` and `vector`, when generating LLVM IR
that will eventually be executed on AMD hardware.
+
+ # What goes here?
+ In many cases, AMD GPU functionality can be accessed either through generic
+ operations (such as those in the `gpu`, `vector`, or `math` dialects) or through
+ the `rocdl` dialect's intrinsic wrappers. However, there are instances where
+ AMD-specific functionality benefits from a wrapper around the underlying
+ LLVM intrinsics.
+
+ In general terms, operations or types should be added to this dialect when they
+ wrap some AMD-specific functionality in a way that makes it work better with the
+ MLIR ecosystem and its types or when those builtins would be needlessly
+ complex to work with (such as if they feature magic constants at the LLVM level).
+
+ An additional set of operations that belong in this dialect are those that
+ have chipset-specific differences that can be abstracted over in a useful way.
+
+ To give some concrete examples:
+
+ - `amdgpu.mfma` and `amdgpu.wmma` exist in order to make a large set of
+ intrinsics more compatible with the MLIR type system (such as by allowing
+ 8-bit float vectors to be passed as `vector<N x f8E4M3FN>` or
+ `vector<N x f8E5M2>` instead of as packed 32-bit integers whose element type
+ is controlled by separate operator-level constants). These operations also
+ allow the same `amdgpu.mfma` operation to be used regardless of the target
+ chip.
+ - `amdgpu.swizzle_bitmode` provides a wrapper around the `ds.swizzle` intrinsic,
+ allowing a wider range of types (such as `vector<2xf16>`) to be used natively
+ and eliminating the need to pack the and, or, and xor components using opaque
+ shifts.
+ - Operations like `amdgpu.gather_to_lds` provide `memref`-ized wrappers around
+ intrinsics that take a pointer, and are nontrivial enough to justify inclusion
+ in this dialect.
+
+
+ Note that simple intrinsics like `rocdl.sin` or `rocdl.s.barrier` should not
+ receive wrapper operations, as nothing is gained from the duplicate operation.
+ As a rule of thumb, if an operation's rewrite in AMDGPUToROCDL would be only
+ a `replaceOpWithNewOp` call, no AMDGPU dialect operation is needed.
+
+ # Design guidelines
+
+ Operations should leverage MLIR's "standard" types where possible. MLIR has
+ a more extensible type system than LLVM (especially in the area of small floats)
+ and those types should be used to create more ergonomic wrappers. In particular,
+ intrinsics that take pointers should have wrappers in this dialect that take
+ `memref` arguments and indices.
+
+ Operations should use properties or attributes in cases where the underlying
+ intrinsic uses `immarg`s (except in cases where that attribute can be represented
+ in the type system).
+
+ If it is possible to generalize the types of an operation, it should be done.
+ For example, the underlying operations for permutations and swizzles always
+ take 32-bit operands. Their AMDGPU wrappers can take any type, and will apply
+ padding and expansion to multiple instructions as needed. This makes these
+ operations easier to target because it hides the bitcasts and extracts
+ until the final lowering.
+
+ When the underlying operation uses magic constants, those should be presented
+ in a more programmer-friendly fashion, such as through enums or through
+ using separate arguments that are later combined. (For example, see the
+ design of the `amdgpu.dpp` and `amdgpu.fat_raw_buffer_cast` operations.)
+
+ If sufficiently similar functionality on multiple hardware generations can be
+ encapsulated into a single operation, it should be done. The lowering to
+ intrinsics should either throw an error when an unsupported capability is
+ used or ignore it. Which of these two failure modes is more appropriate
+ depends on the nature of the feature, but errors are a safe default choice.
+
+ # Documentation guidelines
+
+ AMDGPU dialect operations should document how any abstractions they introduce
+ translate to LLVM intrinsics or hardware operations.
+
+ While documenting the semantics of the underlying operations is not required,
+ it is preferred to provide an overview of the operation's functionality,
+ especially in cases where the documentation is widely distributed. Someone
+ looking at an AMDGPU dialect operation should be able to generally understand
+ what it does and have found the keywords they'll need for more detail.
+
+ Operation documentation should include usage examples.
+
+ Note that this dialect uses LLVM's gfx numbers to refer to individual
+ architectures/chipsets and not product names or codenames.
}];
diff --git a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
index 99cc6da0ec304..4e818dfb996f7 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
@@ -25,8 +25,89 @@ def ROCDL_Dialect : Dialect {
let name = "rocdl";
let cppNamespace = "::mlir::ROCDL";
let dependentDialects = ["LLVM::LLVMDialect"];
+ let summary = "Dialect for wrapping LLVM AMDGPU backend intrinsics and attributes";
let hasOperationAttrVerify = 1;
+ let description = [{
+ The ROCDL dialect, like the other platform-specific LLVM dialects, serves
+ as the location of wrappers around the AMD-specific intrinsics and attributes
+ in LLVM. (Why the dialect is named ROCDL and not, say, AMDGPU, is unknown.)
+
+ This dialect, like other GPU lowering targets, also contains the infrastructure
+ used by the built-in compilation/offloading framework to compile AMD-specific
+ LLVM IR into binaries.
+
+ # Dialect inclusion criteria and guidelines
+
+ The operations in this dialect are 1:1 wrappers around their corresponding
+ LLVM intrinsics. Operations that do not correspond to intrinsics should not
+ be placed in this dialect.
+
+ The definition of a ROCDL op should match its LLVM counterpart. If the
+ argument and result types are fixed, they should be specified as type
+ constraints, including by overriding the default variadic type on LLVM
+ intrinsics by doing a `let results` in the operation definition.
+
+ LLVM attributes do not need to be replicated exactly if it wouldn't be
+ easy to do so, but pure operations and ones that read/write memory should
+ be annotated as such.
+
+ While LLVM intrinsics currently don't allow constraining the values an
+ `any_type` can take, it is acceptable (but not required) to impose such
+ constraints if they are known.
+
+ When an LLVM intrinsic uses an `immarg`, this corresponds to an attribute
+ in MLIR.
+
+ Human-readable assembly formats (those that, for example, explicitly indicate
+ parameter names) may be used, and are encouraged for intrinsics that have
+ complex argument schemes and don't have any higher-level wrapper (such as
+ in the `amdgpu` dialect).
+
+ While not all existing operations follow this convention, new operations should
+ generally provide argument and result types except in cases where they are
+ clearly redundant (such as with operations like `rocdl.fmed3`, which doesn't
+ need to reiterate the single type at issue multiple times). This convention
+ enhances the readability of low-level IR and prevents programmers from needing
+ to find non-local type information.
+
+ Dialect-defined discardable attributes (any attribute starting with `rocdl.`
+ that has special handling) need to correspond to AMD-specific attributes, metadata,
+ or other entities (such as calling conventions) in LLVM, or be needed for
+ GPU compilation management. Outside of the compilation infrastructure,
+ dialect-specific enums or attributes are extremely unlikely to be needed
+ and should be avoided.
+
+ Operation documentation should specify when the operation was introduced
+ (if relevant) and include usage examples. Operations should have
+ parser/printer tests in `mlir/test/Dialect/LLVMIR/rocdl.mlir` and
+ lowering tests in `mlir/test/Target/LLVMIR/rocdl.mlir`.
+
+ # General documentation (What does this op do?)
+
+ While rocdl ops sometimes carry their own documentation, there is no
+ expectation that such documentation will exist (or be kept up to date).
+
+ Since ROCDL operations correspond to LLVM intrinsics, the semantics and
+ behavior of these operations can be determined by investigating the
+ documentation for the corresponding intrinsic. This documentation
+ can be found in
+ - `llvm/docs/AMDGPUUsage.rst` and
+ - The comments of `llvm/include/llvm/IR/IntrinsicsAMDGPU.td`, which
+ is where details of the meaning of certain bitfields or of how an
+ intrinsic corresponds to hardware instructions are most likely to
+ be found.
+
+ Since many intrinsics are themselves minimal wrappers around hardware
+ instructions, these documentation sources often do not repeat hardware
+ documentation. If an intrinsic appears undocumented, information about
+ its behavior will often be available in published ISA descriptions
+ (sometimes known as shader programming guides).
+
+ If an operation doesn't provide usage examples, it is likely that they
+ can be found in `mlir/test/Dialect/LLVMIR/rocdl.mlir`.
+ }];
+
let extraClassDeclaration = [{
/// Get the name of the attribute used to annotate external kernel
/// functions.
@@ -130,6 +211,7 @@ class ROCDL_SpecialIdRegisterOp<string mnemonic> :
];
}
+// TODO(krzysz00): This should be a lowering pattern, not an op.
class ROCDL_DimGetterFunctionOp<string mnemonic, string device_function,
int parameter, list<Trait> traits = []> :
ROCDL_Op<mnemonic, !listconcat(traits, [Pure])>,
@@ -322,6 +404,12 @@ def ROCDL_BarrierOp : ROCDL_Op<"barrier"> {
builder.CreateFence(llvm::AtomicOrdering::Acquire,
llvmContext.getOrInsertSyncScopeID("workgroup"));
}];
+ let description = [{
+ An operation with the same expansion as HIP's `__syncthreads()`.
+
+ **DEPRECATION NOTICE**: Use `gpu.barrier`, which will expand to these
+ operations, instead.
+ }];
let assemblyFormat = "attr-dict";
}
``````````
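The AMDGPU guidelines in the diff above mention that `amdgpu.swizzle_bitmode` spares users from packing the and, or, and xor components "using opaque shifts". As a rough illustration of the kind of magic constant being hidden, here is a minimal C++ sketch. The bit layout (and mask in bits 4:0, or mask in bits 9:5, xor mask in bits 14:10, bit 15 clear) is an assumption taken from the published ISA descriptions of `ds_swizzle` bit-mask mode, and the helper names `packSwizzleBitMode` and `swizzledLane` are hypothetical; neither appears in the PR.

```cpp
#include <cstdint>

// Hypothetical helper: pack three 5-bit masks into the 16-bit swizzle
// offset. ASSUMED layout (from ISA documentation, not from this PR):
// and = bits [4:0], or = bits [9:5], xor = bits [14:10], bit 15 = 0.
constexpr uint32_t packSwizzleBitMode(uint32_t andMask, uint32_t orMask,
                                      uint32_t xorMask) {
  return (andMask & 0x1Fu) | ((orMask & 0x1Fu) << 5) |
         ((xorMask & 0x1Fu) << 10);
}

// Hypothetical model of which lane a given lane reads from, assuming the
// permutation is applied independently within each group of 32 lanes.
constexpr uint32_t swizzledLane(uint32_t lane, uint32_t andMask,
                                uint32_t orMask, uint32_t xorMask) {
  return (lane & ~0x1Fu) |
         ((((lane & andMask) | orMask) ^ xorMask) & 0x1Fu);
}

// Swapping adjacent lanes: keep all bits, set none, flip bit 0.
static_assert(packSwizzleBitMode(0x1F, 0x00, 0x01) == 0x041F,
              "and=0x1F, or=0, xor=1 packs to 0x041F under the assumed layout");
static_assert(swizzledLane(4, 0x1F, 0x00, 0x01) == 5,
              "lane 4 reads from lane 5");
```

A wrapper that accepts the three masks as separate attributes lets the reader see `and = 0x1F, or = 0, xor = 1` directly instead of decoding `0x041F` by hand, which is the ergonomic gain the guidelines describe.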
</details>
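The AMDGPU design guidelines in the diff note that permutation and swizzle wrappers accept any type and "apply padding and expansion to multiple instructions as needed", because the underlying operations take 32-bit operands. The arithmetic behind that can be sketched in plain C++. This is only a model under stated assumptions: the function name `splitInto32BitWords` is hypothetical, zero padding and little-endian byte order are assumed for illustration, and the real lowering works on MLIR values with bitcasts and extracts rather than on byte vectors.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: split a value's bytes (little-endian, ASSUMED) into
// the 32-bit words that would each be fed to one 32-bit hardware operation.
// The final word is zero-padded when the size is not a multiple of 4 bytes.
std::vector<uint32_t> splitInto32BitWords(const std::vector<uint8_t> &bytes) {
  std::vector<uint32_t> words((bytes.size() + 3) / 4, 0u);
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    // Place byte i into lane (i % 4) of word (i / 4).
    words[i / 4] |= static_cast<uint32_t>(bytes[i]) << (8 * (i % 4));
  }
  return words;
}
```

Under this model a 2-byte value such as an `f16` is padded into one word, a 4-byte `vector<2xf16>` fits one word exactly, and a 6-byte value needs two words, which corresponds to two instructions after lowering.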
https://github.com/llvm/llvm-project/pull/172703