[Mlir-commits] [mlir] [NFC][mlir][AMDGPU] Partition dialect .td into multiple files (PR #178562)
llvmlistbot at llvm.org
Wed Jan 28 17:47:28 PST 2026
llvmbot wrote:
@llvm/pr-subscribers-mlir
Author: Krzysztof Drewniak (krzysz00)
<details>
<summary>Changes</summary>
Follow the style of other dialects by having a distinct .td file for each category of thing (type, attribute, operation, enum) generated for the AMDGPU dialect.
Nothing has changed, but a lot of things have been copy-pasted.
---
Patch is 166.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/178562.diff
9 Files Affected:
- (modified) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td (+5-1794)
- (added) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUAttrs.td (+50)
- (added) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUBase.td (+118)
- (modified) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h (+1)
- (added) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUEnums.td (+83)
- (added) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUOps.td (+1544)
- (added) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUTypes.td (+72)
- (modified) mlir/include/mlir/Dialect/AMDGPU/IR/CMakeLists.txt (+2-2)
- (modified) mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp (+1-1)
``````````diff
diff --git a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
index 1f0e5cf7e7f56..94f8d37609230 100644
--- a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+++ b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
@@ -1,4 +1,4 @@
-//===-- AMDGPU.td - AMDGPU dialect definitions *- tablegen -*------===//
+//===-- AMDGPU.td - AMDGPU dialect *- tablegen -*------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
@@ -6,1798 +6,9 @@
//
//===----------------------------------------------------------------------===//
-#ifndef AMDGPU
-#define AMDGPU
+#ifndef MLIR_DIALECT_AMDGPU_IR_AMDGPU_TD
+#define MLIR_DIALECT_AMDGPU_IR_AMDGPU_TD
-include "mlir/Interfaces/InferTypeOpInterface.td"
-include "mlir/Interfaces/SideEffectInterfaces.td"
-include "mlir/Interfaces/ViewLikeInterface.td"
-include "mlir/IR/EnumAttr.td"
-include "mlir/IR/Properties.td"
-include "mlir/IR/OpBase.td"
+include "mlir/Dialect/AMDGPU/IR/AMDGPUOps.td"
-def AMDGPU_Dialect : Dialect {
- let name = "amdgpu";
- let cppNamespace = "::mlir::amdgpu";
- let description = [{
- The `AMDGPU` dialect provides wrappers around AMD-specific functionality
- and LLVM intrinsics. These wrappers should be used in conjunction with
- more generic dialects, such as `gpu` and `vector`, when generating LLVM IR
- that will eventually be executed on AMD hardware.
-
- # What goes here?
- In many cases, AMD GPU functionality can be accessed either though generic
- operations (such as those in the `gpu`, `vector`, or `math`) or through
- the `rocdl` dialect's intrinsic wrappers. However, there are instances where
- AMD-specific functionally benefits from a wrapper around the underlying
- LLVM intrinsics.
-
- In general terms, operations or types should be added to this dialect when they
- wrap some AMD-specific functionality in a way that makes it work better with the
- MLIR ecosystem and its types or when those buitins would be needlessly
- complex to work with (such as if they features magic constants at the LLVM level).
-
- An additional set of operations that belong in this dialect are those that
- have chipset-specific differences that can be abstracted over in a useful way.
-
- To give some concrete examples:
-
- - `amdgpu.mfma` and `amdgpu.wmma` exist in order to make a large set of
- intrinsics more compatible with the MLIR type system (such as by allowing
- 8-bit float vectors to be passed as `vector<N x f8E4M3FN>` or
- `vector<N x f8E4M2>` instead of as packed 32-bit integers whose element type
- is controlled by separate operator-level constants. These operations also
- allow the same `amdgpu.mfma` operation to be used regardless of the target
- chip.
- - `amdgpu.swizzle_bitmode` provides a wrapper around the `ds.swizzle` intrinsic,
- allowing a wider range of types (such as `vector<2xf16>`) to be used natively
- and eliminating the need to pack the and, or, and xor components using opaque
- shifts.
- - Operations like `amdgpu.gather_to_lds` provide `memref`-ized wrappers around
- intrinsics that take a pointer, and are nontrivial enough to justify inclusion
- in this dialect.
-
-
- Note that simple intrinsics like `rocdl.sin` or `rocdl.s.barrier` should not
- receive wrapper operations, as nothing is gained from the duplicate operation.
- As a rule of thumb, if an operation's rewrite in AMDGPUToROCDL would be only
- a `replaceOpWithNewOp` call, no AMDGPU dialect operation is needed.
-
- # Design guidelines
-
- Operations should leverage MLIR's "standard" types where possible. MLIR has
- a more extensible type system than LLVM (especially in the area of small floats)
- and those types should be used to create more ergonomic wrappers. In particular,
- intrinsics that take pointers should have wrappers in this dialect that take
- `memref` arguments and indices.
-
- Operations should use properties or attributes in cases where the underlying
- intrinsic uses `immarg`s (except in cases where that attribute can be represented
- in the type system).
-
- If it is possible to generalize the types of an operation, it should be done.
- For example, the underlying operations for permutations and swizzles always
- take 32-bit operands. Their AMDGPU wrappers can take any type, and will apply
- padding and expansion to multiple instructions as needed. This makes these
- operations easier to target because it hides the bitcasts and extracts
- until the final lowering.
-
- When the underlying operation uses magic constants, those should be presented
- in a more programmer-friendly fashion, such as through enums or though
- using separate arguments that are later combined. (For example, see the
- design of the `amdgpu.dpp` and `amdgpu.fat_raw_buffer_cast` operations.)
-
- If sufficiently similar functionality on multiple hardware generations can be
- encapsulated into a single operation, it should be done. The lowering to
- intrinsics should either throw an error when an unsupported capability is
- used or ignore it. Which of these is two failure modes is more appropriate
- depends on the nature of the feature, but errors are a safe default choice.
-
- # Documentation guidelines
-
- AMDGPU dialect operations should document how any abstractions they introduce
- translate to LLVM intrinsics or hardware operations.
-
- While documenting the semantics of the underlying operations is not required,
- is preferred to provide an overview of the operation's functionality,
- especially in cases where the documentation is widely distributed. Someone
- looking at an AMDGPU dialect operation should be able to generally understand
- what it does and have found the keywords they'll need for more detail.
-
- Operation documentation should include usage examples.
-
- Note that this dialect uses LLVM's gfx numbers to refer to individual
- architectures/chipsets and not product names or codenames.
- }];
-
-
- let dependentDialects = [
- "ROCDL::ROCDLDialect",
- "arith::ArithDialect",
- "gpu::GPUDialect"
- ];
- let useDefaultAttributePrinterParser = 1;
- let useDefaultTypePrinterParser = 1;
-}
-
-def AnyIntegerOrFloat : AnyTypeOf<[AnySignlessInteger, AnyFloat], "Integer or Float">;
-
-def AnyIntegerOrFloatOr1DVector :
- AnyTypeOf<[AnyIntegerOrFloat, FixedVectorOfRankAndType<[1], [AnyIntegerOrFloat]>]>;
-
-//===----------------------------------------------------------------------===//
-// AMDGPU general attribute definitions
-//===----------------------------------------------------------------------===//
-
-def AMDGPU_AddressSpace : I32EnumAttr<"AddressSpace",
- "AMDGPU-specific address spaces",
- [
- I32EnumAttrCase<"FatRawBuffer", 0, "fat_raw_buffer">,
- I32EnumAttrCase<"BufferRsrc", 1, "buffer_rsrc">,
- I32EnumAttrCase<"FatStructuredBuffer", 2, "fat_structured_buffer">,
- ]> {
- let genSpecializedAttr = 0;
- let cppNamespace = "::mlir::amdgpu";
-}
-
-def AMDGPU_AddressSpaceAttr : EnumAttr<AMDGPU_Dialect, AMDGPU_AddressSpace,
- "address_space"> {
- let description = [{
- AMDGPU-specific memory spaces that may not have exact analogues on other
- GPU targets or backends.
-
- - `fat_raw_buffer` is the memory space used when a memref is stored as
- as a "buffer fat pointer" - that is, a buffer resource (that is set up to
- use raw byte-level indexing) along with its offset. The AMDGPU backend
- implements `ptr addrspace(7)` to represent these fat pointers so that
- buffer resources (which allow advanced features like bounds checking or
- cache swizzling) can be used like ordinary LLVM pointers or memrefs.
- See also the `fat_raw_buffer_cast` operation
- - `buffer_rsrc` is the memory space for `ptr addrspace(8)`, representing a
- buffer resource. It should not be used for memrefs, since it does not support
- indexing
- - `fat_structured_buffer` represents `ptr addrspace(9)`, a buffer resource
- that carries both an index and offset field, which are used for complex
- structured indexing that is primarily seen in graphics applications. This
- is also incompatible with the simple indexing model supported by memref.
- }];
- let assemblyFormat = "`<` $value `>`";
-}
-
-//===----------------------------------------------------------------------===//
-// AMDGPU Type definitions
-//===----------------------------------------------------------------------===//
-
-class AMDGPU_Type<string name, string typeMnemonic, list<Trait> traits = []>
- : TypeDef<AMDGPU_Dialect, name, traits> {
- let mnemonic = typeMnemonic;
-}
-
-def AMDGPU_TDMBaseType : AMDGPU_Type<"TDMBase", "tdm_base"> {
- let summary = "Pair of base addresses that move data between LDS and global storage.";
- let description = [{
- This type is opaque and it is used to represent a struct of two addresses.
- One address is in LDS while the other is in global memory.
-
- The value defined by this operation is only intended to be used by
- amdgpu.tdm_make_descriptor.
- }];
- let parameters = (ins "Type":$elementType);
- let builders = [
- TypeBuilderWithInferredContext<(ins "Type":$elementType), [{
- return $_get(elementType.getContext(), elementType);
- }]>
- ];
- let assemblyFormat = "`<` $elementType `>`";
-}
-
-def AMDGPU_TDMGatherBaseType : AMDGPU_Type<"TDMGatherBase", "tdm_gather_base"> {
- let summary = "Pair of base addresses that move data between LDS and global storage.";
- let description = [{
- This type is opaque and it is used to represent a struct of two addresses.
- One address is in LDS while the other is in global memory.
-
- This operation is similar to amdgpu.tdm_make_base but intended to be
- used in gather mode.
-
- The value defined by this operation is only intended to be used by
- amdgpu.tdm_make_gather_descriptor.
- }];
- let parameters = (ins "Type":$elementType, "Type":$indexType);
- let builders = [
- TypeBuilderWithInferredContext<(ins "Type":$elementType, "Type": $indexType), [{
- return $_get(elementType.getContext(), elementType, indexType);
- }]>
- ];
- let assemblyFormat = "`<` $elementType `,` $indexType`>`";
- let genVerifyDecl = 1;
-}
-
-def AMDGPU_TDMDescriptorType : AMDGPU_Type<"TDMDescriptor", "tdm_descriptor"> {
- let summary = "Descriptors used in tensor store/load operations.";
- let description = [{
- This type is opaque and corresponds to the two or four descriptor groups
- used in tensor_load_to_lds or tensor_store_from_lds.
- }];
-}
-
-class AMDGPU_ConcreteVector<Type elem, int length> :
- FixedVectorOfLengthAndType<[length], [elem]>,
- BuildableType<
- "::mlir::VectorType::get({" # length # "} ,"
- # elem.builderCall # ")">;
-
-//===----------------------------------------------------------------------===//
-// AMDGPU Op definitions
-//===----------------------------------------------------------------------===//
-
-class AMDGPU_Op<string mnemonic, list<Trait> traits = []> :
- Op<AMDGPU_Dialect, mnemonic, traits> {}
-
-def AMDGPU_ExtPackedFp8Op :
- AMDGPU_Op<"ext_packed_fp8", [Pure]>,
- Arguments<(ins AnyTypeOf<[F8E5M2FNUZ, F8E4M3FNUZ, F8E5M2, F8E4M3FN,
- VectorOfLengthAndType<[1, 2, 3, 4], [F8E5M2FNUZ, F8E4M3FNUZ, F8E5M2, F8E4M3FN]>]>:$source,
- ConfinedAttr<I32Attr, [IntNonNegative, IntMaxValue<3>]>:$index)>,
- Results<(outs AnyTypeOf<[F32, FixedVectorOfLengthAndType<[2], [F32]>]>:$res)> {
- let summary = "Extend a fp8 value to a float or a vector of packed fp8 values to two floats";
-
- let description = [{
- Extend one or two 8-bit floats in `source[index]` to a 32-bit float or
- two floats and return them.
-
- This rather unusual signature arises from the fact that AMD GPUs cannot
- easily work with sub 32-bit quantities, so the compiler intrinsics for
- extending 8-bit floats (which are, currently, the only way to work with
- this operation) take packed vectors of 4 such floats.
-
- If the passed-in vector has fewer than four elements, or the input is scalar,
- the remaining values in the <4 x i8> will be filled with
- undefined values as needed.
- }];
- let assemblyFormat = [{
- attr-dict $source `[` $index `]` `:` type($source) `to` type($res)
- }];
-}
-
-def AMDGPU_ScaledExtPackedMatrixOp
- : AMDGPU_Op<"scaled_ext_packed_matrix", [Pure, AllShapesMatch<["source", "res"]>]>,
- Arguments<(
- ins AnyTypeOf<[FixedVectorOfShapeAndType<[8], F4E2M1FN>,
- FixedVectorOfShapeAndType<[8], F8E4M3FN>,
- FixedVectorOfShapeAndType<[8], F8E5M2>,
- FixedVectorOfShapeAndType<[16], F6E2M3FN>,
- FixedVectorOfShapeAndType<[16], F6E3M2FN>]>:$source,
- FixedVectorOfShapeAndType<[4], F8E8M0FNU>:$scale,
- ConfinedAttr<I32Attr, [IntIsOneOf<[16, 32]>]>:$blockSize,
- ConfinedAttr<I32Attr, [IntIsOneOf<[0, 16]>]>:$firstScaleLane,
- ConfinedAttr<I32Attr, [IntMinValue<0>, IntMaxValue<3>]>:$firstScaleByte)>,
- Results<(
- outs AnyTypeOf<[FixedVectorOfShapeAndType<[8], F32>,
- FixedVectorOfShapeAndType<[8], F16>,
- FixedVectorOfShapeAndType<[8], BF16>,
- FixedVectorOfShapeAndType<[16], F32>,
- FixedVectorOfShapeAndType<[16], F16>,
- FixedVectorOfShapeAndType<[16], BF16>]>:$res)> {
-
- let summary = "Extend a wave-wide matrix of packed floating point values";
-
- let description = [{
- Extend matrix of microfloats (8 or 16 elements per lane) using a set of scales
- that may be stored on other lanes.
-
- The scales applied to the input microfloats are stored in bytes which
- come from the `scales` input provided in a *half* of the wave identified
- by `firstScaleLane`. The bytes used is selected by `firstScaleByte` and depends
- on the type of `source`. The 16 vectors in consecutive lanes starting from
- `firstScaleLane` (which we'll call the scale vectors) will be used by both
- halves of the wave (with lane L reading from L % 16'th scale vector).
-
- When `source` is either F4E2M1FN, F6E2M3FN, or F6E3M2FN each half of the
- wave will use a different byte. The first one being `firstScaleByte` and
- the second one being `firstScaleByte` + 1. When the block size is 32,
- `firstScaleByte` can be either 0 or 2, selecting halves of the scale vectors.
- Lanes 0-15 will read from `firstScaleByte` and lanes 16-31 will read
- from `firstScaleByte` + 1.
-
-
- For example:
- ```mlir
- // Input: 8-element vector of F8E4M3FN, converting to F32
- // Lanes 0-15 read from byte 0, lanes 16-31 read from byte 1
- %result = amdgpu.scaled_ext_packed_matrix %source scale(%scales)
- blockSize(32) firstScaleLane(0) firstScaleByte(0)
- : vector<8xf8E4M3FN>, vector<4xf8E8M0FNU> -> vector<8xf32>
-
- // Input: 16-element vector of F6E2M3FN, converting to F16
- // Lanes 0-15 read from byte 2, lanes 16-31 read from byte 3
- %result = amdgpu.scaled_ext_packed_matrix %source scale(%scales)
- blockSize(32) firstScaleLane(16) firstScaleByte(2)
- : vector<16xf6E2M3FN>, vector<4xf8E8M0FNU> -> vector<16xf16>
- ```
-
- When `source` is either F4E2M1FN, F6E2M3FN, or F6E3M2FN and
- the block size is 16, `firstScaleByte` can be 0 or 1.
- Lanes 0-15 read from the `firstScaleByte`th element of the scale vectors,
- while lanes 16-31 read from `firstScaleByte` + 2.
- For example:
- ```mlir
- // Input: 8-element vector of F8E5M2, converting to BF16
- // Lanes 0-15 read from byte 0, lanes 16-31 read from byte 2 (0+2)
- %result = amdgpu.scaled_ext_packed_matrix %source scale(%scales)
- blockSize(16) firstScaleLane(0) firstScaleByte(0)
- : vector<8xf8E5M2>, vector<4xf8E8M0FNU> -> vector<8xbf16>
-
- // Input: 16-element vector of F6E3M2FN, converting to F32
- // Lanes 0-15 read from byte 1, lanes 16-31 read from byte 3 (1+2)
- %result = amdgpu.scaled_ext_packed_matrix %source scale(%scales)
- blockSize(16) firstScaleLane(16) firstScaleByte(1)
- : vector<16xf6E3M2FN>, vector<4xf8E8M0FNU> -> vector<16xf32>
- ```
-
- Note: the layout for the scales generally mirrors how the WMMA
- instructions use for matrix scales. These selection operands allows
- one to choose portions of the matrix to convert.
-
- When `source` is either F8E4M3FN or F8E5M2 and `blockSize` is 32,
- then the same byte will be used by both halves of the wave.
- In this case, `firstScaleByte` can be any value from 0 to 3.
-
- When `source` is either F8E4M3FN or F8E5M2 and `blockSize` is 16,
- following combinations are allowed:
- * `firstScaleLane(0), firstScaleByte(0)`
- * `firstScaleLane(16), firstScaleByte(2)`
- all other combinations are reserved.
-
- Available on gfx1250+.
- }];
-
- let assemblyFormat = [{
- attr-dict $source
- `scale` `(` $scale `)`
- `blockSize` `(` $blockSize `)`
- `firstScaleLane` `(` $firstScaleLane`)`
- `firstScaleByte` `(` $firstScaleByte `)`
- `:` type($source) `,` type($scale) `->` type($res)
- }];
-
- let hasVerifier = 1;
-
-}
-
-def AMDGPU_ScaledExtPackedOp
- : AMDGPU_Op<"scaled_ext_packed", [Pure]>,
- Arguments<(
- ins AnyTypeOf<[VectorOfLengthAndType<[1, 2, 3, 4], [F8E5M2, F8E4M3FN]>,
- VectorOfLengthAndType<[1, 2, 3, 4, 5, 6, 7, 8],
- [F4E2M1FN]>]>:$source,
- F32:$scale,
- ConfinedAttr<I32Attr, [IntNonNegative, IntMaxValue<7>]>:$index)>,
- Results<(
- outs AnyTypeOf<[FixedVectorOfLengthAndType<[2], [F32]>,
- FixedVectorOfLengthAndType<[2], [F16]>,
- FixedVectorOfLengthAndType<[2], [BF16]>]>:$res)> {
- let summary = "Extend a vector of packed floating point values";
-
- let description = [{
- Extend and scale two packed floats in `source[index]` to two floats and
- return them.
-
- This rather unusual signature arises from the fact that AMD GPUs cannot
- easily work with sub 32-bit quantities, so the compiler intrinsics for
- extending 8-bit floats (which are, currently, the only way to work with
- this operation) take packed vectors of 2 such floats.
-
- If the passed-in vector has fewer than two elements, or the input is scalar,
- the remaining values in the <2 x i8> will be filled with
- undefined values as needed.
- }];
- let assemblyFormat = [{
- attr-dict $source `[` $index `]` `,` $scale `:` type($source) `to` type($res)
- }];
-}
-
-def AMDGPU_PackedTrunc2xFp8Op :
- AMDGPU_Op<"packed_trunc_2xfp8", [Pure, AttrSizedOperandSegments]>,
- Arguments<(ins F32:$sourceA,
- Optional<F32>:$sourceB,
- ConfinedAttr<I32Attr, [IntNonNegative, IntMaxValue<1>]>:$wordIndex,
- Optional<FixedVectorOfLengthAndType<[4], [F8E4M3FNUZ, F8E5M2FNUZ, F8E4M3FN, F8E5M2]>>:$existing)>,
- Results<(outs FixedVectorOfLengthAndType<[4], [F8E4M3FNUZ, F8E5M2FNUZ, F8E4M3FN, F8E5M2]>:$res)> {
- let summary = "Round two floats into a packed vector of 8-bit floats";
- let description = [{
- Round the inputs `sourceA` and `sourceB` (which is undefined if not
- specified) into the low or high word (bottom two or top two) elements
- of the returned vector, keeping the other two elements of `existing`
- unchanged if present (or undefined if it was not passed in).
-
- The reason for this odd signature is that AMD GPUs cannot easily work with
- sub-registers, and so the conversion intrinsics (which are currently the
- only way to work with 8-bit float types) take packed vectors of 4 8-bit
- values.
- }];
- let assemblyFormat = [{
- attr-dict $sourceA `,` ($sourceB^):(`undef`)?
- `into` ($existing^):(`undef`)? `[` `word` $wordIndex `]`
- `:` type($sourceA) `to` type($res) (`into` type($existing)^)?
- }];
- let hasVerifier = 1;
-}
-
-def AMDGPU_PackedScaledTruncOp
- : AMDGPU_Op<"packed_scaled_trunc", [Pure]>,
- Arguments<(ins VectorOfLengthAndType<[1, 2], [F32, F16, BF16]>:$source,
- F32:$scale,
- ...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/178562