[Mlir-commits] [mlir] [MLIR][XeGPU] Update XeGPU doc (PR #136155)
llvmlistbot at llvm.org
Thu Apr 17 09:32:34 PDT 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-mlir
Author: Chao Chen (chencha3)
<details>
<summary>Changes</summary>
Update the docs for the XeGPU dialect, and fix broken formatting
---
Full diff: https://github.com/llvm/llvm-project/pull/136155.diff
2 Files Affected:
- (modified) mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td (+48-47)
- (modified) mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td (+14-5)
``````````diff
diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
index ab5fb4a4a7de9..f1bed70253ef3 100644
--- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
+++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
@@ -183,53 +183,54 @@ def XeGPU_LayoutAttr : XeGPUAttr<"Layout", "layout"> {
1-dimensional layout. The first dimension in the order list is the fastest-changing dimension. If it
is not present, the default value is [1, 0].
- ### Examples:
- 1. Subgroup level layout:
- ```mlir
- #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
- ```
- In this example, there are 16 work-items per subgroup, and is organized as
- [[0, 1, 2, .., 7],[8, 9, .., 15]]. The distribution unit is 1x1.
-
- 2. Subgroup level layout with order:
- ```mlir
- #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
- ```
- In this example, there are 16 work-items per subgroup, and is organized as
- [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
-
- 3. Subgroup level layout with inst_data
- ```mlir
- #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
- ```
- In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
- which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
- work-item is assigned four 2x2 blocks in a round-robin manner.
-
- 4. Workgroup level layout:
- ```mlir
- #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
- ```
- In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
- arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
- is further distributed to 16 work items which is organized as [[0, 1, 2, .., 7],[8, 9, .., 15]].
-
- 5. Workgroup level layout with order:
- ```mlir
- #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
- ```
- In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
- arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
- is further distributed to 16 work items which is organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
-
- 6. Workgroup level layout with inst_data:
- ```mlir
- #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
- ```
- This example is similar to the previous ones, but the `inst_data` parameter divides `sg_data` into two instructions,
- each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
- Unlike the 2x2 distribution unit in example 3, which results in accessing contiguous 2x2 blocks, the 1x1 distribution
- unit may result in non-contiguous access.
+ Examples:
+
+ 1. Subgroup level layout:
+ ```mlir
+ #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
+ ```
+ In this example, there are 16 work-items per subgroup, organized as
+ [[0, 1, 2, .., 7], [8, 9, .., 15]]. The distribution unit is 1x1.
+
+ 2. Subgroup level layout with order:
+ ```mlir
+ #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
+ ```
+ In this example, there are 16 work-items per subgroup, organized as
+ [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
+
+ 3. Subgroup level layout with inst_data:
+ ```mlir
+ #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
+ ```
+ In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
+ which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
+ work-item is assigned four 2x2 blocks in a round-robin manner.
+
+ 4. Workgroup level layout:
+ ```mlir
+ #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
+ ```
+ In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
+ arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
+ is further distributed to 16 work-items organized as [[0, 1, 2, .., 7], [8, 9, .., 15]].
+
+ 5. Workgroup level layout with order:
+ ```mlir
+ #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
+ ```
+ In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
+ arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
+ is further distributed to 16 work-items organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
+
+ 6. Workgroup level layout with inst_data:
+ ```mlir
+ #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
+ ```
+ This example is similar to the previous ones, but the `inst_data` parameter splits `sg_data` into two instructions,
+ each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
+ Unlike the 2x2 distribution unit in example 3, which accesses contiguous 2x2 blocks, the 1x1 distribution
+ unit may result in non-contiguous access.
}];
let parameters = (ins
diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td
index 765f218f95d26..fb5a1e6f1db0c 100644
--- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td
+++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td
@@ -16,11 +16,20 @@ def XeGPU_Dialect : Dialect {
let cppNamespace = "::mlir::xegpu";
let summary = "The XeGPU dialect that models Intel GPU's ISA";
let description = [{
- The XeGPU dialect models Intel Xe ISA semantics but works at vector and
- TensorDesc data type. It provides 1:1 mappings to match Xe instructions
- like DPAS and 2D block load. The matrix size being processed at this level
- exactly matches the hardware instructions or the intrinsic supported by
- the lower-level GPU compiler.
+ The XeGPU dialect closely models a subset of the Xe GPU ISA, providing an
+ abstraction to support high-performance GEMM code generation. It serves as a
+ bridge dialect in MLIR's gradual lowering process, working with MLIR memref
+ and vector types, and complements the Arith, Math, Vector, and MemRef dialects.
+ XeGPU operations are introduced for special Xe instructions not modeled by the
+ LLVM and SPIR-V dialects, such as DPAS and 2D block load and store.
+
+ It supports a tile-based programming model, decomposing the GEMM kernel into
+ large predefined tile sizes at the subgroup and workgroup levels. XeGPU allows
+ the high-level GEMM algorithm to be easily expressed. Underneath, it uses
+ target-specific recipes and hardware features to achieve optimal performance
+ on specific hardware. By decomposing GEMM at submatrix granularity and mapping it
+ to registers, it naturally supports optimizations such as fusion with neighboring
+ operations.
}];
let dependentDialects = ["arith::ArithDialect"];
``````````
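As a cross-check of examples 1 and 2 in the patched docs, the lane numbering implied by `lane_layout` and `order` can be sketched in Python (`lane_grid` is a hypothetical helper for illustration only, not part of the dialect):

```python
def lane_grid(lane_layout, order=None):
    """Arrange linear lane IDs into a 2D grid.

    `order` lists dimensions from fastest-changing to slowest;
    the default is [1, 0], i.e. dim 1 (columns) changes fastest.
    """
    rows, cols = lane_layout
    if order is None:
        order = [1, 0]
    grid = [[0] * cols for _ in range(rows)]
    for lane in range(rows * cols):
        if order == [1, 0]:
            r, c = divmod(lane, cols)   # columns fastest
        else:
            c, r = divmod(lane, rows)   # order == [0, 1]: rows fastest
        grid[r][c] = lane
    return grid

# Example 1: default order -> [[0, 1, ..., 7], [8, 9, ..., 15]]
print(lane_grid([2, 8]))
# Example 2: order = [0, 1] -> [[0, 2, ..., 14], [1, 3, ..., 15]]
print(lane_grid([2, 8], order=[0, 1]))
```

This reproduces the two arrangements the doc text describes for `lane_layout = [2, 8]` with and without `order = [0, 1]`.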
</details>
https://github.com/llvm/llvm-project/pull/136155