[Mlir-commits] [mlir] [MLIR][XeGPU] Extend SGMapAttr and Add ConvertLayoutOp (PR #132425)
Chao Chen
llvmlistbot at llvm.org
Tue Apr 8 15:03:56 PDT 2025
================
@@ -154,33 +154,128 @@ def XeGPU_FenceScopeAttr:
let assemblyFormat = "$value";
}
-def XeGPU_SGMapAttr : XeGPUAttr<"SGMap", "sg_map"> {
+def XeGPU_LayoutAttr : XeGPUAttr<"Layout", "layout"> {
let summary = [{
- Describes the mapping between work item (WI) and the 2D tensor specified by the tensor descriptor.
+ Describes the data distribution to subgroups and work-items for a tensor
+ specified by the tensor descriptor.
}];
let description = [{
- To distribute the XeGPU operation to work items, the tensor_desc must be specified with the sg_map
- attribute at the tensor description creation time.
- Within the `sg_map`, `wi_layout` specifies the layout of work items,
- describing the mapping of work items to the tensor.
- wi_layout[0] x wi_layout[1] must be equal to the total number of work items within a subgroup.
- `wi_data` specifies the minimum number of data elements assigned to each work item for a single distribution.
-
- E.g., #xegpu.sg_map<wi_layout = [1, 16], wi_data = [1, 1]>
- In this example, the subgroup has 16 work items in wi_layout=[1, 16],
- each accessing 1 element as specified by wi_data=[1, 1].
-
- `wi_data[0] * wi_data[1]` can be greater than 1, meaning that each work item operates on multiple elements,
- which is eventually lowered to "SIMT-flavor" vector, like SPIR-V vector or llvm vector, or packed to a storage data type.
- The multiple elements indicated by `wi_data` can only be from one dimension and must be contiguous in the memory along either dimension.
+ XeGPU operations use `LayoutAttr` to define how data is distributed across subgroups and work-items.
+ This attribute is specified on the tensor descriptor at creation time. `LayoutAttr`
+ includes the following parameters:
+
+ * `sg_layout`: Specifies the total number of subgroups and their layout within a workgroup.
+ It is mandatory for workgroup-level programming. Its presence implies workgroup-level code.
+ * `sg_data`: Defines the data size accessed per subgroup. It is optionally used with `sg_layout`
+ for workgroup-level programming. When it is left empty, the size accessed per subgroup can be
+ derived from the tensor shape and `sg_layout` using the formula:
+ `sg_data[i] = tensor_shape[i] / sg_layout[i]`.
+ * `inst_data`: Specifies the data size processed by a single instruction. It is optionally
+ used with `lane_layout`. When it is left empty, the data size per instruction is equivalent to
+ `sg_data` for workgroup-level programming, or to the tensor shape for subgroup-level
+ programming.
+ * `lane_layout`: Specifies the total number of work-items and their arrangement within a subgroup.
+ It is mandatory for subgroup-level programming and optional for workgroup-level programming.
+ * `lane_data`: Specifies the shape of the tensor fragment that each lane accesses. It defines a single,
+ minimal distribution unit. Processing the entire tensor may require one or more distribution units per
+ hardware instruction.
+ * `order`: Specifies the dimension order used to linearize the n-dimensional `sg_layout` and
+ `lane_layout` into a 1-dimensional layout. The first dimension in the order list is the
+ fastest-changing dimension. If it is not present, the default value is `[1, 0]`.
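+
+ For instance (an illustrative sketch, not part of the attribute definition above), for a
+ 128x128 tensor distributed with `sg_layout = [4, 4]` and `sg_data` omitted, the derived
+ per-subgroup size is `sg_data[i] = tensor_shape[i] / sg_layout[i] = [32, 32]`:
+ ```mlir
+ // sg_data left empty; derived as [32, 32] for a 128x128 tensor.
+ #xegpu.layout<sg_layout = [4, 4]>
+ ```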
+
+ ### Examples:
+ 1. Subgroup level layout:
+ ```mlir
+ #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
+ ```
+ In this example, there are 16 work-items per subgroup, organized as
+ [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. The distribution unit is 1x1.
+
+ 2. Subgroup level layout with order:
+ ```mlir
+ #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
+ ```
+ In this example, there are 16 work-items per subgroup, organized as
+ [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
+
+ 3. Subgroup level layout with `inst_data`:
+ ```mlir
+ #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
+ ```
+ In this example, the original problem size is divided into smaller subproblems of size [8, 16],
+ which are further distributed across 16 work-items organized as [[0, 1, 2, ..., 7], [8, 9, ..., 15]].
+ Each work-item is assigned a contiguous 2x2 block.
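+
+ 4. Workgroup level layout (an illustrative sketch, assuming a 128x128 tensor):
+ ```mlir
+ #xegpu.layout<sg_layout = [4, 4], sg_data = [32, 32], lane_layout = [2, 8], lane_data = [1, 1]>
+ ```
+ In this example, the 128x128 tensor is first distributed across a 4x4 grid of subgroups, each
+ owning a 32x32 fragment (consistent with `sg_data[i] = tensor_shape[i] / sg_layout[i]`). Within
+ each subgroup, the fragment is further distributed across 16 work-items arranged as [2, 8],
+ with a 1x1 distribution unit per lane.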
----------------
chencha3 wrote:
Tried to clarify it a little bit.
https://github.com/llvm/llvm-project/pull/132425