[Mlir-commits] [mlir] [mlir] Document GPU dialect layering to capture discussions from a PR (PR #95812)

Mon Jun 17 10:07:12 PDT 2024

https://github.com/krzysz00 created https://github.com/llvm/llvm-project/pull/95812

None

>From f31a3cc89e632771959eb8d6d0d5b03062eb6ac5 Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak <Krzysztof.Drewniak at amd.com>
Date: Mon, 17 Jun 2024 17:05:09 +0000
Subject: [PATCH] [mlir] Document GPU dialect layering to capture discussions
 from a PR

---
 mlir/docs/Dialects/GPU.md | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/mlir/docs/Dialects/GPU.md b/mlir/docs/Dialects/GPU.md
index 8a3acc33600a4..c33e16453452f 100644
--- a/mlir/docs/Dialects/GPU.md
+++ b/mlir/docs/Dialects/GPU.md
@@ -12,8 +12,36 @@ manipulations to launch a GPU kernel and provide a simple path towards GPU
 execution from MLIR. It may be targeted, for example, by DSLs using MLIR. The
 dialect uses `gpu` as its canonical prefix.
 
+This dialect also abstracts away primitives commonly available in GPU code, such
+as `gpu.thread_id` (an operation that returns the ID of the current thread within
+a thread block/workgroup along a given dimension). While the compilation
+pipelines documented below expect such code to live inside a `gpu.module` and
+`gpu.func`, these intrinsic wrappers may be used outside of this context.
+
+Intrinsic-wrapping operations should not assume that they have a parent of type
+`gpu.func`. However, operations that deal with compiling and launching GPU
+functions, like `gpu.launch_func` or `gpu.binary`, may assume that the dialect's
+full layering is being used.
+
 [TOC]
 
+## GPU address spaces
+
+The GPU dialect exposes the `gpu.address_space` attribute, which currently has
+three values: `global`, `workgroup`, and `private`.
+
+These address spaces represent the types of buffer commonly seen in GPU compilation.
+`global` memory resides in the GPU's global memory, is accessible to all threads, and
+is commonly used for function arguments. `workgroup` memory is a limited, per-workgroup
+resource: all threads in a workgroup/thread block access the same values in `workgroup`
+memory, but cannot access the `workgroup` memory of other workgroups. Finally, `private`
+memory is used to represent `alloca`-like buffers that are private to a single thread.
+
+These address spaces may be used as the `memorySpace` attribute on `memref` values.
+The `gpu.module`/`gpu.func` compilation pipeline lowers these memory spaces to the
+corresponding address spaces on the target platform. Memory attributions should be
+created with the matching memory space on their memref type.
+
 ## Memory attribution
 
 Memory buffers are defined at the function level, either in "gpu.launch" or in
@@ -61,6 +89,11 @@ mlir-translate example-nvvm.mlir        \
   -o example.ll
 ```
 
+This compilation process expects all GPU code to live in a `gpu.module` and
+all kernels to be `gpu.func` operations. Non-kernel functions, such as
+device library calls, may be defined using `func.func` or other non-GPU dialect
+operations.
+
 ### Default NVVM Compilation Pipeline: gpu-lower-to-nvvm-pipeline
 
 The `gpu-lower-to-nvvm-pipeline` compilation pipeline serves as the default way
@@ -85,9 +118,9 @@ within GPU code execution:
 func.func @main() {
     %c2 = arith.constant 2 : index
     %c1 = arith.constant 1 : index
-    gpu.launch 
-        blocks(%0, %1, %2) in (%3 = %c1, %4 = %c1, %5 = %c1) 
-        threads(%6, %7, %8) in (%9 = %c2, %10 = %c1, %11 = %c1) { 
+    gpu.launch
+        blocks(%0, %1, %2) in (%3 = %c1, %4 = %c1, %5 = %c1)
+        threads(%6, %7, %8) in (%9 = %c2, %10 = %c1, %11 = %c1) {
         gpu.printf "Hello from %d\n" %6 : index
         gpu.terminator
     }