[Mlir-commits] [mlir] a7cdea7 - [mlir][gpu] Add documentation for the new GPU compilation mechanism

Fabian Mora llvmlistbot at llvm.org
Fri Aug 11 17:32:49 PDT 2023


Author: Fabian Mora
Date: 2023-08-12T00:32:41Z
New Revision: a7cdea70095f89b6b43918e10dd66c3dd48d80e2

URL: https://github.com/llvm/llvm-project/commit/a7cdea70095f89b6b43918e10dd66c3dd48d80e2
DIFF: https://github.com/llvm/llvm-project/commit/a7cdea70095f89b6b43918e10dd66c3dd48d80e2.diff

LOG: [mlir][gpu] Add documentation for the new GPU compilation mechanism

Adds documentation to the GPU dialect docs giving a general overview of the new
compilation mechanism introduced in the patch series ending in D154153.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D157461

Added: 
    

Modified: 
    mlir/docs/Dialects/GPU.md

Removed: 
    


################################################################################
diff  --git a/mlir/docs/Dialects/GPU.md b/mlir/docs/Dialects/GPU.md
index 4b138ca23d5d64..8558667ea51ab5 100644
--- a/mlir/docs/Dialects/GPU.md
+++ b/mlir/docs/Dialects/GPU.md
@@ -36,6 +36,105 @@ instead; we chose not to use `alloca`-style approach that would require more
 complex lifetime analysis following the principles of MLIR that promote
 structure and representing analysis results in the IR.
 
+## GPU Compilation
+### Deprecation notice
+The `--gpu-to-(cubin|hsaco)` passes will be deprecated in a future release.
+
+### Compilation overview
+The compilation process in the GPU dialect has two main stages: GPU module
+serialization and offloading operations translation. Together these stages can
+produce GPU binaries and the necessary code to execute them.
+
+An example of the compilation workflow:
+
+```
+# Pipeline stages, in order:
+#   1. nvvm-attach-target: attach an NVVM target to a gpu.module op.
+#   2. convert-gpu-to-nvvm (nested in gpu.module): convert GPU to NVVM.
+#   3. gpu-to-llvm: convert GPU to LLVM.
+#   4. gpu-module-to-binary: serialize GPU modules to binaries.
+mlir-opt example.mlir \
+  --pass-pipeline="builtin.module(nvvm-attach-target{chip=sm_90 O=3},gpu.module(convert-gpu-to-nvvm),gpu-to-llvm,gpu-module-to-binary)" \
+  -o example-nvvm.mlir
+# Obtain the translated LLVM IR.
+mlir-translate example-nvvm.mlir --mlir-to-llvmir -o example.ll
+```
+
+### Module serialization
+Attributes implementing the GPU Target Attribute Interface handle the
+serialization process and are called Target attributes. Attaching a Target
+attribute to a GPU module specifies the serialization scheme used to compile
+the module into a binary string.
+
+The `gpu-module-to-binary` pass searches for all nested GPU modules and
+serializes each module using the target attributes attached to it,
+producing a binary with one object for every target.
+
+Example:
+```
+// Input:
+gpu.module @kernels [#nvvm.target<chip = "sm_90">, #nvvm.target<chip = "sm_60">] {
+  ...
+}
+// mlir-opt --gpu-module-to-binary:
+gpu.binary @kernels [
+  #gpu.object<#nvvm.target<chip = "sm_90">, "sm_90 cubin">,
+  #gpu.object<#nvvm.target<chip = "sm_60">, "sm_60 cubin">
+]
+```
+
+### Offloading LLVM translation
+Attributes implementing the GPU Offloading LLVM Translation Attribute Interface
+handle the translation of GPU binaries and kernel launches into LLVM
+instructions and are called Offloading attributes. These attributes are
+attached to GPU binary operations.
+
+During the LLVM translation process, each GPU binary is translated into LLVM
+instructions using the scheme provided by its Offloading attribute. Kernel
+launches are translated by searching for the appropriate binary and invoking
+the procedure that the binary's Offloading attribute provides for translating
+kernel launches into LLVM instructions.
+
+Example:
+```
+// Input:
+// Binary with multiple objects but selecting the second one for embedding.
+gpu.binary @binary <#gpu.select_object<#rocdl.target<chip = "gfx90a">>> [
+    #gpu.object<#nvvm.target, "NVPTX">,
+    #gpu.object<#rocdl.target<chip = "gfx90a">, "AMDGPU">
+  ]
+llvm.func @foo() {
+  ...
+  // Launching a kernel inside the binary.
+  gpu.launch_func @binary::@func blocks in (%0, %0, %0)
+                                 threads in (%0, %0, %0) : i64
+                                 dynamic_shared_memory_size %2
+                                 args(%1 : i32, %1 : i32)
+  ...
+}
+// mlir-translate --mlir-to-llvmir:
+@binary_bin_cst = internal constant [6 x i8] c"AMDGPU", align 8
+@binary_func_kernel_name = private unnamed_addr constant [5 x i8] c"func\00", align 1
+...
+define void @foo() {
+  ...
+  %module = call ptr @mgpuModuleLoad(ptr @binary_bin_cst)
+  %kernel = call ptr @mgpuModuleGetFunction(ptr %module, ptr @binary_func_kernel_name)
+  call void @mgpuLaunchKernel(ptr %kernel, ...) ; Launch the kernel
+  ...
+  call void @mgpuModuleUnload(ptr %module)
+  ...
+}
+...
+```
+
+### The binary operation
+From a semantic point of view, GPU binaries allow the implementation of many
+concepts, from simple object files to fat binaries. By default, the binary
+operation uses the `#gpu.select_object` offloading attribute, which embeds a
+single object in the binary as a global string. See the attribute docs for
+more information.
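+
+As a minimal sketch of that default, the two binaries below are expected to
+embed the same object. The names `@default_binary` and `@explicit_binary` are
+hypothetical. Both the index form `#gpu.select_object<0>` and the assumption
+that the default selects the first object should be checked against the
+attribute docs:
+```
+// No offloading attribute given: `#gpu.select_object` is used by default.
+gpu.binary @default_binary [
+    #gpu.object<#nvvm.target, "NVPTX">,
+    #gpu.object<#rocdl.target<chip = "gfx90a">, "AMDGPU">
+  ]
+// Assumed explicit equivalent: select the first object by index.
+gpu.binary @explicit_binary <#gpu.select_object<0>> [
+    #gpu.object<#nvvm.target, "NVPTX">,
+    #gpu.object<#rocdl.target<chip = "gfx90a">, "AMDGPU">
+  ]
+```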
+
 ## Operations
 
 [include "Dialects/GPUOps.md"]


        

