[Mlir-commits] [mlir] [mlir][nvgpu] NVGPU Tutorials (PR #87065)
Guray Ozen
llvmlistbot at llvm.org
Fri Apr 12 11:12:24 PDT 2024
================
@@ -0,0 +1,320 @@
+# RUN: env SUPPORT_LIB=%mlir_cuda_runtime \
+# RUN: %PYTHON %s | FileCheck %s
+
+# ===----------------------------------------------------------------------===//
+# Chapter 5 : Warp Specialized GEMM with Tensor Core
+# ===----------------------------------------------------------------------===//
+#
+# This program demonstrates a GEMM operation for `f32 += f16 * f16`, using the
+# warp-specialized method with a tile size of 128x128x64. The code fully
+# parallelizes the two outermost loops across thread blocks. It launches two
+# warpgroups (256 threads in total): one producer and one consumer. Each group
+# follows a different control flow: the producer warpgroup loads data into
+# shared memory, while the consumer warpgroup executes the Tensor Core GEMM
+# operation and the epilogue.
+#
+# for ti in range(M//128): # -> blockIdx.x
+# for tj in range(N//128): # -> blockIdx.y
+# with wg_producer:
+# for tk in range(K//64):
+# TMA_128x64_64x128...
+# with wg_consumer:
+# for tk in range(K//64):
+# MMA_128x128x64...
+# Epilogue..
+#
+# This chapter demonstrates:
+#  Two warpgroups (WG)
+#   Producer loop:
+#     Wait MMA barrier
+#     Load TMA with TMA barrier
+#     Arrive TMA barrier with txcount
+#   Consumer loop:
+#     Wait TMA barrier
+#     Perform Tensor Core GEMM 64x128x64 per warpgroup
+#     Arrive MMA barrier
+#   Epilogue:
+#     Store fragmented registers to shared memory
+#     Store shared memory to global memory
+#
+# ===----------------------------------------------------------------------===//
+
+
+from mlir import ir
+from mlir.dialects import gpu, scf, nvgpu, nvvm
+from mlir.extras import types as T
+from tools.nvdsl import *
+import numpy as np
+
+
+PRODUCER_PRIMARY_THREAD = 128  # First thread of the producer warpgroup
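The tiled loop structure sketched in the header comments corresponds to the following NumPy reference. This is a hedged CPU-side sketch of the math the kernel computes, not part of the patch; the tile sizes match the chapter (128x128x64), and `tiled_gemm_reference` is an illustrative name, not an nvgpu API.

```python
import numpy as np

# Tile sizes from the chapter: 128x128x64 (M-tile x N-tile x K-tile).
TILE_M, TILE_N, TILE_K = 128, 128, 64

def tiled_gemm_reference(a, b):
    """CPU reference for the tiled f32 += f16 * f16 GEMM described above."""
    M, K = a.shape
    K2, N = b.shape
    assert K == K2 and M % TILE_M == 0 and N % TILE_N == 0 and K % TILE_K == 0
    c = np.zeros((M, N), dtype=np.float32)
    for ti in range(M // TILE_M):          # -> blockIdx.x
        for tj in range(N // TILE_N):      # -> blockIdx.y
            acc = np.zeros((TILE_M, TILE_N), dtype=np.float32)
            for tk in range(K // TILE_K):  # producer loads / consumer MMAs
                a_tile = a[ti * TILE_M:(ti + 1) * TILE_M,
                           tk * TILE_K:(tk + 1) * TILE_K]
                b_tile = b[tk * TILE_K:(tk + 1) * TILE_K,
                           tj * TILE_N:(tj + 1) * TILE_N]
                # f16 inputs accumulated in f32, as the Tensor Core MMA does.
                acc += a_tile.astype(np.float32) @ b_tile.astype(np.float32)
            # Epilogue: write the accumulator tile back out.
            c[ti * TILE_M:(ti + 1) * TILE_M,
              tj * TILE_N:(tj + 1) * TILE_N] = acc
    return c

a = np.random.randn(256, 128).astype(np.float16)
b = np.random.randn(128, 256).astype(np.float16)
c = tiled_gemm_reference(a, b)
```

On the GPU, the `ti`/`tj` loops become the thread-block grid and the `tk` loop is split between the producer (TMA loads) and consumer (MMA) warpgroups; here all three loops simply run sequentially.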
----------------
grypp wrote:
I improved the doc.
https://github.com/llvm/llvm-project/pull/87065
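The producer/consumer barrier protocol listed in the chapter header can be mimicked on the CPU. The sketch below is a hedged analogue using Python threads and queues, not the nvgpu mbarrier API; names such as `tma_barrier` and `mma_barrier` are illustrative stand-ins for the two mbarrier objects.

```python
import threading
import queue

NUM_K_TILES = 4  # stands in for K // 64 iterations of the tk loop

tma_barrier = queue.Queue(maxsize=1)  # producer -> consumer: data ready
mma_barrier = queue.Queue(maxsize=1)  # consumer -> producer: slot free
mma_barrier.put(None)                 # the single buffer slot starts out free

results = []

def producer():
    for tk in range(NUM_K_TILES):
        mma_barrier.get()        # wait MMA barrier (slot free)
        tile = f"tile{tk}"       # stand-in for the TMA load into shared memory
        tma_barrier.put(tile)    # arrive TMA barrier (data ready)

def consumer():
    for tk in range(NUM_K_TILES):
        tile = tma_barrier.get()  # wait TMA barrier
        results.append(tile)      # stand-in for the warpgroup MMA
        mma_barrier.put(None)     # arrive MMA barrier (slot may be reused)

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()
```

The two bounded queues enforce exactly the ping-pong the chapter describes: the producer never overwrites a tile the consumer has not consumed, and the consumer never reads a tile the producer has not finished loading.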