[Mlir-commits] [mlir] [mlir][gpu] Introduce `gpu.dynamic_shared_memory` Op (PR #71546)

Fri Nov 10 06:03:10 PST 2023

================
@@ -554,6 +555,95 @@ static IntegerAttr wrapNumericMemorySpace(MLIRContext *ctx, unsigned space) {
   return IntegerAttr::get(IntegerType::get(ctx, 64), space);
 }
 
+/// Generates a symbol with 0-sized array type for dynamic shared memory usage,
+/// or uses existing symbol.
+LLVM::GlobalOp
+getDynamicSharedMemorySymbol(ConversionPatternRewriter &rewriter,
+                             gpu::DynamicSharedMemoryOp op,
+                             const LLVMTypeConverter *typeConverter,
+                             MemRefType memrefType, unsigned alignmentBit) {
+  std::optional<LLVM::GlobalOp> existingGlobalOp;
+
+  LLVM::LLVMFuncOp funcOp = op->getParentOfType<LLVM::LLVMFuncOp>();
+  assert(funcOp && "cannot find llvm.func op");
+
+  gpu::GPUModuleOp moduleOp = funcOp->getParentOfType<gpu::GPUModuleOp>();
+  assert(moduleOp && "cannot find gpu.module op");
+
+  // Use already generated global op if it exists
+  int index = 0;
+  std::string prefix = llvm::formatv("__shmem_{0}", funcOp.getSymName());
+  moduleOp->walk([&](LLVM::GlobalOp globalOp) {
+    if (auto arrayType = dyn_cast<LLVM::LLVMArrayType>(globalOp.getType())) {
+      if (arrayType.getNumElements() == 0) {
+        existingGlobalOp = globalOp;
+        return WalkResult::interrupt();
+      }
+    }
+    if (globalOp.getSymName().startswith(prefix))
+      index++;
----------------
grypp wrote:

@ftynse I have implemented the way you proposed. 

> > As an alternative - I can generate a LLVM::GlobalOp using SymbolTable in the Pass. Then, use it in the pattern. This way guarantees
>
> We are already creating a symbol table in the [pass](https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp#L590). So there's no new overhead with this approach, besides maybe inserting a global. This would also make the DynamicShared pattern more efficient, as the pattern knows what symbol to use from the start.

Initially, I implemented it the way I proposed, but later I changed it. With that approach, the pattern is no longer a self-sufficient lowering pattern; it requires a pass to generate a `GlobalOp`. That complicates integration with other MLIR-based compilers, such as IREE ([check how they use these lowerings](https://github.com/openxla/iree/blob/86336293a8066b396537fae117d8549460cd85fd/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToNVVM.cpp#L164)). If another compiler wants to use this pattern, it needs to generate the `GlobalOp` in its own repository, leading to some code duplication.

> If going this route: then lazy initialization and DenseSet<StringAttr>?

This data structure won't be large, considering the expected low number of `GlobalOp`s in the IR. I chose `StringSet` for managing small sets, but I'm not as familiar with `DenseSet`.
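
For illustration only, here is a minimal sketch of the kind of set-based name bookkeeping being discussed. It uses plain `std::unordered_set<std::string>` as a stand-in for `llvm::StringSet` or `DenseSet<StringAttr>`. The helper name `makeUniqueSymbolName` and the `<prefix>_<index>` naming scheme are assumptions made for the example, not code from this PR.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper (not from the PR): returns the first name of the form
// "<prefix>_<index>" that is not yet in `taken`, and records it in the set.
// `taken` stands in for the set of symbol names already present in the module.
std::string makeUniqueSymbolName(const std::string &prefix,
                                 std::unordered_set<std::string> &taken) {
  for (unsigned index = 0;; ++index) {
    std::string candidate = prefix + "_" + std::to_string(index);
    // insert() reports whether the name was new; if so, it is now reserved.
    if (taken.insert(candidate).second)
      return candidate;
  }
}
```

With only a handful of globals per module, the choice of set type should matter little for performance in a sketch like this; the difference is mainly whether names are stored as strings or as uniqued attributes.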

https://github.com/llvm/llvm-project/pull/71546