[Mlir-commits] [mlir] [mlir][gpu] Introduce `gpu.dynamic_shared_memory` Op (PR #71546)
Guray Ozen
llvmlistbot at llvm.org
Fri Nov 10 06:03:10 PST 2023
================
@@ -554,6 +555,95 @@ static IntegerAttr wrapNumericMemorySpace(MLIRContext *ctx, unsigned space) {
   return IntegerAttr::get(IntegerType::get(ctx, 64), space);
 }
+/// Generates a symbol with 0-sized array type for dynamic shared memory usage,
+/// or uses existing symbol.
+LLVM::GlobalOp
+getDynamicSharedMemorySymbol(ConversionPatternRewriter &rewriter,
+                             gpu::DynamicSharedMemoryOp op,
+                             const LLVMTypeConverter *typeConverter,
+                             MemRefType memrefType, unsigned alignmentBit) {
+  std::optional<LLVM::GlobalOp> existingGlobalOp;
+
+  LLVM::LLVMFuncOp funcOp = op->getParentOfType<LLVM::LLVMFuncOp>();
+  assert(funcOp && "cannot find llvm.func op");
+
+  gpu::GPUModuleOp moduleOp = funcOp->getParentOfType<gpu::GPUModuleOp>();
+  assert(moduleOp && "cannot find gpu.module op");
+
+  // Use already generated global op if it exists
+  int index = 0;
+  std::string prefix = llvm::formatv("__shmem_{0}", funcOp.getSymName());
+  moduleOp->walk([&](LLVM::GlobalOp globalOp) {
+    if (auto arrayType = dyn_cast<LLVM::LLVMArrayType>(globalOp.getType())) {
+      if (arrayType.getNumElements() == 0) {
+        existingGlobalOp = globalOp;
+        return WalkResult::interrupt();
+      }
+    }
+    if (globalOp.getSymName().startswith(prefix))
+      index++;
----------------
grypp wrote:
@ftynse I have implemented it the way you proposed.
> > As an alternative - I can generate a LLVM::GlobalOp using SymbolTable in the Pass. Then, use it in the pattern. This way guarantees
>
> We are already creating a symbol table in the [pass](https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp#L590), so there's no new overhead with this approach besides maybe inserting a global. It would also make the DynamicShared pattern more efficient, as the pattern knows what symbol to use from the start.
Initially, I implemented it the way I proposed, but I later changed it. With that approach the pattern would no longer be a self-sufficient lowering pattern; it would require a pass to generate a `GlobalOp`. That complicates integration with other MLIR-based compilers, such as IREE ([check how they use these lowerings](https://github.com/openxla/iree/blob/86336293a8066b396537fae117d8549460cd85fd/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToNVVM.cpp#L164)). Any other compiler that wants to use this pattern would need to generate the `GlobalOp` in its own repository, leading to some code duplication.
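
To illustrate what "self-sufficient" means here, the following is a minimal standalone sketch of the lookup-or-create logic. It uses plain standard-library types instead of MLIR: `Global`, `Module`, and `getOrCreateDynamicShmem` are hypothetical stand-ins, and the `<prefix>_<index>` naming scheme is an assumption, since the diff above is cut off before the name is built.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an LLVM::GlobalOp with an array type.
struct Global {
  std::string name;
  std::size_t numElements; // 0 marks the dynamic shared memory symbol
};

// Hypothetical stand-in for a gpu.module that owns its globals.
struct Module {
  std::vector<Global> globals;
};

// Returns the position of the zero-sized global, creating it if none exists.
// The caller does not need a separate pass to prepare the symbol.
std::size_t getOrCreateDynamicShmem(Module &m, const std::string &funcName) {
  const std::string prefix = "__shmem_" + funcName;
  std::size_t index = 0;
  for (std::size_t i = 0; i < m.globals.size(); ++i) {
    if (m.globals[i].numElements == 0)
      return i; // reuse the existing zero-sized global
    // Count globals that already use the prefix (assumed naming scheme).
    if (m.globals[i].name.compare(0, prefix.size(), prefix) == 0)
      ++index;
  }
  m.globals.push_back({prefix + "_" + std::to_string(index), 0});
  return m.globals.size() - 1;
}
```

Because the function both searches and creates, calling it twice on the same module yields the same global, which is the property that lets a downstream compiler reuse the pattern without extra setup.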
> If going this route: then lazy initialization and DenseSet<StringAttr>?
This data structure won't be large, given the expected low number of `GlobalOp`s in the IR. I chose `StringSet` because it suits small sets, but I'm not as familiar with `DenseSet`.
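
For comparison, here is a rough sketch of the suggested alternative: a set built lazily on the first query and keyed by interned strings, so membership checks hash a pointer rather than the characters. This is not LLVM code. `Interner` and `LazyNameSet` are hypothetical names, with `std::unordered_set<const std::string *>` standing in for `DenseSet<StringAttr>`.

```cpp
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical interner: returns one stable pointer per distinct string,
// loosely mimicking a uniqued string attribute.
class Interner {
public:
  const std::string *intern(const std::string &s) {
    // Pointers to unordered_set elements stay valid across rehashing.
    return &*pool.insert(s).first;
  }

private:
  std::unordered_set<std::string> pool;
};

// Set of symbol names that is only populated on the first lookup.
class LazyNameSet {
public:
  LazyNameSet(Interner &interner, std::vector<std::string> symbolNames)
      : interner(interner), source(std::move(symbolNames)) {}

  bool contains(const std::string *name) {
    if (!built) {
      for (const std::string &n : source)
        names.insert(interner.intern(n));
      built = true;
      ++buildCount;
    }
    return names.count(name) != 0; // pointer hash, no string comparison
  }

  int buildCount = 0; // how many times the set was populated

private:
  Interner &interner;
  std::vector<std::string> source;
  std::unordered_set<const std::string *> names;
  bool built = false;
};
```

With only a handful of globals per module, either representation should be cheap; the sketch mainly shows that the lazy variant costs nothing when no lookup ever happens.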
https://github.com/llvm/llvm-project/pull/71546