[llvm-branch-commits] [flang] [mlir] [Flang][OpenMP] Add pass to replace allocas with device shared memory (PR #161863)

Sergio Afonso via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Mon Feb 23 06:26:18 PST 2026


skatrak wrote:

Thank you @Meinersbur, @tblah and @abidh for your comments. I'll try to address your concerns here:

> I don't yet understand the necessity of this. When lowering fir.alloca, wouldn't OpenMPToLLVMIRLowering automatically select the appropriate memory space?

One problem with lowering `fir.alloca` to either `llvm.alloca` or `omp.alloc_shared_mem` depending on context is that it would introduce OpenMP-specific logic into the FIR dialect. That isn't great, though I imagine there is some precedent for it. However, Tom raised a valid point: there are places in the FIR->LLVM conversion where new `llvm.alloca` ops are created directly (without any `fir.alloca` involved), and we'd have to patch all of those as well.

Similarly, any other dialect's op that lowers to `llvm.alloca` would have to be patched too. I think we all agree that context-dependent translation of `llvm.alloca` to OpenMP device shared memory APIs is entirely out of the question, since we don't want the LLVM dialect to depend on OpenMP. There's nothing we could do at the OpenMPToLLVMIRTranslation stage either, since that can only deal with `omp` dialect operations and discardable attributes.

> To be usable in indirectably reached regions, the adress mlir::Value has to be passed to it in either case, why is it relevant?

Unfortunately, I don't think I quite understand this concern. Could you elaborate on it?

> There can be temporary allocas for boxes created implicitly in conversion from FIR->LLVM (dialect). See placeInMemoryIfNotGlobalInit() and the lowering for fir::LoadOp in CodeGen.cpp. I wonder if this pass would be better being part of mlir, running after FIR->LLVM dialect conversion on a mixture of LLVM and OpenMP dialect ops.

You're right, nice catch. I actually noticed this while enabling larger apps with this PR stack downstream, and had already made the change you suggested there. I've just updated the PR to incorporate it.

> While looking at UMT, I see that there are cases where a variable is not itself used in parallel, like setID in the example below, but it is used to set a pointer that gets used later before or inside parallel. Such variable may also benefit from being in shared memory.

Thanks for the suggestion, Abid, but could you elaborate on what the benefit of that change would be? It seems to me that it is correct in that case to keep `setID` local to the main thread in the team, since what the other threads need access to is the updated per-team `ASet` pointer.

https://github.com/llvm/llvm-project/pull/161863

