[llvm-branch-commits] [flang] [mlir] [Flang][OpenMP] Add pass to replace allocas with device shared memory (PR #161863)

Wed Mar 4 03:59:09 PST 2026

abidh wrote:

> > This is useful for optimization cases where generic mode is converted to SPMD and we need to decide which stores to guard and which not to. In that case, if we only let the main thread write `setID`, all other threads get garbage because it is local to each thread. Making it shared solves that problem.
> 
> I see what you mean. Taking a bit of a look into the SPMD-ization logic, my understanding is that the store instructions that are guarded are those that write to shared memory. In this case, `setID` would be allocated in private stack memory (it's not used inside the parallel region), so OpenMPOpt should not introduce any guards for it.

It is true that `OpenMPOpt` does not guard writes to private memory, but for that to work it has to prove that a store writes to private memory. That is fairly easy when everything is inlined. With distribute callbacks, however, it becomes difficult, because pointers are passed indirectly through other pointers stored in a struct. My experiments showed that such an analysis would require invasive changes to `OpenMPOpt` and the `Attributor`, and those changes may have side effects of their own. I found that moving any `pinned` alloca that is used inside a distribute region to shared memory was a simpler solution.
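
To make the `setID` problem concrete, here is a toy Python model. It is purely illustrative and does not reflect how `OpenMPOpt` actually works. After SPMD-ization a store is guarded so that only thread 0 executes it. If the variable is thread-private, every other thread reads an uninitialized value; if it lives in shared memory (assuming a barrier after the guarded store), every thread sees the stored value. `run_guarded_store`, `NUM_THREADS` and `UNINIT` are hypothetical names made up for this sketch.

```python
# Toy model (hypothetical, not OpenMPOpt code) of a guarded store after
# generic-to-SPMD conversion: only the main thread (tid 0) performs the write.
NUM_THREADS = 4
UNINIT = None  # stands in for an uninitialized ("garbage") value


def run_guarded_store(value: int, shared: bool) -> list:
    """Simulate `if (tid == 0) x = value; barrier; read x` in every thread.

    shared=False: `x` is thread-private, so each thread has its own copy.
    shared=True:  `x` lives in shared memory, so all threads see one copy.
    """
    slots = [UNINIT] * (1 if shared else NUM_THREADS)

    def slot(tid: int) -> int:
        # Map a thread id to the storage slot that thread sees for `x`.
        return 0 if shared else tid

    # Guarded store: only the main thread executes the write.
    slots[slot(0)] = value
    # After the (assumed) barrier, every thread reads its view of `x`.
    return [slots[slot(tid)] for tid in range(NUM_THREADS)]


print(run_guarded_store(7, shared=False))  # [7, None, None, None]
print(run_guarded_store(7, shared=True))   # [7, 7, 7, 7]
```

The private case is exactly what an SPMD-ized kernel would see if the store were guarded without `OpenMPOpt` being able to prove the memory is thread-private.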

The code in the PR correctly handles arguments to functions. My question was whether we should apply the same logic to values that will only become arguments to the distribute callback function later in the lowering.
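
A rough sketch of that question, as a simplified and conservative placement rule. This is my paraphrase of the rule as discussed in this thread, not the PR's code: an alloca goes to shared memory if it is used inside a parallel region or escapes through a call. The hypothetical `treat_distribute_as_call` flag models the extension I am asking about, where values that only become distribute-callback arguments later are treated like call arguments. `AllocaInfo` and its fields are invented for illustration, and `setID` is assumed to be used in the distribute region but not inside parallel.

```python
from dataclasses import dataclass


@dataclass
class AllocaInfo:
    """Hypothetical summary of how an alloca is used (not the PR's data model)."""
    name: str
    used_in_parallel: bool = False    # read/written inside a parallel region
    passed_to_call: bool = False      # escapes as an argument to some call
    used_in_distribute: bool = False  # used in a distribute region that is
                                      # outlined into a callback later


def needs_shared_memory(a: AllocaInfo, treat_distribute_as_call: bool) -> bool:
    """Simplified, conservative rule: should this alloca use shared memory?

    treat_distribute_as_call: also handle allocas that will only become
    distribute-callback arguments later in the lowering like call arguments.
    """
    if a.used_in_parallel or a.passed_to_call:
        return True
    return treat_distribute_as_call and a.used_in_distribute


set_id = AllocaInfo("setID", used_in_distribute=True)
print(needs_shared_memory(set_id, treat_distribute_as_call=False))  # False
print(needs_shared_memory(set_id, treat_distribute_as_call=True))   # True
```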

I am happy for you to merge this; I just wanted to point out some problems that we may have to fix down the line.

> 
> On the other hand, the array descriptor for `Quad%AngSetPtrArray` / `ASet` would use shared memory because: (i) it's passed to an `llvm.intr.memcpy` call before the parallel region (we conservatively assume any pointer passed to another function might potentially reach a parallel region); and (ii) it's read from within the parallel region.
> 
> I think the problem here is not that we're using the wrong memory spaces, but rather that OpenMPOpt does not add a guard to the `llvm.intr.memcpy` call that initializes the descriptor in shared memory before the parallel region (`AAKernelInfoCallSite::initialize` never adds a guard to an intrinsic, though in this case we should probably do so if the destination pointer of the memory copy is to shared memory -- similar to the `llvm.store` case).
> 
> Let me know @abidh if that analysis makes sense or if I'm missing something.
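
For reference, the guarding rule proposed in the quoted comment can be sketched as a small predicate: guard a store, and optionally a memory-copy-like intrinsic, when its destination is in shared memory. This is a toy model over a hypothetical `Inst` record, not `AAKernelInfoCallSite::initialize` or any LLVM API. The intrinsic name prefixes are only used for matching here, since real intrinsic names carry type suffixes such as `.p0.p0.i64`.

```python
from dataclasses import dataclass
from typing import Optional

# Prefixes of memory intrinsics treated like stores (illustrative list).
MEM_INTRINSIC_PREFIXES = ("llvm.memcpy", "llvm.memmove", "llvm.memset")


@dataclass
class Inst:
    """Hypothetical, simplified instruction record (not LLVM's API)."""
    opcode: str                   # "store", "call", "load", ...
    callee: Optional[str] = None  # callee name for calls
    dest_in_shared: bool = False  # destination pointer is in shared memory


def needs_guard(inst: Inst, guard_mem_intrinsics: bool) -> bool:
    """Should SPMD-ization restrict this instruction to the main thread?"""
    if inst.opcode == "store":
        return inst.dest_in_shared
    if (inst.opcode == "call" and inst.callee is not None
            and inst.callee.startswith(MEM_INTRINSIC_PREFIXES)):
        # guard_mem_intrinsics=False mirrors the behavior described in the
        # quote: intrinsic calls are never guarded.
        return guard_mem_intrinsics and inst.dest_in_shared
    return False


copy = Inst("call", "llvm.memcpy.p0.p0.i64", dest_in_shared=True)
print(needs_guard(copy, guard_mem_intrinsics=False))  # False
print(needs_guard(copy, guard_mem_intrinsics=True))   # True
```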

In my investigations, I added guards around `memcpy` (and some similar intrinsics), but that did not solve the problem I described above.

https://github.com/llvm/llvm-project/pull/161863