[llvm-branch-commits] [flang] [flang][cuda] Lower device/managed/unified allocation to cuda ops (PR #90526)

Mon Apr 29 20:26:40 PDT 2024

clementval wrote:

> Thank you, Valentin!
> 
> Is it expected that we can have a mix of `fir.alloca` and `fir.cuda_alloc` operations in the device routines (e.g. I suppose `fir::FirOpBuilder::createTemporaryAlloc` can generate `fir.alloca` for a temporary location in device code)? It is not necessarily an issue, I just want to understand whether we will have to handle both operations in the device code.

createTemporaryAlloc will also need to be modified to issue cuda_alloc/cuda_free. I'm still evaluating the extent of the change. fir.alloca ops are fine in device code as long as they are not device, managed or unified, as we can support them with the address space. Note that creating managed or unified variables in a device subprogram is not recommended.
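
To make the rule above concrete, here is a minimal standalone sketch of the choice between the two allocation ops. It is not flang code: `DataAttr` and `pickAllocOp` are hypothetical names made up for illustration, and the function only returns the op name as a string rather than building any IR.

```cpp
#include <string>

// Hypothetical stand-in for the CUDA data attribute carried by a variable.
// This is not the real flang attribute type.
enum class DataAttr { None, Device, Managed, Unified };

// Hypothetical helper: returns the name of the allocation op this sketch
// would pick for a variable with the given attribute.
inline std::string pickAllocOp(DataAttr attr) {
  switch (attr) {
  case DataAttr::Device:
  case DataAttr::Managed:
  case DataAttr::Unified:
    // Device, managed and unified variables go through the cuda ops
    // (fir.cuda_alloc, later paired with fir.cuda_free).
    return "fir.cuda_alloc";
  case DataAttr::None:
    // Anything without one of those attributes can stay a plain fir.alloca.
    return "fir.alloca";
  }
  return "fir.alloca"; // unreachable for valid enum values
}
```

Under these assumptions, a temporary with no data attribute would still map to `fir.alloca`, which is why both operations may have to be handled in device code.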

https://github.com/llvm/llvm-project/pull/90526