[flang-commits] [flang] [flang][cuda] Don't allocate managed descriptors for OpenACC data-clause boxes (PR #201957)

Sat Jun 6 17:00:28 PDT 2026

================
@@ -2099,11 +2099,37 @@ struct EmboxOpConversion : public EmboxCommonConversion<fir::EmboxOp> {
   }
 };
 
-/// Return true if any user of \p val is a gpu.launch_func operation,
-/// indicating the descriptor must be in device-accessible memory.
-static bool isUsedByGPULaunchFunc(mlir::Value val) {
+/// Return true if any user of \p val is a CUDA Fortran kernel launch
+/// (a gpu.launch_func carrying the `cuf.proc_attr` attribute set by
+/// CUFLaunchAttachAttr). For such launches the descriptor is passed by
+/// reference to the device kernel, so its storage must be device-accessible
+/// (managed) memory.
+///
+/// OpenACC kernel launches are also represented as gpu.launch_func at this
+/// point, but their box arguments are handled through the OpenACC data-clause
+/// lowering and do not require a device-accessible descriptor; forcing those
+/// into managed memory is unnecessary and leaks the descriptor (it is never
+/// freed). Hence only CUDA Fortran launches are considered here.
+static bool isUsedByCUFKernelLaunch(mlir::Value val) {
----------------
clementval wrote:

I'm not sure if this is actually needed. Ultimately we should only allocate manged descriptor when the original was managed as well. Maybe Zhen change can be tweaked.

https://github.com/llvm/llvm-project/pull/201957