[flang-commits] [flang] [flang][CUF] Do not repack CUDA device dummy arguments (PR #195950)
via flang-commits
flang-commits at lists.llvm.org
Tue May 5 14:52:06 PDT 2026
https://github.com/khaki3 created https://github.com/llvm/llvm-project/pull/195950
-frepack-arrays (implied by -Ofast) was inserting fir.pack_array / fir.unpack_array for assumed-shape dummy arguments with CUDA data attributes (device, managed, etc.). The repacking allocates a host-side temporary and copies the descriptor, but the data lives in device memory. When the CUF kernel subsequently receives the host descriptor pointer, accessing it from the GPU triggers cudaErrorIllegalAddress.
Skip repacking in needsRepack() for any symbol that carries a CUDA data attribute.
>From 06a201b0b7340447267455e941d6c6f8bee3994f Mon Sep 17 00:00:00 2001
From: Kazuaki Matsumura <kmatsumura at nvidia.com>
Date: Tue, 5 May 2026 14:46:09 -0700
Subject: [PATCH] [flang][CUF] Do not repack CUDA device dummy arguments
-frepack-arrays (implied by -Ofast) was inserting fir.pack_array /
fir.unpack_array for assumed-shape dummy arguments with CUDA data
attributes (device, managed, etc.). The repacking allocates a
host-side temporary and copies the descriptor, but the data lives in
device memory. When the CUF kernel subsequently receives the host
descriptor pointer, accessing it from the GPU triggers
cudaErrorIllegalAddress.
Skip repacking in needsRepack() for any symbol that carries a CUDA
data attribute.
Co-authored-by: Cursor <cursoragent at cursor.com>
---
flang/lib/Lower/ConvertVariable.cpp | 7 ++++
flang/test/Lower/CUDA/cuda-repack-arrays.cuf | 39 ++++++++++++++++++++
2 files changed, 46 insertions(+)
create mode 100644 flang/test/Lower/CUDA/cuda-repack-arrays.cuf
diff --git a/flang/lib/Lower/ConvertVariable.cpp b/flang/lib/Lower/ConvertVariable.cpp
index 59abdb92e33ba..05b3fb4a5d370 100644
--- a/flang/lib/Lower/ConvertVariable.cpp
+++ b/flang/lib/Lower/ConvertVariable.cpp
@@ -1113,6 +1113,13 @@ static bool needsRepack(Fortran::lower::AbstractConverter &converter,
Fortran::semantics::Attr::VOLATILE}))
return false;
+ // CUDA device/managed/unified/shared/pinned arrays must not be repacked
+ // on the host. The repacking would allocate a host-side temporary and
+ // copy the descriptor, but the data lives in device memory, causing
+ // illegal address errors when the kernel tries to access it.
+ if (auto cudaAttr = Fortran::semantics::GetCUDADataAttr(&sym))
+ return false;
+
return true;
}
diff --git a/flang/test/Lower/CUDA/cuda-repack-arrays.cuf b/flang/test/Lower/CUDA/cuda-repack-arrays.cuf
new file mode 100644
index 0000000000000..db14b7543762f
--- /dev/null
+++ b/flang/test/Lower/CUDA/cuda-repack-arrays.cuf
@@ -0,0 +1,39 @@
+! RUN: bbc -emit-hlfir -fcuda -frepack-arrays %s -o - | FileCheck %s
+
+! Verify that -frepack-arrays does not generate fir.pack_array for
+! CUDA device dummy arguments. Repacking a device array on the host
+! would create a host-side temporary whose address is invalid on the
+! GPU, leading to cudaErrorIllegalAddress at kernel launch.
+
+subroutine sub1(db, n1, n2)
+ real, device :: db(:,:)
+ integer :: n1, n2
+ db(1,1) = 1.0
+end subroutine
+
+! CHECK-LABEL: func.func @_QPsub1(
+! CHECK-SAME: %[[ARG0:.*]]: !fir.box<!fir.array<?x?xf32>> {cuf.data_attr = #cuf.cuda<device>
+! CHECK-NOT: fir.pack_array
+! CHECK: return
+
+subroutine sub2(x, n)
+ real :: x(:)
+ integer :: n
+ x(1) = 1.0
+end subroutine
+
+! CHECK-LABEL: func.func @_QPsub2(
+! CHECK: fir.pack_array
+! CHECK: fir.unpack_array
+! CHECK: return
+
+subroutine sub3(db, n)
+ real, managed :: db(:)
+ integer :: n
+ db(1) = 1.0
+end subroutine
+
+! CHECK-LABEL: func.func @_QPsub3(
+! CHECK-SAME: %[[ARG0:.*]]: !fir.box<!fir.array<?xf32>> {cuf.data_attr = #cuf.cuda<managed>
+! CHECK-NOT: fir.pack_array
+! CHECK: return
More information about the flang-commits
mailing list