[flang-commits] [flang] eef4b5a - [flang] [cuda] Fix CUDA implicit data transfer entity creation (#139414)
via flang-commits
flang-commits at lists.llvm.org
Mon May 12 10:06:42 PDT 2025
Author: Zhen Wang
Date: 2025-05-12T10:06:39-07:00
New Revision: eef4b5a0cdf102e5035d6d4f1aa5f85b2b787e84
URL: https://github.com/llvm/llvm-project/commit/eef4b5a0cdf102e5035d6d4f1aa5f85b2b787e84
DIFF: https://github.com/llvm/llvm-project/commit/eef4b5a0cdf102e5035d6d4f1aa5f85b2b787e84.diff
LOG: [flang] [cuda] Fix CUDA implicit data transfer entity creation (#139414)
Fixed an issue in `genCUDAImplicitDataTransfer` where creating an
`hlfir::Entity` from a symbol address could fail when the address comes
from a `hlfir.declare` operation. The fix checks whether the address is
defined by a `hlfir.declare` operation and, if so, uses the base value
from the declare op. Otherwise, it falls back to the original address.
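The selection logic described in the log message can be sketched in isolation. The following is a minimal, standalone illustration and not the code from Bridge.cpp: `Value`, `Op`, and `selectEntityBase` are hypothetical stand-ins invented here, not MLIR or HLFIR classes. The sketch also treats a value with no defining operation as the fallback case; whether that can arise at this call site is not stated in the commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT the
// mlir::Value / hlfir::DeclareOp classes used by the actual patch.
struct Op;

struct Value {
  std::string name;               // label used to tell values apart
  const Op *definingOp = nullptr; // null models a value with no defining op
};

struct Op {
  enum class Kind { Declare, Other } kind;
  Value base; // for a Declare op: the "base" result (result #0)
};

// Return the declare op's base value when `addr` is produced by a
// declare op; otherwise return `addr` unchanged.
inline Value selectEntityBase(const Value &addr) {
  const Op *def = addr.definingOp;
  if (def && def->kind == Op::Kind::Declare)
    return def->base;
  return addr; // fallback: not a declare result, or no defining op
}
```

In this model, an address produced by a declare op resolves to that op's base, while any other address passes through unchanged, which mirrors the two branches the log message describes.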
Added:
flang/test/Lower/CUDA/cuda-managed.cuf
Modified:
flang/lib/Lower/Bridge.cpp
Removed:
################################################################################
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 43375e84f21fa..cf9a322680321 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -4778,7 +4778,14 @@ class FirConverter : public Fortran::lower::AbstractConverter {
nbDeviceResidentObject <= 1 &&
"Only one reference to the device resident object is supported");
auto addr = getSymbolAddress(sym);
- hlfir::Entity entity{addr};
+ mlir::Value baseValue;
+ if (auto declareOp =
+ llvm::dyn_cast<hlfir::DeclareOp>(addr.getDefiningOp()))
+ baseValue = declareOp.getBase();
+ else
+ baseValue = addr;
+
+ hlfir::Entity entity{baseValue};
auto [temp, cleanup] =
hlfir::createTempFromMold(loc, builder, entity);
auto needCleanup = fir::getIntIfConstant(cleanup);
diff --git a/flang/test/Lower/CUDA/cuda-managed.cuf b/flang/test/Lower/CUDA/cuda-managed.cuf
new file mode 100644
index 0000000000000..e14bd849670b1
--- /dev/null
+++ b/flang/test/Lower/CUDA/cuda-managed.cuf
@@ -0,0 +1,27 @@
+! RUN: bbc -emit-hlfir -fcuda %s -o - | FileCheck %s
+
+subroutine testr2(N1,N2)
+ real(4), managed :: ai4(N1,N2)
+ real(4), allocatable :: bRefi4(:)
+
+ integer :: i1, i2
+
+ do i2 = 1, N2
+ do i1 = 1, N1
+ ai4(i1,i2) = i1 + N1*(i2-1)
+ enddo
+ enddo
+
+ allocate(bRefi4 (N1))
+ do i1 = 1, N1
+ bRefi4(i1) = (ai4(i1,1)+ai4(i1,N2))*N2/2
+ enddo
+ deallocate(bRefi4)
+
+end subroutine
+
+!CHECK-LABEL: func.func @_QPtestr2
+!CHECK: %[[ALLOC:.*]] = cuf.alloc !fir.array<?x?xf32>, %{{.*}}, %{{.*}} : index, index {bindc_name = "ai4", data_attr = #cuf.cuda<managed>, uniq_name = "_QFtestr2Eai4"} -> !fir.ref<!fir.array<?x?xf32>>
+!CHECK: %[[DECLARE:.*]]:2 = hlfir.declare %[[ALLOC]](%{{.*}}) {data_attr = #cuf.cuda<managed>, uniq_name = "_QFtestr2Eai4"} : (!fir.ref<!fir.array<?x?xf32>>, !fir.shape<2>) -> (!fir.box<!fir.array<?x?xf32>>, !fir.ref<!fir.array<?x?xf32>>)
+!CHECK: %[[DEST:.*]] = hlfir.designate %[[DECLARE]]#0 (%{{.*}}, %{{.*}}) : (!fir.box<!fir.array<?x?xf32>>, i64, i64) -> !fir.ref<f32>
+!CHECK: cuf.data_transfer %{{.*}}#0 to %[[DEST]] {transfer_kind = #cuf.cuda_transfer<host_device>} : !fir.ref<f32>, !fir.ref<f32>