[flang-commits] [flang] [flang][CUF] Limit Flang LICM for operations with symbol operands. (PR #191494)

Slava Zakharin via flang-commits flang-commits at lists.llvm.org
Fri Apr 10 12:08:26 PDT 2026


https://github.com/vzakhari created https://github.com/llvm/llvm-project/pull/191494

There is probably an ordering issue between the `CUFDeviceGlobal`
and `OffloadLiveInValueCanonicalization` passes: Flang LICM hoists
`fir.address_of` out of `cuf.kernel`, and
`OffloadLiveInValueCanonicalization` pulls it back in, but the symbol is
never added to the device module because `CUFDeviceGlobal` does not run
afterwards.

Changing the pass order may take some time, so this is a temporary
workaround to unblock #191309.

The change is currently NFC.
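
To illustrate the hoisting condition described above, here is a minimal, self-contained sketch of the check. It does not use the real MLIR API: `Op`, `Attr`, and `SymbolRef` are hypothetical stand-ins for `mlir::Operation`, attribute values, and `mlir::SymbolRefAttr`, and the free function `canMoveOutOf` only mirrors the shape of `cuf::KernelOp::canMoveOutOf` after this change.

```cpp
#include <algorithm>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins; these are NOT the real MLIR/FIR types.
struct SymbolRef { std::string name; };                  // models mlir::SymbolRefAttr
using Attr = std::variant<int, std::string, SymbolRef>;  // models an attribute value

struct Op {
  std::vector<bool> operandIsRef;  // true where an operand has a FIR reference type
  std::vector<Attr> attrs;         // attribute values attached to the operation
};

// An operation may be moved out of the kernel only if it has no
// reference-typed operands and no symbol-reference attributes.
bool canMoveOutOf(const Op &op) {
  bool hasRefOperand =
      std::any_of(op.operandIsRef.begin(), op.operandIsRef.end(),
                  [](bool isRef) { return isRef; });
  bool hasSymbolAttr =
      std::any_of(op.attrs.begin(), op.attrs.end(), [](const Attr &attr) {
        return std::holds_alternative<SymbolRef>(attr);
      });
  return !hasRefOperand && !hasSymbolAttr;
}
```

Under this model, an operation like `fir.address_of` (no operands, one symbol attribute) is kept inside the kernel, while an operation with neither reference-typed operands nor symbol attributes remains a hoisting candidate.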


From 07906c8aa50fb34c33f8c46bb4efab55bac3956f Mon Sep 17 00:00:00 2001
From: Slava Zakharin <szakharin at nvidia.com>
Date: Fri, 10 Apr 2026 10:55:44 -0700
Subject: [PATCH] [flang][CUF] Limit Flang LICM for operations with symbol
 operands.

There is probably an ordering issue between the `CUFDeviceGlobal`
and `OffloadLiveInValueCanonicalization` passes: Flang LICM hoists
`fir.address_of` out of `cuf.kernel`, and
`OffloadLiveInValueCanonicalization` pulls it back in, but the symbol is
never added to the device module because `CUFDeviceGlobal` does not run
afterwards.

Changing the pass order may take some time, so this is a temporary
workaround to unblock #191309.

The change is currently NFC.
---
 flang/lib/Optimizer/Dialect/CUF/CUFOps.cpp    |  8 ++++-
 flang/test/Transforms/CUF/cuf-kernel-licm.fir | 31 +++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/flang/lib/Optimizer/Dialect/CUF/CUFOps.cpp b/flang/lib/Optimizer/Dialect/CUF/CUFOps.cpp
index a157c4756ba15..860069c6036da 100644
--- a/flang/lib/Optimizer/Dialect/CUF/CUFOps.cpp
+++ b/flang/lib/Optimizer/Dialect/CUF/CUFOps.cpp
@@ -291,7 +291,13 @@ bool cuf::KernelOp::canMoveOutOf(mlir::Operation *candidate) {
   return !llvm::any_of(candidate->getOperands(),
                        [&](mlir::Value candidateOperand) {
                          return fir::isa_ref_type(candidateOperand.getType());
-                       });
+                       }) &&
+         // Same is true for symbol operands (this has to be revisited,
+         // because this may indicate an issue in ordering between
+         // CUFDeviceGlobal and OffloadLiveInValueCanonicalization passes).
+         !llvm::any_of(candidate->getAttrs(), [&](mlir::NamedAttribute attr) {
+           return mlir::isa_and_present<mlir::SymbolRefAttr>(attr.getValue());
+         });
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/flang/test/Transforms/CUF/cuf-kernel-licm.fir b/flang/test/Transforms/CUF/cuf-kernel-licm.fir
index 7d70b934f321b..88e8dc5ee8ba3 100644
--- a/flang/test/Transforms/CUF/cuf-kernel-licm.fir
+++ b/flang/test/Transforms/CUF/cuf-kernel-licm.fir
@@ -80,3 +80,34 @@ func.func @_QPtest(%arg0: !fir.ref<!fir.array<10xf32>> {cuf.data_attr = #cuf.cud
   }
   return
 }
+
+// -----
+
+// Verify that Pure fir.address_of operation is not hoisted by LICM
+// out of cuf.kernel.
+// CHECK-LABEL: func.func @_QQmain() {
+// CHECK-NOT: fir.address_of
+// CHECK: cuf.kernel
+func.func @_QQmain() {
+  %c100 = arith.constant 100 : index
+  %c4_i32 = arith.constant 4 : i32
+  %c6_i32 = arith.constant 6 : i32
+  %c1 = arith.constant 1 : index
+  %0 = fir.dummy_scope : !fir.dscope
+  %1 = cuf.alloc i32 {bindc_name = "a", data_attr = #cuf.cuda<device>, uniq_name = "_QFEa"} -> !fir.ref<i32>
+  %2 = fir.declare %1 {data_attr = #cuf.cuda<device>, uniq_name = "_QFEa"} : (!fir.ref<i32>) -> !fir.ref<i32>
+  %3 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFEi"}
+  %4 = fir.declare %3 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
+  cuf.kernel<<<*, *>>> (%arg0 : index) = (%c1 : index) to (%c100 : index)  step (%c1 : index) {
+    %5 = fir.convert %arg0 : (index) -> i32
+    fir.store %5 to %4 : !fir.ref<i32>
+    %6 = fir.address_of(@_QQclXb12241cc4e2d1775ec743d7d1b7d73f7) : !fir.ref<!fir.char<1,38>>
+    %7 = fir.convert %6 : (!fir.ref<!fir.char<1,38>>) -> !fir.ref<i8>
+    %8 = fir.call @_FortranAioBeginExternalListOutput(%c6_i32, %7, %c4_i32) fastmath<contract> : (i32, !fir.ref<i8>, i32) -> !fir.ref<i8>
+    %9 = fir.load %2 : !fir.ref<i32>
+    %10 = fir.call @_FortranAioOutputInteger32(%8, %9) fastmath<contract> : (!fir.ref<i8>, i32) -> i1
+    %11 = fir.call @_FortranAioEndIoStatement(%8) fastmath<contract> : (!fir.ref<i8>) -> i32
+    "fir.end"() : () -> ()
+  }
+  return
+}


