[flang-commits] [flang] [Flang][MLIR][OpenMP] Create a deferred declare target marking process for Bridge.cpp (PR #78502)

via flang-commits flang-commits at lists.llvm.org
Tue Feb 20 09:26:19 PST 2024


https://github.com/agozillon updated https://github.com/llvm/llvm-project/pull/78502

>From a6c5aa2a17bd3f67c477de34724f9a62636c7700 Mon Sep 17 00:00:00 2001
From: Andrew Gozillon <Andrew.Gozillon at amd.com>
Date: Wed, 17 Jan 2024 14:35:55 -0600
Subject: [PATCH 1/5] [Flang][MLIR][OpenMP] Create a deferred declare target
 marking process for Bridge.cpp

This patch seeks to create a process that happens apon module
finalisation for OpenMP, in which a list of operations that had
declare target directives applied to them and were not generated
at the time of processing the original declare target directive are
re-checked to apply the appropriate declare target semantics.

This works by maintaining a vector of declare target related data
inside of the FIR converter, in this case the symbol and the two
relevant unsigned integers representing the enumerators. This
vector is added to via a new function called from Bridge.cpp,
insertDeferredDeclareTargets, which happens prior to the
processing of the directive (similarly to getDeclareTargetFunctionDevice
currently for requires), it effectively checks if the Operation the
declare target directive is applied to currently exists, if it doesn't it
appends to the vector. This is a seperate function to the processing
of the declare target via the overloaded genOMP as we unfortunately
do not have access to the list without passing it through every call,
as the AbstractConverter we pass will not allow access to it (I've seen
no other cases of casting it to a FirConverter, so I opted to not do that).

The list is then processed at the end of the module in the
finalizeOpenMPLowering function in Bridge by calling a
new function markDelayedDeclareTargetFunctions which
marks the latently generated operations. In certain cases,
some still will not be generated, e.g. if an interface is defined,
marked as declare target, but has no definition or usage in the
module then it will not be emitted to the module, so due to
these cases we must silently ignore when an operation has
not been found via it's symbol.

The main use-case for this (although, I imagine there is others) is
for processing interfaces that have been declared in a module with
a declare target directive but do not have their implementation
defined in the same module. For example, inside of a seperate
C++ module that will be linked in. In cases where the interface
is called inside of a target region it'll be marked as used on
device appropriately (although, realistically a user should explicitly
mark it to match the corresponding definition), however, in
cases where it's used in a non-clear manner through
something like a function pointer passed to an external call
we require this explicit marking, which this patch adds
support for (currently will cause the compiler to crash).

An alternative to this approach may be to do it in the implicit marking pass,
by searching for function pointers/addresses used in target regions and
marking the FuncOp's appropriately. But this doesn't seem like the right
approach, we would be ignoring the users explicit markings at that point
if they exist (we'd have no way of saving the data and transferring it to the
pass trivially) in lieu of our own assumptions, and this would also not support
other possible operation types that may need deferred marking.
---
 flang/include/flang/Lower/OpenMP.h            |  17 ++
 flang/lib/Lower/Bridge.cpp                    |  25 ++-
 flang/lib/Lower/OpenMP.cpp                    | 164 ++++++++++++++----
 .../declare-target-deferred-marking.f90       |  60 +++++++
 4 files changed, 231 insertions(+), 35 deletions(-)
 create mode 100644 flang/test/Lower/OpenMP/declare-target-deferred-marking.f90

diff --git a/flang/include/flang/Lower/OpenMP.h b/flang/include/flang/Lower/OpenMP.h
index e6fe7fb2276dea..6e4d661c272409 100644
--- a/flang/include/flang/Lower/OpenMP.h
+++ b/flang/include/flang/Lower/OpenMP.h
@@ -14,6 +14,9 @@
 #define FORTRAN_LOWER_OPENMP_H
 
 #include <cinttypes>
+#include <utility>
+
+#include "llvm/ADT/SmallVector.h"
 
 namespace mlir {
 class Value;
@@ -86,6 +89,20 @@ bool isOpenMPDeviceDeclareTarget(Fortran::lower::AbstractConverter &,
                                  Fortran::semantics::SemanticsContext &,
                                  Fortran::lower::pft::Evaluation &,
                                  const parser::OpenMPDeclarativeConstruct &);
+void gatherDeferredDeclareTargets(
+    Fortran::lower::AbstractConverter &, Fortran::semantics::SemanticsContext &,
+    Fortran::lower::pft::Evaluation &,
+    const parser::OpenMPDeclarativeConstruct &,
+    llvm::SmallVectorImpl<std::tuple<
+        uint32_t /*mlir::omp::DeclareTargetCaptureClause*/, uint32_t,
+        /*mlir::omp::DeclareTargetDeviceType*/ Fortran::semantics::Symbol>> &);
+bool markDelayedDeclareTargetFunctions(
+    mlir::Operation *,
+    llvm::SmallVectorImpl<
+        std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
+                   uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
+                   Fortran::semantics::Symbol>> &,
+    AbstractConverter &);
 void genOpenMPRequires(mlir::Operation *, const Fortran::semantics::Symbol *);
 
 } // namespace lower
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 2d7f748cefa2d8..9ae3188e005a72 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -2472,6 +2472,9 @@ class FirConverter : public Fortran::lower::AbstractConverter {
         ompDeviceCodeFound ||
         Fortran::lower::isOpenMPDeviceDeclareTarget(
             *this, bridge.getSemanticsContext(), getEval(), ompDecl);
+    Fortran::lower::gatherDeferredDeclareTargets(
+        *this, bridge.getSemanticsContext(), getEval(), ompDecl,
+        deferredDeclareTarget);
     genOpenMPDeclarativeConstruct(
         *this, localSymbols, bridge.getSemanticsContext(), getEval(), ompDecl);
     builder->restoreInsertionPoint(insertPt);
@@ -4698,8 +4701,9 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
   /// Lower functions contained in a module.
   void lowerMod(Fortran::lower::pft::ModuleLikeUnit &mod) {
-    for (Fortran::lower::pft::FunctionLikeUnit &f : mod.nestedFunctions)
+    for (Fortran::lower::pft::FunctionLikeUnit &f : mod.nestedFunctions) {
       lowerFunc(f);
+    }
   }
 
   void setCurrentPosition(const Fortran::parser::CharBlock &position) {
@@ -5001,6 +5005,14 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// lowering.
   void finalizeOpenMPLowering(
       const Fortran::semantics::Symbol *globalOmpRequiresSymbol) {
+    if (!deferredDeclareTarget.empty()) {
+      bool deferredDeviceFuncFound =
+          Fortran::lower::markDelayedDeclareTargetFunctions(
+              getModuleOp().getOperation(), deferredDeclareTarget, *this);
+      if (!ompDeviceCodeFound)
+        ompDeviceCodeFound = deferredDeviceFuncFound;
+    }
+
     // Set the module attribute related to OpenMP requires directives
     if (ompDeviceCodeFound)
       Fortran::lower::genOpenMPRequires(getModuleOp().getOperation(),
@@ -5057,6 +5069,17 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// intended for device offloading has been detected
   bool ompDeviceCodeFound = false;
 
+  /// Keeps track of symbols defined as declare target that could not be
+  /// processed at the time of lowering the declare target construct, such
+  /// as certain cases where interfaces are declared but not defined within
+  /// a module.
+  llvm::SmallVector<
+      std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
+                 uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
+                 Fortran::semantics::Symbol>,
+      2>
+      deferredDeclareTarget;
+
   const Fortran::lower::ExprToValueMap *exprValueOverrides{nullptr};
 
   /// Stack of derived type under construction to avoid infinite loops when
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index 9397af8b8bd05e..b757088055704a 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -3182,6 +3182,39 @@ static mlir::omp::DeclareTargetDeviceType getDeclareTargetInfo(
   return deviceType;
 }
 
+static void insertDeferredDeclareTargets(
+    Fortran::lower::AbstractConverter &converter,
+    Fortran::semantics::SemanticsContext &semaCtx,
+    Fortran::lower::pft::Evaluation &eval,
+    const Fortran::parser::OpenMPDeclareTargetConstruct &declareTargetConstruct,
+    llvm::SmallVectorImpl<
+        std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
+                   uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
+                   Fortran::semantics::Symbol>> &deferredDeclareTarget) {
+  llvm::SmallVector<DeclareTargetCapturePair, 0> symbolAndClause;
+  mlir::omp::DeclareTargetDeviceType devType = getDeclareTargetInfo(
+      converter, semaCtx, eval, declareTargetConstruct, symbolAndClause);
+  // Return the device type only if at least one of the targets for the
+  // directive is a function or subroutine
+  mlir::ModuleOp mod = converter.getFirOpBuilder().getModule();
+
+  for (const DeclareTargetCapturePair &symClause : symbolAndClause) {
+    mlir::Operation *op = mod.lookupSymbol(
+        converter.mangleName(std::get<Fortran::semantics::Symbol>(symClause)));
+
+    if (!op) {
+      deferredDeclareTarget.push_back(std::make_tuple(
+          static_cast<
+              std::underlying_type_t<mlir::omp::DeclareTargetCaptureClause>>(
+              std::get<0>(symClause)),
+          static_cast<
+              std::underlying_type_t<mlir::omp::DeclareTargetDeviceType>>(
+              devType),
+          std::get<1>(symClause)));
+    }
+  }
+}
+
 static std::optional<mlir::omp::DeclareTargetDeviceType>
 getDeclareTargetFunctionDevice(
     Fortran::lower::AbstractConverter &converter,
@@ -3200,7 +3233,7 @@ getDeclareTargetFunctionDevice(
     mlir::Operation *op = mod.lookupSymbol(
         converter.mangleName(std::get<Fortran::semantics::Symbol>(symClause)));
 
-    if (mlir::isa<mlir::func::FuncOp>(op))
+    if (mlir::isa_and_nonnull<mlir::func::FuncOp>(op))
       return deviceType;
   }
 
@@ -3945,6 +3978,31 @@ genOMP(Fortran::lower::AbstractConverter &converter,
       atomicConstruct.u);
 }
 
+static void
+markDeclareTarget(mlir::Operation *op,
+                  Fortran::lower::AbstractConverter &converter,
+                  mlir::omp::DeclareTargetCaptureClause captureClause,
+                  mlir::omp::DeclareTargetDeviceType deviceType) {
+  auto declareTargetOp = llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(op);
+  if (!declareTargetOp)
+    fir::emitFatalError(
+        converter.getCurrentLocation(),
+        "Attempt to apply declare target on unsupported operation");
+
+  // The function or global already has a declare target applied to it, very
+  // likely through implicit capture (usage in another declare target
+  // function/subroutine). It should be marked as any if it has been assigned
+  // both host and nohost, else we skip, as there is no change
+  if (declareTargetOp.isDeclareTarget()) {
+    if (declareTargetOp.getDeclareTargetDeviceType() != deviceType)
+      declareTargetOp.setDeclareTarget(mlir::omp::DeclareTargetDeviceType::any,
+                                       captureClause);
+    return;
+  }
+
+  declareTargetOp.setDeclareTarget(deviceType, captureClause);
+}
+
 static void genOMP(Fortran::lower::AbstractConverter &converter,
                    Fortran::lower::SymMap &symTable,
                    Fortran::semantics::SemanticsContext &semaCtx,
@@ -3959,42 +4017,16 @@ static void genOMP(Fortran::lower::AbstractConverter &converter,
   for (const DeclareTargetCapturePair &symClause : symbolAndClause) {
     mlir::Operation *op = mod.lookupSymbol(
         converter.mangleName(std::get<Fortran::semantics::Symbol>(symClause)));
-    // There's several cases this can currently be triggered and it could be
-    // one of the following:
-    // 1) Invalid argument passed to a declare target that currently isn't
-    // captured by a frontend semantic check
-    // 2) The symbol of a valid argument is not correctly updated by one of
-    // the prior passes, resulting in missing symbol information
-    // 3) It's a variable internal to a module or program, that is legal by
-    // Fortran OpenMP standards, but is currently unhandled as they do not
-    // appear in the symbol table as they are represented as allocas
+
+    // Some symbols are deferred until later in the module, these are handled
+    // apon finalization of the module for OpenMP inside of Bridge, so we simply
+    // skip for now.
     if (!op)
-      TODO(converter.getCurrentLocation(),
-           "Missing symbol, possible case of currently unsupported use of "
-           "a program local variable in declare target or erroneous symbol "
-           "information ");
-
-    auto declareTargetOp =
-        llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(op);
-    if (!declareTargetOp)
-      fir::emitFatalError(
-          converter.getCurrentLocation(),
-          "Attempt to apply declare target on unsupported operation");
-
-    // The function or global already has a declare target applied to it, very
-    // likely through implicit capture (usage in another declare target
-    // function/subroutine). It should be marked as any if it has been assigned
-    // both host and nohost, else we skip, as there is no change
-    if (declareTargetOp.isDeclareTarget()) {
-      if (declareTargetOp.getDeclareTargetDeviceType() != deviceType)
-        declareTargetOp.setDeclareTarget(
-            mlir::omp::DeclareTargetDeviceType::any,
-            std::get<mlir::omp::DeclareTargetCaptureClause>(symClause));
       continue;
-    }
 
-    declareTargetOp.setDeclareTarget(
-        deviceType, std::get<mlir::omp::DeclareTargetCaptureClause>(symClause));
+    markDeclareTarget(
+        op, converter,
+        std::get<mlir::omp::DeclareTargetCaptureClause>(symClause), deviceType);
   }
 }
 
@@ -4453,6 +4485,26 @@ bool Fortran::lower::isOpenMPTargetConstruct(
   return llvm::omp::allTargetSet.test(dir);
 }
 
+void Fortran::lower::gatherDeferredDeclareTargets(
+    Fortran::lower::AbstractConverter &converter,
+    Fortran::semantics::SemanticsContext &semaCtx,
+    Fortran::lower::pft::Evaluation &eval,
+    const Fortran::parser::OpenMPDeclarativeConstruct &ompDecl,
+    llvm::SmallVectorImpl<
+        std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
+                   uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
+                   Fortran::semantics::Symbol>> &deferredDeclareTarget) {
+  std::visit(
+      Fortran::common::visitors{
+          [&](const Fortran::parser::OpenMPDeclareTargetConstruct &ompReq) {
+            insertDeferredDeclareTargets(converter, semaCtx, eval, ompReq,
+                                         deferredDeclareTarget);
+          },
+          [&](const auto &) {},
+      },
+      ompDecl.u);
+}
+
 bool Fortran::lower::isOpenMPDeviceDeclareTarget(
     Fortran::lower::AbstractConverter &converter,
     Fortran::semantics::SemanticsContext &semaCtx,
@@ -4471,6 +4523,50 @@ bool Fortran::lower::isOpenMPDeviceDeclareTarget(
       ompDecl.u);
 }
 
+// In certain cases such as subroutine or function interfaces which declare
+// but do not define or directly call the subroutine or function in the same
+// module, their lowering is delayed until after the declare target construct
+// itself is processed, so there symbol is not within the table.
+//
+// This function will also return true if we encounter any device declare
+// target cases, to satisfy checking if we require the requires attributes
+// on the module.
+bool Fortran::lower::markDelayedDeclareTargetFunctions(
+    mlir::Operation *mod,
+    llvm::SmallVectorImpl<
+        std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
+                   uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
+                   Fortran::semantics::Symbol>> &deferredDeclareTargets,
+    AbstractConverter &converter) {
+  bool deviceCodeFound = false;
+  if (auto modOp = llvm::dyn_cast<mlir::ModuleOp>(mod)) {
+    for (auto sym : deferredDeclareTargets) {
+      mlir::Operation *op =
+          modOp.lookupSymbol(converter.mangleName(std::get<2>(sym)));
+
+      // Due to interfaces being optionally emitted on usage in a module,
+      // not finding an operation at this point cannot be a hard error, we
+      // simply ignore it for now.
+      if (!op)
+        continue;
+
+      auto devType =
+          static_cast<mlir::omp::DeclareTargetDeviceType>(std::get<1>(sym));
+      if (!deviceCodeFound &&
+          devType != mlir::omp::DeclareTargetDeviceType::host) {
+        deviceCodeFound = true;
+      }
+
+      markDeclareTarget(
+          op, converter,
+          static_cast<mlir::omp::DeclareTargetCaptureClause>(std::get<0>(sym)),
+          devType);
+    }
+  }
+
+  return deviceCodeFound;
+}
+
 void Fortran::lower::genOpenMPRequires(
     mlir::Operation *mod, const Fortran::semantics::Symbol *symbol) {
   using MlirRequires = mlir::omp::ClauseRequires;
diff --git a/flang/test/Lower/OpenMP/declare-target-deferred-marking.f90 b/flang/test/Lower/OpenMP/declare-target-deferred-marking.f90
new file mode 100644
index 00000000000000..1998c3da23af5f
--- /dev/null
+++ b/flang/test/Lower/OpenMP/declare-target-deferred-marking.f90
@@ -0,0 +1,60 @@
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s --check-prefixes ALL,HOST
+!RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-is-device %s -o - | FileCheck %s --check-prefixes ALL
+
+program main
+    use, intrinsic ::  iso_c_binding
+    implicit none
+    interface
+    subroutine any_interface()  bind(c,name="any_interface")
+        use, intrinsic :: iso_c_binding
+        implicit none
+    !$omp declare target enter(any_interface) device_type(any)
+    end subroutine any_interface
+
+    subroutine host_interface()  bind(c,name="host_interface")
+      use, intrinsic :: iso_c_binding
+      implicit none
+   !$omp declare target enter(host_interface) device_type(host)
+    end subroutine host_interface
+
+    subroutine device_interface()  bind(c,name="device_interface")
+        use, intrinsic :: iso_c_binding
+        implicit none
+    !$omp declare target enter(device_interface) device_type(nohost)
+    end subroutine device_interface
+
+    subroutine called_from_target_interface(f1, f2) bind(c,name="called_from_target_interface")
+        use, intrinsic :: iso_c_binding
+        implicit none
+        type(c_funptr),value :: f1
+        type(c_funptr),value :: f2
+    end subroutine called_from_target_interface
+
+    subroutine called_from_host_interface(f1) bind(c,name="called_from_host_interface")
+      use, intrinsic :: iso_c_binding
+      implicit none
+      type(c_funptr),value :: f1
+    end subroutine called_from_host_interface
+
+    subroutine unused_unemitted_interface()  bind(c,name="unused_unemitted_interface")
+      use, intrinsic :: iso_c_binding
+      implicit none
+    !$omp declare target enter(unused_unemitted_interface) device_type(nohost)
+    end subroutine unused_unemitted_interface
+
+    end interface
+
+    CALL called_from_host_interface(c_funloc(host_interface))
+!$omp target
+    CALL called_from_target_interface(c_funloc(any_interface), c_funloc(device_interface))
+!$omp end target
+ end program main
+
+!HOST-LABEL: func.func {{.*}} @host_interface()
+!HOST-SAME: {{.*}}, omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (enter)>{{.*}}
+!ALL-LABEL: func.func {{.*}} @called_from_target_interface(!fir.ref<i64>, !fir.ref<i64>)
+!ALL-SAME: {{.*}}, omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.*}}
+!ALL-LABEL: func.func {{.*}} @any_interface()
+!ALL-SAME: {{.*}}, omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (enter)>{{.*}}
+!ALL-LABEL: func.func {{.*}} @device_interface()
+!ALL-SAME: {{.*}}, omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (enter)>{{.*}}

>From 68a533701683b8378cdaf671119cb7e4177c146b Mon Sep 17 00:00:00 2001
From: Andrew Gozillon <Andrew.Gozillon at amd.com>
Date: Wed, 17 Jan 2024 14:55:49 -0600
Subject: [PATCH 2/5] [NFC] Remove some extra braces clang-format may or may
 not have added

---
 flang/lib/Lower/Bridge.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 9ae3188e005a72..2b2c51d5450058 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -4701,9 +4701,8 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
   /// Lower functions contained in a module.
   void lowerMod(Fortran::lower::pft::ModuleLikeUnit &mod) {
-    for (Fortran::lower::pft::FunctionLikeUnit &f : mod.nestedFunctions) {
+    for (Fortran::lower::pft::FunctionLikeUnit &f : mod.nestedFunctions)
       lowerFunc(f);
-    }
   }
 
   void setCurrentPosition(const Fortran::parser::CharBlock &position) {

>From 9fd5ec34a26dfd99eec769fc674dd251e14965d0 Mon Sep 17 00:00:00 2001
From: agozillon <Andrew.Gozillon at amd.com>
Date: Mon, 19 Feb 2024 10:58:36 -0600
Subject: [PATCH 3/5] [NFC] Update based on review comments

> Moved to using a type-alias for the tuple
> Made names a little more consistent and OpenMP specific
---
 flang/include/flang/Lower/OpenMP.h | 23 +++++++++++------------
 flang/lib/Lower/Bridge.cpp         | 18 +++++++-----------
 flang/lib/Lower/OpenMP.cpp         | 20 ++++++--------------
 3 files changed, 24 insertions(+), 37 deletions(-)

diff --git a/flang/include/flang/Lower/OpenMP.h b/flang/include/flang/Lower/OpenMP.h
index 6e4d661c272409..1b818d0b835030 100644
--- a/flang/include/flang/Lower/OpenMP.h
+++ b/flang/include/flang/Lower/OpenMP.h
@@ -13,11 +13,11 @@
 #ifndef FORTRAN_LOWER_OPENMP_H
 #define FORTRAN_LOWER_OPENMP_H
 
+#include "llvm/ADT/SmallVector.h"
+
 #include <cinttypes>
 #include <utility>
 
-#include "llvm/ADT/SmallVector.h"
-
 namespace mlir {
 class Value;
 class Operation;
@@ -52,6 +52,11 @@ struct Evaluation;
 struct Variable;
 } // namespace pft
 
+using OMPDeferredDeclTarInfo =
+    std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
+               uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
+               Fortran::semantics::Symbol>;
+
 // Generate the OpenMP terminator for Operation at Location.
 mlir::Operation *genOpenMPTerminator(fir::FirOpBuilder &, mlir::Operation *,
                                      mlir::Location);
@@ -89,19 +94,13 @@ bool isOpenMPDeviceDeclareTarget(Fortran::lower::AbstractConverter &,
                                  Fortran::semantics::SemanticsContext &,
                                  Fortran::lower::pft::Evaluation &,
                                  const parser::OpenMPDeclarativeConstruct &);
-void gatherDeferredDeclareTargets(
+void gatherOpenMPDeferredDeclareTargets(
     Fortran::lower::AbstractConverter &, Fortran::semantics::SemanticsContext &,
     Fortran::lower::pft::Evaluation &,
     const parser::OpenMPDeclarativeConstruct &,
-    llvm::SmallVectorImpl<std::tuple<
-        uint32_t /*mlir::omp::DeclareTargetCaptureClause*/, uint32_t,
-        /*mlir::omp::DeclareTargetDeviceType*/ Fortran::semantics::Symbol>> &);
-bool markDelayedDeclareTargetFunctions(
-    mlir::Operation *,
-    llvm::SmallVectorImpl<
-        std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
-                   uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
-                   Fortran::semantics::Symbol>> &,
+    llvm::SmallVectorImpl<OMPDeferredDeclTarInfo> &);
+bool markOpenMPDeferredDeclareTargetFunctions(
+    mlir::Operation *, llvm::SmallVectorImpl<OMPDeferredDeclTarInfo> &,
     AbstractConverter &);
 void genOpenMPRequires(mlir::Operation *, const Fortran::semantics::Symbol *);
 
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 2b2c51d5450058..823a38339b2942 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -2472,9 +2472,9 @@ class FirConverter : public Fortran::lower::AbstractConverter {
         ompDeviceCodeFound ||
         Fortran::lower::isOpenMPDeviceDeclareTarget(
             *this, bridge.getSemanticsContext(), getEval(), ompDecl);
-    Fortran::lower::gatherDeferredDeclareTargets(
+    Fortran::lower::gatherOpenMPDeferredDeclareTargets(
         *this, bridge.getSemanticsContext(), getEval(), ompDecl,
-        deferredDeclareTarget);
+        ompDeferredDeclareTarget);
     genOpenMPDeclarativeConstruct(
         *this, localSymbols, bridge.getSemanticsContext(), getEval(), ompDecl);
     builder->restoreInsertionPoint(insertPt);
@@ -5004,10 +5004,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// lowering.
   void finalizeOpenMPLowering(
       const Fortran::semantics::Symbol *globalOmpRequiresSymbol) {
-    if (!deferredDeclareTarget.empty()) {
+    if (!ompDeferredDeclareTarget.empty()) {
       bool deferredDeviceFuncFound =
-          Fortran::lower::markDelayedDeclareTargetFunctions(
-              getModuleOp().getOperation(), deferredDeclareTarget, *this);
+          Fortran::lower::markOpenMPDeferredDeclareTargetFunctions(
+              getModuleOp().getOperation(), ompDeferredDeclareTarget, *this);
       if (!ompDeviceCodeFound)
         ompDeviceCodeFound = deferredDeviceFuncFound;
     }
@@ -5072,12 +5072,8 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// processed at the time of lowering the declare target construct, such
   /// as certain cases where interfaces are declared but not defined within
   /// a module.
-  llvm::SmallVector<
-      std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
-                 uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
-                 Fortran::semantics::Symbol>,
-      2>
-      deferredDeclareTarget;
+  llvm::SmallVector<Fortran::lower::OMPDeferredDeclTarInfo, 2>
+      ompDeferredDeclareTarget;
 
   const Fortran::lower::ExprToValueMap *exprValueOverrides{nullptr};
 
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index b757088055704a..bdc66d306b56af 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -3187,10 +3187,8 @@ static void insertDeferredDeclareTargets(
     Fortran::semantics::SemanticsContext &semaCtx,
     Fortran::lower::pft::Evaluation &eval,
     const Fortran::parser::OpenMPDeclareTargetConstruct &declareTargetConstruct,
-    llvm::SmallVectorImpl<
-        std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
-                   uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
-                   Fortran::semantics::Symbol>> &deferredDeclareTarget) {
+    llvm::SmallVectorImpl<Fortran::lower::OMPDeferredDeclTarInfo>
+        &deferredDeclareTarget) {
   llvm::SmallVector<DeclareTargetCapturePair, 0> symbolAndClause;
   mlir::omp::DeclareTargetDeviceType devType = getDeclareTargetInfo(
       converter, semaCtx, eval, declareTargetConstruct, symbolAndClause);
@@ -4485,15 +4483,12 @@ bool Fortran::lower::isOpenMPTargetConstruct(
   return llvm::omp::allTargetSet.test(dir);
 }
 
-void Fortran::lower::gatherDeferredDeclareTargets(
+void Fortran::lower::gatherOpenMPDeferredDeclareTargets(
     Fortran::lower::AbstractConverter &converter,
     Fortran::semantics::SemanticsContext &semaCtx,
     Fortran::lower::pft::Evaluation &eval,
     const Fortran::parser::OpenMPDeclarativeConstruct &ompDecl,
-    llvm::SmallVectorImpl<
-        std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
-                   uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
-                   Fortran::semantics::Symbol>> &deferredDeclareTarget) {
+    llvm::SmallVectorImpl<OMPDeferredDeclTarInfo> &deferredDeclareTarget) {
   std::visit(
       Fortran::common::visitors{
           [&](const Fortran::parser::OpenMPDeclareTargetConstruct &ompReq) {
@@ -4531,12 +4526,9 @@ bool Fortran::lower::isOpenMPDeviceDeclareTarget(
 // This function will also return true if we encounter any device declare
 // target cases, to satisfy checking if we require the requires attributes
 // on the module.
-bool Fortran::lower::markDelayedDeclareTargetFunctions(
+bool Fortran::lower::markOpenMPDeferredDeclareTargetFunctions(
     mlir::Operation *mod,
-    llvm::SmallVectorImpl<
-        std::tuple<uint32_t /*mlir::omp::DeclareTargetCaptureClause*/,
-                   uint32_t, /*mlir::omp::DeclareTargetDeviceType*/
-                   Fortran::semantics::Symbol>> &deferredDeclareTargets,
+    llvm::SmallVectorImpl<OMPDeferredDeclTarInfo> &deferredDeclareTargets,
     AbstractConverter &converter) {
   bool deviceCodeFound = false;
   if (auto modOp = llvm::dyn_cast<mlir::ModuleOp>(mod)) {

>From 2b97bc4a2985cd836a43c302051b0c7fc93fccc8 Mon Sep 17 00:00:00 2001
From: agozillon <Andrew.Gozillon at amd.com>
Date: Mon, 19 Feb 2024 15:29:25 -0600
Subject: [PATCH 4/5] [NFC] Add documentation for declare target as requested

---
 flang/docs/OpenMP-declare-target.md | 124 ++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)
 create mode 100644 flang/docs/OpenMP-declare-target.md

diff --git a/flang/docs/OpenMP-declare-target.md b/flang/docs/OpenMP-declare-target.md
new file mode 100644
index 00000000000000..d167d9e47c1cd8
--- /dev/null
+++ b/flang/docs/OpenMP-declare-target.md
@@ -0,0 +1,124 @@
+<!--===- docs/OpenMP-declare-target.md
+
+   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+   See https://llvm.org/LICENSE.txt for license information.
+   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+-->
+
+# Introduction to Declare Target
+
+In OpenMP `declare target` is a directive that can be applied to a function or variable (primarily global) to notate to the compiler that it should be generated in a particular devices environment. In essence whether something should be emitted for host or device, or both. An example of its usage for both data and functions can be seen below.
+
+```Fortran
+module test_0
+    integer :: sp = 0
+!$omp declare target link(sp)
+end module test_0
+
+program main
+    use test_0
+!$omp target map(tofrom:sp)
+    sp = 1
+!$omp end target
+end program
+```
+
+In the above example, we created a variable in a seperate module, mark it as `declare target` and then map it, embedding it into the device IR and assigning to it. 
+
+
+```Fortran
+function func_t_device() result(i)
+    !$omp declare target to(func_t_device) device_type(nohost)
+        INTEGER :: I
+        I = 1
+end function func_t_device
+
+program main
+!$omp target
+    call func_t_device()
+!$omp end target
+end program
+```
+
+In the above example, we are stating that a function is required on device utilising `declare target`, and that we will not be utilising it on host, so we are in theory free to remove or ignore it. A user could also in this case, leave off the `declare target` from the function and it would be implicitly marked `declare target any` (for both host and device), as it's been utilised within a target region.
+
+# Declare Target as represented in the OpenMP Dialect
+
+In the OpenMP Dialect `declare target` is not represented by a specific `operation` instead it's a OpenMP dialect specific `attribute` that can be applied to any operation in any dialect. This helps to simplify the utilisation of it, instead of replacing or modifying existing global or function `operations` in a dialect it applies to it as extra metadata that the lowering can use in different ways as is neccesary. 
+
+The `attribute` is composed of multiple fields representing the clauses you would find on the `declare target` directive i.e. device type (`nohost`, `any`, `host`) or the capture clause (`link` or `to`). A small example of `declare target` applied to an Fortran `real` can be found below:
+
+```MLIR
+fir.global internal @_QFEi {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 {
+    %0 = fir.undefined f32
+    fir.has_value %0 : f32
+}
+```
+
+This would look similar for function style `operations`.
+
+The application and access of this attribute is aided by an OpenMP Dialect MLIR Interface named `DeclareTargetInterface`, which can be utilised on operations to access the appropriate interface functions, e.g.:
+
+```C++
+auto declareTargetGlobal = llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(Op.getOperation());
+declareTargetGlobal.isDeclareTarget();
+```
+
+# Declare Target Fortran OpenMP Lowering
+
+The initial lowering of `declare target` to MLIR for both use-cases is done inside of the usual OpenMP lowering in flang/lib/Lower/OpenMP.cpp, however some direct calls to `declare target` related functions from Flang's Bridge in flang/lib/Lower/Bridge.cpp are made.
+
+The marking of operations with the declare target attribute happens in two phases, the second one optional contingent on the first failing to apply the attribute due to the operation not being generated yet, the main notable case this occurs currently is when a Fortran function interface has been marked. 
+
+The initial phase happens when the declare target directive and it's clauses are initially processed, with the primary data gathering for the directive and clause happening in a function called `getDeclareTargetInfo` which is then used to feed the `markDeclareTarget` function which does the actual marking utilising the `DeclareTargetInterface`, if it encounters something that has been marked twice over multiple directives with two differing device types (e.g. `host`, `nohost`), then it will swap the device type to `any`.
+
+Whenever we invoke `genFIR` on an `OpenMPDeclarativeConstruct` from Bridge, we are also invoking another function 
+called `gatherOpenMPDeferredDeclareTargets` which gathers information relevant to the application of the `declare target` attribute (symbol that it should be applied to, device type clause, and capture clause) when processing `declare target` declarations, storing the data in a vector that is part of Bridge's instantiation of the `AbstractConverter`. This data is only stored if we encounter a function or variable symbol that does not have an operation instantiated for it yet, unfortunately this cannot happen as part of the initial marking as we must store this data in Bridge and only have access to the abstract version of the converter via the OpenMP lowering. 
+
+This information is used in the second phase, which is a form of deferred processing of the `declare target` marked operations that have delayed generation, this is done via the function `markOpenMPDeferredDeclareTargetFunctions` which is called from Bridge at the end of the lowering process allowing us to mark those where possible. It iterates over the data gathered by `gatherOpenMPDeferredDeclareTargets` checking if any of the recorded symbols have now had their corresponding operations instantiated and applying where possible utilising `markDeclareTarget`.
+However, it must be noted that it is still possible for operations not to be generated for certain symbols, in particular the case of function interfaces that are not directly used or defined within the current module, this means we cannot emit errors in the case of left-over unmarked symbols, these must (and should) be caught by the initial semantic analysis.
+
+NOTE: `declare target` can be applied to implicit `SAVE` attributed variables, however, by default Flang does not represent these as `GlobalOp`'s which means we cannot tag and lower them as `declare target` normally, instead similarly to the way `threadprivate` handles these cases, we raise and initialize the variable as an internal `GlobalOp` and apply the attribute. This occurs in the flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`.
+
+# Declare Target Transformation Passes for Flang
+
+There are currently two passes within Flang that are related to the processing of `declare target`:
+* `OMPMarkDeclareTarget` - This pass is in charge of marking functions captured (called from) in `target` regions or other `declare target` marked functions as `declare target`, it does so recursively, e.g. nested calls will also be implicitly marked. It currently will try to mark things as conservatively as possible, i.e. if captured in a `target` region it will apply `nohost`, unless it encounters something with `host` in which case it will apply the any device type (if it's already `any`, then it's left untouched). Functions are handled similarly, except we utilise the parents device type where possible.   
+* `OMPFunctionFiltering` - This is executed after `OMPMarkDeclareTarget`, and currently only for device, its job is to conservatively remove functions from the module where possible. This helps make sure incompatible code from the host is not lowered for device (although, a user can still self inject incompatible code, but this mechanism allows them to avoid that). Functions with `target` regions in them are preserved as they may be neccesary to maintain (e.g. reverse offloading in the future), otherwise, we will remove any function marked as a `declare target host` function and any uses will be replaced with `undef`'s so that other passes can appropriately clean them up and in the meantime we don't break verification.
+
+While this infrastructure is generally applicable to more than just Flang, we currently only utilise them in the Flang frontend and they are part of the Flang codebase, rather than the OpenMP dialect codebase. 
+
+# Declare Target OpenMP Dialect To LLVM-IR Lowering
+
+The OpenMP dialect lowering of `declare target` is a little unique currently, as it's not an `operation` and is an `attribute` we process it utilising the LLVM Target lowerings `amendOperation`, which occurs immediately after an operation has been lowered to LLVM-IR. As it can be applicable to multiple different operations, we must
+specialise this function for each operation type that we may encounter. Currently this is `GlobalOp`'s and 
+`FuncOp`'s.
+
+In the case where we encounter a `FuncOp` our processing is fairly simple, if we're processing the device code, we will finish up our removal of `host` marked functions, anything we could not remove previously we now remove, e.g. if it had a `target` directive in it (which we need to keep a hold of to this point, to actually outline the `target` kernel for device). This hopefully leaves us with only `any`, `device` or undeterminable functions left in the module to lower further, reducing the possibiltiy of device incompatible code being in the module.
+
+For `GlobalOp`'s, the processing is a little more complex, we currently leverage two OMPIRBuilder functions which we have inherited from Clang and moved to the `OMPIRBuilder` to share across the two compiler frontends `registerTargetGlobalVariable` and `getAddrOfDeclareTargetVar`. These two functions are actually recursive and invoke each other depending on the clauses and options provided to the `OMPIRBuilder` (in particular unified shared memory), but the main functionality they provide is the generation of a new global pointer for device with a "ref_" prefix, and enqueuing metadata generation by the `OMPIRBuilder` at the end of the module, for both host and device that links the newly generated device global pointer and the host pointer together across the two modules (and resulting binaries). 
+
+Two things of note about the `GlobalOp` processing, the first is that similarly to other metadata (e.g. for `TargetOp`) that is shared across both host and device modules, the device needs access to the previously generated host IR file, which is done through another `attribute` applied to the `ModuleOp` by the compiler frontend. The file is loaded in and consumed by the `OMPIRBuilder` to populate it's `OffloadInfoManager` data structures, keeping host and device appropriately synchronised.
+
+The second (and more important to remember) is that as we effectively replace the original LLVM-IR generated for the `declare target` marked `GlobalOp` we have some corrections we need to do for `TargetOp`'s (or other region operations that use them directly) which still refer to the original lowered global operation. This is done via `handleDeclareTargetMapVar` which is invoked as the final function and alteration to the lowered `target` region, it's only invoked for device as it's only required in the case where we have emitted the "ref" pointer , and it effectively replaces all uses of the originally lowered global symbol, with our new global ref pointer's symbol. Currently we do not remove or delete the old symbol, this is due to the fact that the same symbol can be utilised across multiple target regions, if we remove it, we risk breaking lowerings of target regions that will be processed at a later time. To appropriately delete these no longer neccesary symbols we would need a deferred removal process at the end of the module, which is currently not in place. It may be possible to store this information in the OMPIRBuilder and then perform this cleanup process on finalization, but this is open for discussion and implementation still.
+
+# Current Support
+
+For the moment, `declare target` should work for:
+* Marking functions/subroutines and function/subroutine interfaces for generation on host, device or both.
+* Implicit function/subroutine capture for calls emitted in a `target` region or explicitly marked `declare 
+   target` function/subroutine. Note: Calls made via arguments passed to other functions must still be 
+   themselves marked `declare target`, e.g. passing a `C` function pointer and invoking it, then the interface
+   and the `C` function in the other module must be marked `declare target`, with the same type of 
+   marking as indicated by the specification.
+* Marking global variables with `declare target`'s `link` clause and mapping the data to the device data 
+   environment utilising `declare target` (may not work for all types yet, but for scalars and arrays
+   of scalars, it should).
+
+Doesn't work for, or needs further testing for:
+* Marking the following types with `declare target link` (needs further testing):
+    * Descriptor based types, e.g. pointers/allocatables.
+    * Derived types.
+    * Members of derived types (use-case needs legality checking with OpenMP specification).
+* Marking global variables with `declare target`'s `to` clause, a lot of the lowering should exist, but it needs further testing and likely some further changes to fully function.

>From 172dfd76b2355f0ccbf9b90131af4e0ef6564238 Mon Sep 17 00:00:00 2001
From: agozillon <Andrew.Gozillon at amd.com>
Date: Tue, 20 Feb 2024 11:25:22 -0600
Subject: [PATCH 5/5] Fix document build bugs

---
 flang/docs/OpenMP-declare-target.md | 2 +-
 flang/docs/index.md                 | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/flang/docs/OpenMP-declare-target.md b/flang/docs/OpenMP-declare-target.md
index d167d9e47c1cd8..4490f0109cbb4f 100644
--- a/flang/docs/OpenMP-declare-target.md
+++ b/flang/docs/OpenMP-declare-target.md
@@ -49,7 +49,7 @@ In the OpenMP Dialect `declare target` is not represented by a specific `operati
 
 The `attribute` is composed of multiple fields representing the clauses you would find on the `declare target` directive i.e. device type (`nohost`, `any`, `host`) or the capture clause (`link` or `to`). A small example of `declare target` applied to an Fortran `real` can be found below:
 
-```MLIR
+```
 fir.global internal @_QFEi {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 {
     %0 = fir.undefined f32
     fir.has_value %0 : f32
diff --git a/flang/docs/index.md b/flang/docs/index.md
index d974a3628b01d2..e2c7a4224f2a91 100644
--- a/flang/docs/index.md
+++ b/flang/docs/index.md
@@ -69,6 +69,7 @@ on how to get in touch with us and to learn more about the current status.
    OpenACC-descriptor-management.md
    OpenMP-4.5-grammar.md
    OpenMP-descriptor-management
+   OpenMP-declare-target
    OpenMP-semantics
    OptionComparison
    Overview



More information about the flang-commits mailing list