[Mlir-commits] [mlir] 7bb1151 - [Flang][OpenMP] Initial support for integer reduction in worksharing-loop

Mon Jul 25 11:47:46 PDT 2022

Author: Kiran Chandramohan
Date: 2022-07-25T18:47:07Z
New Revision: 7bb1151ba21e26d91ddaa83177bb58b4d1c36710

URL: https://github.com/llvm/llvm-project/commit/7bb1151ba21e26d91ddaa83177bb58b4d1c36710
DIFF: https://github.com/llvm/llvm-project/commit/7bb1151ba21e26d91ddaa83177bb58b4d1c36710.diff

LOG: [Flang][OpenMP] Initial support for integer reduction in worksharing-loop

Lower the Flang parse-tree containing OpenMP reductions to the OpenMP
dialect. The OpenMP dialect models reductions with three pieces:
1) A reduction declaration operation that specifies how to initialize, combine,
and atomically combine private reduction variables.
2) An array of reduction accumulator variables (operands) on the OpenMP
operation that supports reductions (like wsloop), together with an array
attribute of the same size whose entries point to the reduction declaration
to be used for each accumulator.
3) The OpenMP reduction operation, which takes a value and an accumulator.
This operation replaces the original reduction operation in the source.
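
As a rough illustration of how these three pieces relate, the following
self-contained C++ sketch models them outside of MLIR. It is not the dialect's
implementation: `ReductionDecl`, `reduce`, `runWorksharingLoop` and
`makeAddReductionI32` are hypothetical names, the "threads" are simulated
sequentially, and the atomic combiner is left out.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for piece (1), the reduction declaration: how to
// initialize a private copy and how to combine two partial values.
struct ReductionDecl {
  std::function<int32_t()> init;
  std::function<int32_t(int32_t, int32_t)> combine;
};

// Hypothetical stand-in for piece (3), the reduction operation: fold `value`
// into `accumulator` using the declaration's combiner.
inline void reduce(const ReductionDecl &decl, int32_t value,
                   int32_t &accumulator) {
  accumulator = decl.combine(accumulator, value);
}

// Models piece (2): a loop over lb..ub (inclusive) that carries one
// accumulator `x` and the declaration used for it. Iterations are dealt out
// round-robin to `numThreads` simulated threads, each with a private copy.
inline int32_t runWorksharingLoop(const ReductionDecl &decl, int32_t x,
                                  int32_t lb, int32_t ub, int numThreads) {
  std::vector<int32_t> privates(numThreads);
  for (int32_t &p : privates)
    p = decl.init(); // neutral value, e.g. 0 for addition
  for (int32_t i = lb; i <= ub; ++i)
    reduce(decl, i, privates[(i - lb) % numThreads]); // body: x = x + i
  for (int32_t p : privates)
    x = decl.combine(x, p); // merge the partial results into the original
  return x;
}

// Integer addition, the only reduction kind this patch supports.
inline ReductionDecl makeAddReductionI32() {
  return {[] { return int32_t{0}; },
          [](int32_t a, int32_t b) { return a + b; }};
}
```

Calling `runWorksharingLoop(makeAddReductionI32(), 0, 1, 100, 4)` mirrors the
`simple_reduction` test added by this patch, with the loop body reduced to a
single `reduce` call per iteration.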

(1) is implemented by `createReductionDecl` in OpenMP.cpp,
(2) is implemented while creating the OpenMP operation, and
(3) is implemented by the `genOpenMPReduction` function in OpenMP.cpp, which is
called from Bridge.cpp. The implementation of (3) is not very robust: it
pattern-matches the load -> add -> store chain after lowering and erases the
matched operations.
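
To make the approach taken for (3) concrete, here is a minimal sketch of the
same chain replacement on a toy instruction list. The `Op` and `Kind` types and
`rewriteOneChain` are hypothetical and stand in for the MLIR use-def walk in
`genOpenMPReduction`. The sketch assumes the matched load and add have no other
users and rewrites one chain per call.

```cpp
#include <cstddef>
#include <vector>

// Toy IR, for illustration only. Values and variables are small integer ids.
enum class Kind { Load, AddI, Store, Reduction, Other };

struct Op {
  Kind kind;
  int result; // id of the value defined by this op, or -1 if none
  int lhs;    // Load: address; AddI: first operand; Store/Reduction: value
  int rhs;    // AddI: second operand; Store/Reduction: address; else -1
};

// Finds `load var -> addi -> store var` and replaces it with a single
// Reduction op placed where the add was. Returns true if a chain was found.
inline bool rewriteOneChain(std::vector<Op> &ops, int var) {
  for (std::size_t l = 0; l < ops.size(); ++l) {
    if (ops[l].kind != Kind::Load || ops[l].lhs != var)
      continue;
    const int loaded = ops[l].result;
    for (std::size_t a = l + 1; a < ops.size(); ++a) {
      if (ops[a].kind != Kind::AddI ||
          (ops[a].lhs != loaded && ops[a].rhs != loaded))
        continue;
      for (std::size_t s = a + 1; s < ops.size(); ++s) {
        if (ops[s].kind != Kind::Store || ops[s].lhs != ops[a].result ||
            ops[s].rhs != var)
          continue;
        // The reduced value is whichever add operand is not the load, so
        // both `x = x + i` and `x = i + x` are handled.
        const int other = (ops[a].lhs == loaded) ? ops[a].rhs : ops[a].lhs;
        ops[a] = Op{Kind::Reduction, -1, other, var};
        ops.erase(ops.begin() + s); // erase the later op first so that the
        ops.erase(ops.begin() + l); // earlier index stays valid
        return true;
      }
    }
  }
  return false;
}
```

Because the match happens after the ordinary assignment has already been
lowered, any lowering that does not produce exactly this chain is left
untouched, which is the fragility the log message refers to.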

NOTE 1: The patch currently supports only reductions for integer type addition.
NOTE 2: Reductions are supported only in the worksharing loop.
NOTE 3: The atomic combination region is not generated.
NOTE 4: Other options for creating the reduction operation include
a) having the reduction operation as a construct containing an assignment
and then handling it appropriately in the Bridge.
b) modifying `genAssignment` or `genFIR(AssignmentStmt)` in the Bridge to
handle OpenMP reductions. So far we have tried not to mix OpenMP
and non-OpenMP code, and this would break that separation.
I will try (b) in a separate patch.
NOTE 5: The OpenMP dialect gained support for reductions with the patches
D105358 and D107343. See https://discourse.llvm.org/t/rfc-openmp-reduction-support/3367
for more details.
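
Since only integer addition is handled, the patch creates one declaration per
element type and finds it again by symbol name. The naming scheme used by
`getReductionName` in the diff can be sketched without LLVM types as follows.
`ScalarType` and `reductionName` are hypothetical stand-ins for `mlir::Type`
and the real helper, and the enum lists only two operators for brevity.

```cpp
#include <string>

// Hypothetical, reduced mirror of the parser's intrinsic operator enum.
enum class IntrinsicOperator { Add, Multiply };

// Hypothetical stand-in for the two properties of mlir::Type that are used.
struct ScalarType {
  bool isIntOrIndex;
  unsigned bitWidth;
};

// Builds names such as "add_reduction_i_32": operator kind, then "_i_" or
// "_f_" for integer or floating point, then the bit width.
inline std::string reductionName(IntrinsicOperator op, ScalarType ty) {
  std::string name =
      (op == IntrinsicOperator::Add) ? "add_reduction" : "other_reduction";
  name += ty.isIntOrIndex ? "_i_" : "_f_";
  name += std::to_string(ty.bitWidth);
  return name;
}
```

Two reduction variables of the same type therefore share one declaration, while
`integer` and `integer(kind=8)` variables get separate ones, matching the two
`omp.reduction.declare` operations checked in wsloop-reduction-int.f90.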

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D130077

Co-authored-by: Peixin-Qiao <qiaopeixin at huawei.com>

Added: 
    flang/test/Lower/OpenMP/Todo/parallel-reduction.f90
    flang/test/Lower/OpenMP/Todo/reduction-allocatable.f90
    flang/test/Lower/OpenMP/Todo/reduction-and.f90
    flang/test/Lower/OpenMP/Todo/reduction-arrays.f90
    flang/test/Lower/OpenMP/Todo/reduction-derived-type-field.f90
    flang/test/Lower/OpenMP/Todo/reduction-eqv.f90
    flang/test/Lower/OpenMP/Todo/reduction-iand.f90
    flang/test/Lower/OpenMP/Todo/reduction-ieor.f90
    flang/test/Lower/OpenMP/Todo/reduction-ior.f90
    flang/test/Lower/OpenMP/Todo/reduction-max.f90
    flang/test/Lower/OpenMP/Todo/reduction-min.f90
    flang/test/Lower/OpenMP/Todo/reduction-multiply.f90
    flang/test/Lower/OpenMP/Todo/reduction-neqv.f90
    flang/test/Lower/OpenMP/Todo/reduction-or.f90
    flang/test/Lower/OpenMP/Todo/reduction-real.f90
    flang/test/Lower/OpenMP/Todo/reduction-subtract.f90
    flang/test/Lower/OpenMP/wsloop-reduction-int.f90

Modified: 
    flang/include/flang/Lower/OpenMP.h
    flang/lib/Lower/Bridge.cpp
    flang/lib/Lower/OpenMP.cpp
    mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
    mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp

Removed: 
    


################################################################################
diff --git a/flang/include/flang/Lower/OpenMP.h b/flang/include/flang/Lower/OpenMP.h
index 12cb5f600e414..aa1b43b89eace 100644
--- a/flang/include/flang/Lower/OpenMP.h
+++ b/flang/include/flang/Lower/OpenMP.h
@@ -38,6 +38,8 @@ void genOpenMPDeclarativeConstruct(AbstractConverter &, pft::Evaluation &,
                                    const parser::OpenMPDeclarativeConstruct &);
 int64_t getCollapseValue(const Fortran::parser::OmpClauseList &clauseList);
 void genThreadprivateOp(AbstractConverter &, const pft::Variable &);
+void genOpenMPReduction(AbstractConverter &,
+                        const Fortran::parser::OmpClauseList &clauseList);
 
 } // namespace lower
 } // namespace Fortran

diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index f99eff1c943e6..c8a78e2187707 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -1630,11 +1630,12 @@ class FirConverter : public Fortran::lower::AbstractConverter {
     // no collapse requested.
 
     Fortran::lower::pft::Evaluation *curEval = &getEval();
+    const Fortran::parser::OmpClauseList *loopOpClauseList = nullptr;
     if (ompLoop) {
-      const auto &wsLoopOpClauseList = std::get<Fortran::parser::OmpClauseList>(
+      loopOpClauseList = &std::get<Fortran::parser::OmpClauseList>(
           std::get<Fortran::parser::OmpBeginLoopDirective>(ompLoop->t).t);
       int64_t collapseValue =
-          Fortran::lower::getCollapseValue(wsLoopOpClauseList);
+          Fortran::lower::getCollapseValue(*loopOpClauseList);
 
       curEval = &curEval->getFirstNestedEvaluation();
       for (int64_t i = 1; i < collapseValue; i++) {
@@ -1644,6 +1645,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
     for (Fortran::lower::pft::Evaluation &e : curEval->getNestedEvaluations())
       genFIR(e);
+
+    if (ompLoop)
+      genOpenMPReduction(*this, *loopOpClauseList);
+
     localSymbols.popScope();
     builder->restoreInsertionPoint(insertPt);
   }

diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index 94029ac9e6b62..acc6c424cb241 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -698,6 +698,40 @@ genOMP(Fortran::lower::AbstractConverter &converter,
   }
 }
 
+/// Creates an OpenMP reduction declaration and inserts it into the provided
+/// symbol table. The declaration has a constant initializer with the neutral
+/// value `initValue`, and the reduction combiner carried over from `reduce`.
+/// TODO: Generalize this for non-integer types, add atomic region.
+static omp::ReductionDeclareOp createReductionDecl(fir::FirOpBuilder &builder,
+                                                   llvm::StringRef name,
+                                                   mlir::Type type,
+                                                   mlir::Location loc) {
+  OpBuilder::InsertionGuard guard(builder);
+  mlir::ModuleOp module = builder.getModule();
+  mlir::OpBuilder modBuilder(module.getBodyRegion());
+  auto decl = module.lookupSymbol<mlir::omp::ReductionDeclareOp>(name);
+  if (!decl)
+    decl = modBuilder.create<omp::ReductionDeclareOp>(loc, name, type);
+  else
+    return decl;
+
+  builder.createBlock(&decl.initializerRegion(), decl.initializerRegion().end(),
+                      {type}, {loc});
+  builder.setInsertionPointToEnd(&decl.initializerRegion().back());
+  Value init = builder.create<mlir::arith::ConstantOp>(
+      loc, type, builder.getIntegerAttr(type, 0));
+  builder.create<omp::YieldOp>(loc, init);
+
+  builder.createBlock(&decl.reductionRegion(), decl.reductionRegion().end(),
+                      {type, type}, {loc, loc});
+  builder.setInsertionPointToEnd(&decl.reductionRegion().back());
+  mlir::Value op1 = decl.reductionRegion().front().getArgument(0);
+  mlir::Value op2 = decl.reductionRegion().front().getArgument(1);
+  Value addRes = builder.create<mlir::arith::AddIOp>(loc, op1, op2);
+  builder.create<omp::YieldOp>(loc, addRes);
+  return decl;
+}
+
 static mlir::omp::ScheduleModifier
 translateModifier(const Fortran::parser::OmpScheduleModifierType &m) {
   switch (m.v) {
@@ -762,6 +796,21 @@ getSIMDModifier(const Fortran::parser::OmpScheduleClause &x) {
   return mlir::omp::ScheduleModifier::none;
 }
 
+static std::string getReductionName(
+    Fortran::parser::DefinedOperator::IntrinsicOperator intrinsicOp,
+    mlir::Type ty) {
+  std::string reductionName;
+  if (intrinsicOp == Fortran::parser::DefinedOperator::IntrinsicOperator::Add)
+    reductionName = "add_reduction";
+  else
+    reductionName = "other_reduction";
+
+  return (llvm::Twine(reductionName) +
+          (ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) +
+          llvm::Twine(ty.getIntOrFloatBitWidth()))
+      .str();
+}
+
 static void genOMP(Fortran::lower::AbstractConverter &converter,
                    Fortran::lower::pft::Evaluation &eval,
                    const Fortran::parser::OpenMPLoopConstruct &loopConstruct) {
@@ -773,6 +822,7 @@ static void genOMP(Fortran::lower::AbstractConverter &converter,
   mlir::Value scheduleChunkClauseOperand, ifClauseOperand;
   mlir::Attribute scheduleClauseOperand, noWaitClauseOperand,
       orderedClauseOperand, orderClauseOperand;
+  SmallVector<Attribute> reductionDeclSymbols;
   Fortran::lower::StatementContext stmtCtx;
   const auto &loopOpClauseList = std::get<Fortran::parser::OmpClauseList>(
       std::get<Fortran::parser::OmpBeginLoopDirective>(loopConstruct.t).t);
@@ -841,6 +891,48 @@ static void genOMP(Fortran::lower::AbstractConverter &converter,
     } else if (const auto &ifClause =
                    std::get_if<Fortran::parser::OmpClause::If>(&clause.u)) {
       ifClauseOperand = getIfClauseOperand(converter, stmtCtx, ifClause);
+    } else if (const auto &reductionClause =
+                   std::get_if<Fortran::parser::OmpClause::Reduction>(
+                       &clause.u)) {
+      omp::ReductionDeclareOp decl;
+      const auto &redOperator{std::get<Fortran::parser::OmpReductionOperator>(
+          reductionClause->v.t)};
+      const auto &objectList{
+          std::get<Fortran::parser::OmpObjectList>(reductionClause->v.t)};
+      if (const auto &redDefinedOp =
+              std::get_if<Fortran::parser::DefinedOperator>(&redOperator.u)) {
+        const auto &intrinsicOp{
+            std::get<Fortran::parser::DefinedOperator::IntrinsicOperator>(
+                redDefinedOp->u)};
+        if (intrinsicOp !=
+            Fortran::parser::DefinedOperator::IntrinsicOperator::Add)
+          TODO(currentLocation,
+               "Reduction of some intrinsic operators is not supported");
+        for (const auto &ompObject : objectList.v) {
+          if (const auto *name{
+                  Fortran::parser::Unwrap<Fortran::parser::Name>(ompObject)}) {
+            if (const auto *symbol{name->symbol}) {
+              mlir::Value symVal = converter.getSymbolAddress(*symbol);
+              mlir::Type redType =
+                  symVal.getType().cast<fir::ReferenceType>().getEleTy();
+              reductionVars.push_back(symVal);
+              if (redType.isIntOrIndex()) {
+                decl = createReductionDecl(
+                    firOpBuilder, getReductionName(intrinsicOp, redType),
+                    redType, currentLocation);
+              } else {
+                TODO(currentLocation,
+                     "Reduction of some types is not supported");
+              }
+              reductionDeclSymbols.push_back(SymbolRefAttr::get(
+                  firOpBuilder.getContext(), decl.sym_name()));
+            }
+          }
+        }
+      } else {
+        TODO(currentLocation,
+             "Reduction of intrinsic procedures is not supported");
+      }
     }
   }
 
@@ -873,7 +965,11 @@ static void genOMP(Fortran::lower::AbstractConverter &converter,
   // 2. order
   auto wsLoopOp = firOpBuilder.create<mlir::omp::WsLoopOp>(
       currentLocation, lowerBound, upperBound, step, linearVars, linearStepVars,
-      reductionVars, /*reductions=*/nullptr,
+      reductionVars,
+      reductionDeclSymbols.empty()
+          ? nullptr
+          : mlir::ArrayAttr::get(firOpBuilder.getContext(),
+                                 reductionDeclSymbols),
       scheduleClauseOperand.dyn_cast_or_null<omp::ClauseScheduleKindAttr>(),
       scheduleChunkClauseOperand, /*schedule_modifiers=*/nullptr,
       /*simd_modifier=*/nullptr,
@@ -1410,3 +1506,82 @@ void Fortran::lower::genOpenMPDeclarativeConstruct(
       },
       ompDeclConstruct.u);
 }
+
+// Generate an OpenMP reduction operation. This implementation finds the chain :
+// load reduction var -> reduction_operation -> store reduction var and replaces
+// it with the reduction operation.
+// TODO: Currently assumes it is an integer addition reduction. Generalize this
+// for various reduction operation types.
+// TODO: Generate the reduction operation during lowering instead of creating
+// and removing operations since this is not a robust approach. Also, removing
+// ops in the builder (instead of a rewriter) is probably not the best approach.
+void Fortran::lower::genOpenMPReduction(
+    Fortran::lower::AbstractConverter &converter,
+    const Fortran::parser::OmpClauseList &clauseList) {
+  fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+  for (const auto &clause : clauseList.v) {
+    if (const auto &reductionClause =
+            std::get_if<Fortran::parser::OmpClause::Reduction>(&clause.u)) {
+      const auto &redOperator{std::get<Fortran::parser::OmpReductionOperator>(
+          reductionClause->v.t)};
+      const auto &objectList{
+          std::get<Fortran::parser::OmpObjectList>(reductionClause->v.t)};
+      if (auto reductionOp =
+              std::get_if<Fortran::parser::DefinedOperator>(&redOperator.u)) {
+        const auto &intrinsicOp{
+            std::get<Fortran::parser::DefinedOperator::IntrinsicOperator>(
+                reductionOp->u)};
+        if (intrinsicOp !=
+            Fortran::parser::DefinedOperator::IntrinsicOperator::Add)
+          continue;
+        for (const auto &ompObject : objectList.v) {
+          if (const auto *name{
+                  Fortran::parser::Unwrap<Fortran::parser::Name>(ompObject)}) {
+            if (const auto *symbol{name->symbol}) {
+              mlir::Value symVal = converter.getSymbolAddress(*symbol);
+              mlir::Type redType =
+                  symVal.getType().cast<fir::ReferenceType>().getEleTy();
+              if (!redType.isIntOrIndex())
+                continue;
+              for (mlir::OpOperand &use1 : symVal.getUses()) {
+                if (auto load = mlir::dyn_cast<fir::LoadOp>(use1.getOwner())) {
+                  mlir::Value loadVal = load.getRes();
+                  for (mlir::OpOperand &use2 : loadVal.getUses()) {
+                    if (auto add = mlir::dyn_cast<mlir::arith::AddIOp>(
+                            use2.getOwner())) {
+                      mlir::Value addRes = add.getResult();
+                      for (mlir::OpOperand &use3 : addRes.getUses()) {
+                        if (auto store =
+                                mlir::dyn_cast<fir::StoreOp>(use3.getOwner())) {
+                          if (store.getMemref() == symVal) {
+                            // Chain found! Now replace load->reduction->store
+                            // with the OpenMP reduction operation.
+                            mlir::OpBuilder::InsertPoint insertPtDel =
+                                firOpBuilder.saveInsertionPoint();
+                            firOpBuilder.setInsertionPoint(add);
+                            if (add.getLhs() == loadVal) {
+                              firOpBuilder.create<mlir::omp::ReductionOp>(
+                                  add.getLoc(), add.getRhs(), symVal);
+                            } else {
+                              firOpBuilder.create<mlir::omp::ReductionOp>(
+                                  add.getLoc(), add.getLhs(), symVal);
+                            }
+                            store.erase();
+                            add.erase();
+                            load.erase();
+                            firOpBuilder.restoreInsertionPoint(insertPtDel);
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}

diff --git a/flang/test/Lower/OpenMP/Todo/parallel-reduction.f90 b/flang/test/Lower/OpenMP/Todo/parallel-reduction.f90
new file mode 100644
index 0000000000000..54b58ada8ab09
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/parallel-reduction.f90
@@ -0,0 +1,11 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: OpenMP Block construct clauses
+subroutine reduction_parallel
+  integer :: x
+  !$omp parallel reduction(+:x)
+  x = x + i
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-allocatable.f90 b/flang/test/Lower/OpenMP/Todo/reduction-allocatable.f90
new file mode 100644
index 0000000000000..09aba6920232a
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-allocatable.f90
@@ -0,0 +1,21 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some types is not supported
+subroutine reduction_allocatable
+  integer, allocatable :: x
+  integer :: i = 1
+
+  allocate(x)
+  x = 0
+
+  !$omp parallel num_threads(4)
+  !$omp do reduction(+:x)
+  do i = 1, 10
+    x = x + i
+  enddo
+  !$omp end do
+  !$omp end parallel
+
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-and.f90 b/flang/test/Lower/OpenMP/Todo/reduction-and.f90
new file mode 100644
index 0000000000000..0dc34635211df
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-and.f90
@@ -0,0 +1,15 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some intrinsic operators is not supported
+subroutine reduction_and(y)
+  logical :: x, y(100)
+  !$omp parallel
+  !$omp do reduction(.and.:x)
+  do i=1, 100
+    x = x .and. y(i)
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-arrays.f90 b/flang/test/Lower/OpenMP/Todo/reduction-arrays.f90
new file mode 100644
index 0000000000000..a21611faf248c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-arrays.f90
@@ -0,0 +1,15 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some types is not supported
+subroutine reduction_array(y)
+  integer :: x(100), y(100,100)
+  !$omp parallel
+  !$omp do reduction(+:x)
+  do i=1, 100
+    x = x + y(:,i)
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-derived-type-field.f90 b/flang/test/Lower/OpenMP/Todo/reduction-derived-type-field.f90
new file mode 100644
index 0000000000000..8bded2fdb7469
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-derived-type-field.f90
@@ -0,0 +1,21 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some types is not supported
+subroutine reduction_allocatable
+  type t
+    integer :: x
+  end type
+  integer :: i = 1
+  type(t) :: mt
+
+  mt%x = 0
+
+  !$omp parallel num_threads(4)
+  !$omp do reduction(+:mt)
+  do i = 1, 10
+    mt%x = mt%x + i
+  enddo
+  !$omp end do
+  !$omp end parallel
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-eqv.f90 b/flang/test/Lower/OpenMP/Todo/reduction-eqv.f90
new file mode 100644
index 0000000000000..36322e9156c6c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-eqv.f90
@@ -0,0 +1,15 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some intrinsic operators is not supported
+subroutine reduction_eqv(y)
+  logical :: x, y(100)
+  !$omp parallel
+  !$omp do reduction(.eqv.:x)
+  do i=1, 100
+    x = x .eqv. y(i)
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-iand.f90 b/flang/test/Lower/OpenMP/Todo/reduction-iand.f90
new file mode 100644
index 0000000000000..f1b9b025d2a16
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-iand.f90
@@ -0,0 +1,16 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of intrinsic procedures is not supported
+subroutine reduction_iand(y)
+  integer :: x, y(:)
+  x = 0
+  !$omp parallel
+  !$omp do reduction(iand:x)
+  do i=1, 100
+    x = iand(x, y(i))
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-ieor.f90 b/flang/test/Lower/OpenMP/Todo/reduction-ieor.f90
new file mode 100644
index 0000000000000..2303a8edfaa94
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-ieor.f90
@@ -0,0 +1,16 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of intrinsic procedures is not supported
+subroutine reduction_ieor(y)
+  integer :: x, y(:)
+  x = 0
+  !$omp parallel
+  !$omp do reduction(ieor:x)
+  do i=1, 100
+    x = ieor(x, y(i))
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-ior.f90 b/flang/test/Lower/OpenMP/Todo/reduction-ior.f90
new file mode 100644
index 0000000000000..3460df2b9d6d2
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-ior.f90
@@ -0,0 +1,16 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of intrinsic procedures is not supported
+subroutine reduction_ior(y)
+  integer :: x, y(:)
+  x = 0
+  !$omp parallel
+  !$omp do reduction(ior:x)
+  do i=1, 100
+    x = ior(x, y(i))
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-max.f90 b/flang/test/Lower/OpenMP/Todo/reduction-max.f90
new file mode 100644
index 0000000000000..e965e6860712e
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-max.f90
@@ -0,0 +1,16 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of intrinsic procedures is not supported
+subroutine reduction_max(y)
+  integer :: x, y(:)
+  x = 0
+  !$omp parallel
+  !$omp do reduction(max:x)
+  do i=1, 100
+    x = max(x, y(i))
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-min.f90 b/flang/test/Lower/OpenMP/Todo/reduction-min.f90
new file mode 100644
index 0000000000000..9880b6eb153bb
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-min.f90
@@ -0,0 +1,16 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of intrinsic procedures is not supported
+subroutine reduction_min(y)
+  integer :: x, y(:)
+  x = 0
+  !$omp parallel
+  !$omp do reduction(min:x)
+  do i=1, 100
+    x = min(x, y(i))
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-multiply.f90 b/flang/test/Lower/OpenMP/Todo/reduction-multiply.f90
new file mode 100644
index 0000000000000..3f7f2ef96f448
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-multiply.f90
@@ -0,0 +1,15 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some intrinsic operators is not supported
+subroutine reduction_multiply
+  integer :: x
+  !$omp parallel
+  !$omp do reduction(*:x)
+  do i=1, 100
+    x = x * i
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-neqv.f90 b/flang/test/Lower/OpenMP/Todo/reduction-neqv.f90
new file mode 100644
index 0000000000000..8a0e82c1637e4
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-neqv.f90
@@ -0,0 +1,15 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some intrinsic operators is not supported
+subroutine reduction_neqv(y)
+  logical :: x, y(100)
+  !$omp parallel
+  !$omp do reduction(.neqv.:x)
+  do i=1, 100
+    x = x .neqv. y(i)
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-or.f90 b/flang/test/Lower/OpenMP/Todo/reduction-or.f90
new file mode 100644
index 0000000000000..f88428b93c31e
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-or.f90
@@ -0,0 +1,15 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some intrinsic operators is not supported
+subroutine reduction_or(y)
+  logical :: x, y(100)
+  !$omp parallel
+  !$omp do reduction(.or.:x)
+  do i=1, 100
+    x = x .or. y(i)
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-real.f90 b/flang/test/Lower/OpenMP/Todo/reduction-real.f90
new file mode 100644
index 0000000000000..4b1a25345f15a
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-real.f90
@@ -0,0 +1,16 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some types is not supported
+subroutine reduction_real
+  real :: x
+  x = 0.0
+  !$omp parallel
+  !$omp do reduction(+:x)
+  do i=1, 100
+    x = x + 1.0
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/Todo/reduction-subtract.f90 b/flang/test/Lower/OpenMP/Todo/reduction-subtract.f90
new file mode 100644
index 0000000000000..bd01e4f63b983
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/reduction-subtract.f90
@@ -0,0 +1,15 @@
+! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: not yet implemented: Reduction of some intrinsic operators is not supported
+subroutine reduction_subtract
+  integer :: x
+  !$omp parallel
+  !$omp do reduction(-:x)
+  do i=1, 100
+    x = x - i
+  end do
+  !$omp end do
+  !$omp end parallel
+  print *, x
+end subroutine

diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-int.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-int.f90
new file mode 100644
index 0000000000000..bd7102d725250
--- /dev/null
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-int.f90
@@ -0,0 +1,144 @@
+! RUN: bbc -emit-fir -fopenmp %s -o - | FileCheck %s
+! RUN: %flang_fc1 -emit-fir -fopenmp %s -o - | FileCheck %s
+
+!CHECK-LABEL: omp.reduction.declare
+!CHECK-SAME: @[[RED_I64_NAME:.*]] : i64 init {
+!CHECK: ^bb0(%{{.*}}: i64):
+!CHECK:  %[[C0_1:.*]] = arith.constant 0 : i64
+!CHECK:  omp.yield(%[[C0_1]] : i64)
+!CHECK: } combiner {
+!CHECK: ^bb0(%[[ARG0:.*]]: i64, %[[ARG1:.*]]: i64):
+!CHECK:  %[[RES:.*]] = arith.addi %[[ARG0]], %[[ARG1]] : i64
+!CHECK:  omp.yield(%[[RES]] : i64)
+!CHECK: }
+
+!CHECK-LABEL: omp.reduction.declare
+!CHECK-SAME: @[[RED_I32_NAME:.*]] : i32 init {
+!CHECK: ^bb0(%{{.*}}: i32):
+!CHECK:  %[[C0_1:.*]] = arith.constant 0 : i32
+!CHECK:  omp.yield(%[[C0_1]] : i32)
+!CHECK: } combiner {
+!CHECK: ^bb0(%[[ARG0:.*]]: i32, %[[ARG1:.*]]: i32):
+!CHECK:  %[[RES:.*]] = arith.addi %[[ARG0]], %[[ARG1]] : i32
+!CHECK:  omp.yield(%[[RES]] : i32)
+!CHECK: }
+
+!CHECK-LABEL: func.func @_QPsimple_reduction
+!CHECK:  %[[XREF:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFsimple_reductionEx"}
+!CHECK:  %[[C0_2:.*]] = arith.constant 0 : i32
+!CHECK:  fir.store %[[C0_2]] to %[[XREF]] : !fir.ref<i32>
+!CHECK:  omp.parallel
+!CHECK:    %[[I_PVT_REF:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK:    %[[C1_1:.*]] = arith.constant 1 : i32
+!CHECK:    %[[C100:.*]] = arith.constant 100 : i32
+!CHECK:    %[[C1_2:.*]] = arith.constant 1 : i32
+!CHECK:    omp.wsloop   reduction(@[[RED_I32_NAME]] -> %[[XREF]] : !fir.ref<i32>) for  (%[[IVAL:.*]]) : i32 = (%[[C1_1]]) to (%[[C100]]) inclusive step (%[[C1_2]])
+!CHECK:      fir.store %[[IVAL]] to %[[I_PVT_REF]] : !fir.ref<i32>
+!CHECK:      %[[I_PVT_VAL:.*]] = fir.load %[[I_PVT_REF]] : !fir.ref<i32>
+!CHECK:      omp.reduction %[[I_PVT_VAL]], %[[XREF]] : !fir.ref<i32>
+!CHECK:      omp.yield
+!CHECK:    omp.terminator
+!CHECK:  return
+
+subroutine simple_reduction
+  integer :: x
+  x = 0
+  !$omp parallel
+  !$omp do reduction(+:x)
+  do i=1, 100
+    x = x + i
+  end do
+  !$omp end do
+  !$omp end parallel
+end subroutine
+
+!CHECK-LABEL: func.func @_QPsimple_reduction_switch_order
+!CHECK:  %[[XREF:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFsimple_reduction_switch_orderEx"}
+!CHECK:  %[[C0_2:.*]] = arith.constant 0 : i32
+!CHECK:  fir.store %[[C0_2]] to %[[XREF]] : !fir.ref<i32>
+!CHECK:  omp.parallel
+!CHECK:    %[[I_PVT_REF:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK:    %[[C1_1:.*]] = arith.constant 1 : i32
+!CHECK:    %[[C100:.*]] = arith.constant 100 : i32
+!CHECK:    %[[C1_2:.*]] = arith.constant 1 : i32
+!CHECK:    omp.wsloop   reduction(@[[RED_I32_NAME]] -> %[[XREF]] : !fir.ref<i32>) for  (%[[IVAL:.*]]) : i32 = (%[[C1_1]]) to (%[[C100]]) inclusive step (%[[C1_2]])
+!CHECK:      fir.store %[[IVAL]] to %[[I_PVT_REF]] : !fir.ref<i32>
+!CHECK:      %[[I_PVT_VAL:.*]] = fir.load %[[I_PVT_REF]] : !fir.ref<i32>
+!CHECK:      omp.reduction %[[I_PVT_VAL]], %[[XREF]] : !fir.ref<i32>
+!CHECK:      omp.yield
+!CHECK:    omp.terminator
+!CHECK:  return
+
+subroutine simple_reduction_switch_order
+  integer :: x
+  x = 0
+  !$omp parallel
+  !$omp do reduction(+:x)
+  do i=1, 100
+    x = i + x
+  end do
+  !$omp end do
+  !$omp end parallel
+end subroutine
+
+!CHECK-LABEL: func.func @_QPmultiple_reductions_same_type
+!CHECK:  %[[XREF:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFmultiple_reductions_same_typeEx"}
+!CHECK:  %[[YREF:.*]] = fir.alloca i32 {bindc_name = "y", uniq_name = "_QFmultiple_reductions_same_typeEy"}
+!CHECK:  %[[ZREF:.*]] = fir.alloca i32 {bindc_name = "z", uniq_name = "_QFmultiple_reductions_same_typeEz"}
+!CHECK:  omp.parallel
+!CHECK:    %[[I_PVT_REF:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK:    omp.wsloop   reduction(@[[RED_I32_NAME]] -> %[[XREF]] : !fir.ref<i32>, @[[RED_I32_NAME]] -> %[[YREF]] : !fir.ref<i32>, @[[RED_I32_NAME]] -> %[[ZREF]] : !fir.ref<i32>) for  (%[[IVAL:.*]]) : i32
+!CHECK:      fir.store %[[IVAL]] to %[[I_PVT_REF]] : !fir.ref<i32>
+!CHECK:      %[[I_PVT_VAL1:.*]] = fir.load %[[I_PVT_REF]] : !fir.ref<i32>
+!CHECK:      omp.reduction %[[I_PVT_VAL1]], %[[XREF]] : !fir.ref<i32>
+!CHECK:      %[[I_PVT_VAL2:.*]] = fir.load %[[I_PVT_REF]] : !fir.ref<i32>
+!CHECK:      omp.reduction %[[I_PVT_VAL2]], %[[YREF]] : !fir.ref<i32>
+!CHECK:      %[[I_PVT_VAL3:.*]] = fir.load %[[I_PVT_REF]] : !fir.ref<i32>
+!CHECK:      omp.reduction %[[I_PVT_VAL3]], %[[ZREF]] : !fir.ref<i32>
+!CHECK:      omp.yield
+!CHECK:    omp.terminator
+!CHECK:  return
+
+subroutine multiple_reductions_same_type
+  integer :: x,y,z
+  x = 0
+  y = 0
+  z = 0
+  !$omp parallel
+  !$omp do reduction(+:x,y,z)
+  do i=1, 100
+    x = x + i
+    y = y + i
+    z = z + i
+  end do
+  !$omp end do
+  !$omp end parallel
+end subroutine
+
+!CHECK-LABEL: func.func @_QPmultiple_reductions_different_type
+!CHECK:  %[[XREF:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFmultiple_reductions_different_typeEx"}
+!CHECK:  %[[YREF:.*]] = fir.alloca i64 {bindc_name = "y", uniq_name = "_QFmultiple_reductions_different_typeEy"}
+!CHECK:  omp.parallel
+!CHECK:    %[[I_PVT_REF:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK:    omp.wsloop   reduction(@[[RED_I32_NAME]] -> %[[XREF]] : !fir.ref<i32>, @[[RED_I64_NAME]] -> %[[YREF]] : !fir.ref<i64>) for  (%[[IVAL:.*]]) : i32
+!CHECK:      fir.store %[[IVAL]] to %[[I_PVT_REF]] : !fir.ref<i32>
+!CHECK:      %[[C1_32:.*]] = arith.constant 1 : i32
+!CHECK:      omp.reduction %[[C1_32]], %[[XREF]] : !fir.ref<i32>
+!CHECK:      %[[C1_64:.*]] = arith.constant 1 : i64
+!CHECK:      omp.reduction %[[C1_64]], %[[YREF]] : !fir.ref<i64>
+!CHECK:      omp.yield
+!CHECK:    omp.terminator
+!CHECK:  return
+
+subroutine multiple_reductions_different_type
+  integer :: x
+  integer(kind=8) :: y
+  !$omp parallel
+  !$omp do reduction(+:x,y)
+  do i=1, 100
+    x = x + 1_4
+    y = y + 1_8
+  end do
+  !$omp end do
+  !$omp end parallel
+end subroutine

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index b39328be4d618..7caf1cc99c988 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -1312,7 +1312,8 @@ def CancellationPointOp : OpenMP_Op<"cancellationpoint"> {
 // 2.19.5.7 declare reduction Directive
 //===----------------------------------------------------------------------===//
 
-def ReductionDeclareOp : OpenMP_Op<"reduction.declare", [Symbol]> {
+def ReductionDeclareOp : OpenMP_Op<"reduction.declare", [Symbol, 
+                                                         IsolatedFromAbove]> {
   let summary = "declares a reduction kind";
 
   let description = [{

diff --git a/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp b/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
index f4887d25f2177..7ac5c87765095 100644
--- a/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
+++ b/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
@@ -77,6 +77,21 @@ struct RegionLessOpWithVarOperandsConversion
     return success();
   }
 };
+
+struct ReductionOpConversion : public ConvertOpToLLVMPattern<omp::ReductionOp> {
+  using ConvertOpToLLVMPattern<omp::ReductionOp>::ConvertOpToLLVMPattern;
+  LogicalResult
+  matchAndRewrite(omp::ReductionOp curOp, OpAdaptor adaptor,
+                  ConversionPatternRewriter &rewriter) const override {
+    if (curOp.accumulator().getType().isa<MemRefType>()) {
+      // TODO: Support memref type in variable operands
+      return rewriter.notifyMatchFailure(curOp, "memref is not supported yet");
+    }
+    rewriter.replaceOpWithNewOp<omp::ReductionOp>(
+        curOp, TypeRange(), adaptor.getOperands(), curOp->getAttrs());
+    return success();
+  }
+};
 } // namespace
 
 void mlir::configureOpenMPToLLVMConversionLegality(
@@ -96,14 +111,19 @@ void mlir::configureOpenMPToLLVMConversionLegality(
             return typeConverter.isLegal(op->getOperandTypes()) &&
                    typeConverter.isLegal(op->getResultTypes());
           });
+  target.addDynamicallyLegalOp<mlir::omp::ReductionOp>([&](Operation *op) {
+    return typeConverter.isLegal(op->getOperandTypes());
+  });
 }
 
 void mlir::populateOpenMPToLLVMConversionPatterns(LLVMTypeConverter &converter,
                                                   RewritePatternSet &patterns) {
   patterns.add<
-      RegionOpConversion<omp::CriticalOp>, RegionOpConversion<omp::MasterOp>,
-      RegionOpConversion<omp::ParallelOp>, RegionOpConversion<omp::WsLoopOp>,
-      RegionOpConversion<omp::SectionsOp>, RegionOpConversion<omp::SingleOp>,
+      ReductionOpConversion, RegionOpConversion<omp::CriticalOp>,
+      RegionOpConversion<omp::MasterOp>, ReductionOpConversion,
+      RegionOpConversion<omp::MasterOp>, RegionOpConversion<omp::ParallelOp>,
+      RegionOpConversion<omp::WsLoopOp>, RegionOpConversion<omp::SectionsOp>,
+      RegionOpConversion<omp::SingleOp>,
       RegionLessOpWithVarOperandsConversion<omp::AtomicReadOp>,
       RegionLessOpWithVarOperandsConversion<omp::AtomicWriteOp>,
       RegionLessOpWithVarOperandsConversion<omp::FlushOp>,