[flang-commits] [flang] [mlir] [flang][OpenMP] Fix reduction init region block management (PR #122079)

Wed Jan 8 01:23:21 PST 2025

https://github.com/ergawy created https://github.com/llvm/llvm-project/pull/122079

## Problem

Consider the following example:
```fortran
program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program
```

The HLFIR+OMP IR for this example looks like this:
```mlir
  func.func @_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
      %6 = fir.alloca !fir.box<!fir.array<1xf32>>
      ...
      omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }
```

The problem addressed by this PR is related to: the `alloca` in the `omp.parallel` region + the related `reduction` clause on the `omp.wsloop` op. When we try translate the reduction from MLIR to LLVM, we have to choose an `alloca` insertion point. This happens in `convertOmpWsloop` where at entry to that function, this is what the LLVM module looks like:

```llvm
define void @_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
```

Now, when we choose an `alloca` insertion point for the reduction, this is the chosen block `omp.par.entry` (without the changes in this PR). The problem is that the allocation needed for the reduction needs to reference the `%5` SSA value. This results in inserting allocations in `omp.par.entry` that reference allocations in a later block `omp.par.region1` which causes the `Instruction does not dominate all uses!` error.

## Possible solution - take 2:

This PR contains a more localized solution than https://github.com/llvm/llvm-project/pull/121886. It makes sure that on entry to `initReductionVars`, the IR builder is at a point where we can starting inserting initialization region; to make things cleaner, we still split the builder insertion point to a dedicated `omp.reduction.init`. This way we avoid splitting after the latest allocation block; which is what causing the issue.

>From 8be8cec9226a90a63409b118e7e8b7aaf0c7e5f4 Mon Sep 17 00:00:00 2001
From: ergawy <kareem.ergawy at amd.com>
Date: Tue, 7 Jan 2025 08:19:16 -0600
Subject: [PATCH] [flang][OpenMP] Fix reduction init region block management

Problem

Consider the following example:
```fortran
program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program
```

The HLFIR+OMP IR for this example looks like this:
```mlir
  func.func @_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
      %6 = fir.alloca !fir.box<!fir.array<1xf32>>
      ...
      omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }
```

The problem addressed by this PR is related to: the `alloca` in the `omp.parallel` region + the related `reduction` clause on the `omp.wsloop` op. When we try translate the reduction from MLIR to LLVM, we have to choose an `alloca` insertion point. This happens in `convertOmpWsloop` where at entry to that function, this is what the LLVM module looks like:

```llvm
define void @_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
```

Now, when we choose an `alloca` insertion point for the reduction, this is the chosen block `omp.par.entry` (without the changes in this PR). The problem is that the allocation needed for the reduction needs to reference the `%5` SSA value. This results in inserting allocations in `omp.par.entry` that reference allocations in a later block `omp.par.region1` which causes the `Instruction does not dominate all uses!` error.

Possible solution - take 2:

This PR contains a more localized solution than https://github.com/llvm/llvm-project/pull/121886. It makes sure that on entry to `initReductionVars`, the IR builder is at a point where we can starting inserting initialization region; to make things cleaner, we still split the builder insertion point to a dedicated `omp.reduction.init`. This way we avoid splitting after the latest allocation block; which is what causing the issue.
---
 .../parallel-private-reduction-worstcase.f90  | 10 ++++-----
 .../Lower/OpenMP/parallel-reduction-mixed.f90 |  4 ++--
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp      | 19 ++++++++++-------
 .../openmp-parallel-reduction-multiblock.mlir | 20 +++++++++---------
 mlir/test/Target/LLVMIR/openmp-private.mlir   |  2 ++
 .../openmp-reduction-array-sections.mlir      | 21 +++++++++++--------
 .../LLVMIR/openmp-reduction-init-arg.mlir     |  8 +++----
 .../LLVMIR/openmp-reduction-sections.mlir     | 20 ++++++++++--------
 8 files changed, 58 insertions(+), 46 deletions(-)

diff --git a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
index 3aa5d042463973..fe3a326702e52a 100644
--- a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+++ b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
@@ -96,9 +96,12 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont13:                                ; preds = %omp.private.copy16
 ! CHECK-NEXT:    %{{.*}} = phi ptr
+! CHECK-NEXT:    br label %omp.par.region
+
+! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont13
 ! CHECK-NEXT:    br label %omp.reduction.init
 
-! CHECK:       omp.reduction.init:                               ; preds = %omp.region.cont13
+! CHECK:       omp.reduction.init:                               ; preds = %omp.par.region
 !                [deffered stores for results of reduction alloc regions]
 ! CHECK:         br label %[[VAL_96:.*]]
 
@@ -132,12 +135,9 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont21:                                ; preds = %omp.reduction.neutral25
 ! CHECK-NEXT:    %{{.*}} = phi ptr
-! CHECK-NEXT:    br label %omp.par.region
-
-! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont21
 ! CHECK-NEXT:    br label %omp.par.region27
 
-! CHECK:       omp.par.region27:                                 ; preds = %omp.par.region
+! CHECK:       omp.par.region27:                                 ; preds = %omp.region.cont21
 !                [call SUM runtime function]
 !                [if (sum(a) == 1)]
 ! CHECK:         br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
index 8e6f55abd5671c..b3e25ae7795617 100644
--- a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
+++ b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
@@ -27,11 +27,11 @@ end subroutine proc
 !CHECK:  %[[F_priv:.*]] = alloca ptr
 !CHECK:  %[[I_priv:.*]] = alloca i32
 
+!CHECK: omp.par.region:
+
 !CHECK: omp.reduction.init:
 !CHECK:  store ptr %{{.*}}, ptr %[[F_priv]]
 !CHECK:  store i32 0, ptr %[[I_priv]]
-
-!CHECK: omp.par.region:
 !CHECK:  br label %[[MALLOC_BB:.*]]
 
 !CHECK: [[MALLOC_BB]]:
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 87cb7f03fec6aa..c837162d4cd776 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1039,9 +1039,6 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
-
-  builder.SetInsertPoint(latestAllocaBlock->getTerminator());
   llvm::BasicBlock *initBlock = splitBB(builder, true, "omp.reduction.init");
   auto allocaIP = llvm::IRBuilderBase::InsertPoint(
       latestAllocaBlock, latestAllocaBlock->getTerminator()->getIterator());
@@ -1061,7 +1058,10 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     }
   }
 
-  builder.SetInsertPoint(&*initBlock->getFirstNonPHIOrDbgOrAlloca());
+  if (initBlock->empty() || initBlock->getTerminator() == nullptr)
+    builder.SetInsertPoint(initBlock);
+  else
+    builder.SetInsertPoint(initBlock->getTerminator());
 
   // store result of the alloc region to the allocated pointer to the real
   // reduction variable
@@ -1086,7 +1086,12 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     assert(phis.size() == 1 && "expected one value to be yielded from the "
                                "reduction neutral element declaration region");
 
-    builder.SetInsertPoint(builder.GetInsertBlock()->getTerminator());
+    if (builder.GetInsertBlock()->empty() ||
+        builder.GetInsertBlock()->getTerminator() == nullptr)
+      builder.SetInsertPoint(builder.GetInsertBlock());
+    else
+      builder.SetInsertPoint(
+          builder.GetInsertBlock()->getTerminator());
 
     if (isByRef[i]) {
       if (!reductionDecls[i].getAllocRegion().empty())
@@ -1271,7 +1276,6 @@ static LogicalResult allocAndInitializeReductionVars(
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
   SmallVector<DeferredStore> deferredStores;
 
   if (failed(allocReductionVars(op, reductionArgs, builder, moduleTranslation,
@@ -2080,6 +2084,8 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
       return llvm::make_error<PreviouslyReportedError>();
 
     assert(afterAllocas.get()->getSinglePredecessor());
+    builder.restoreIP(codeGenIP);
+
     if (failed(
             initReductionVars(opInst, reductionArgs, builder, moduleTranslation,
                               afterAllocas.get()->getSinglePredecessor(),
@@ -2099,7 +2105,6 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
         moduleTranslation, allocaIP);
 
     // ParallelOp has only one region associated with it.
-    builder.restoreIP(codeGenIP);
     llvm::Expected<llvm::BasicBlock *> regionBlock = convertOmpOpRegions(
         opInst.getRegion(), "omp.par.region", builder, moduleTranslation);
     if (!regionBlock)
diff --git a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
index 55fb5954548a04..75161bac2faf42 100644
--- a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
@@ -44,7 +44,7 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -56,10 +56,12 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         %[[VAL_20:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_22:.*]] = alloca [2 x ptr], align 8
-// CHECK:         br label %[[VAL_23:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_24:.*]]
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.reduction.neutral:                            ; preds = %[[VAL_23]]
+// CHECK:         br label %[[VAL_23:omp.par.region]]
+// CHECK:       [[VAL_23]]:                                   ; preds = %[[PAR_ENTRY]]
+// CHECK:         br label %[[VAL_42:.*]]
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         br label %[[VAL_25:omp.reduction.neutral]]
+// CHECK:       [[VAL_25]]:                            ; preds = %[[RED_INIT]]
 // CHECK:         %[[VAL_26:.*]] = ptrtoint ptr %[[VAL_13]] to i64
 // CHECK:         %[[VAL_27:.*]] = icmp eq i64 %[[VAL_26]], 0
 // CHECK:         br i1 %[[VAL_27]], label %[[VAL_28:.*]], label %[[VAL_29:.*]]
@@ -79,15 +81,13 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_38:.*]]
 // CHECK:       omp.reduction.neutral8:                           ; preds = %[[VAL_36]], %[[VAL_37]]
 // CHECK:         br label %[[VAL_39:.*]]
-// CHECK:       omp.region.cont4:                                 ; preds = %[[VAL_38]]
+// CHECK:       [[VAL_39]]:                                 ; preds = %[[VAL_38]]
 // CHECK:         %[[VAL_40:.*]] = phi ptr [ %[[VAL_15]], %[[VAL_38]] ]
 // CHECK:         store ptr %[[VAL_40]], ptr %[[VAL_21]], align 8
 // CHECK:         br label %[[VAL_41:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_39]]
-// CHECK:         br label %[[VAL_42:.*]]
-// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_41]]
+// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_39]]
 // CHECK:         br label %[[VAL_43:.*]]
-// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_42]]
+// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_41]]
 // CHECK:         %[[VAL_44:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 0
 // CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_44]], align 8
 // CHECK:         %[[VAL_45:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 1
diff --git a/mlir/test/Target/LLVMIR/openmp-private.mlir b/mlir/test/Target/LLVMIR/openmp-private.mlir
index 5407f97286eb1a..d2ca03a8fa027a 100644
--- a/mlir/test/Target/LLVMIR/openmp-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-private.mlir
@@ -199,6 +199,8 @@ llvm.func @bar(!llvm.ptr)
 // CHECK-DAG:     %[[RED_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 
 // CHECK:         omp.par.region:
+// CHECK:           br label %omp.reduction.init
+// CHECK:         omp.reduction.init:
 // CHECK:           br label %[[PAR_REG_BEG:.*]]
 // CHECK:         [[PAR_REG_BEG]]:
 // CHECK-NEXT:      %{{.*}} = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[RED_ALLOC]], align 8
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
index fdfcc66b91012d..912d5568c5f262 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
@@ -77,7 +77,7 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 }
 
 // CHECK-LABEL: define internal void @sectionsreduction_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_6:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_7:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_8:.*]] = alloca i32, align 4
@@ -90,15 +90,18 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_14:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_15:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
-// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
-// CHECK:         br label %[[VAL_17:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_15]]
+
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_18:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_17]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_15]]
 // CHECK:         %[[VAL_19:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_18]]
+
+// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
+// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
+// CHECK:         br label %[[VAL_17:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_22]]
 // CHECK:         store i32 0, ptr %[[VAL_7]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_8]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_9]], align 4
@@ -109,8 +112,8 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_26:.*]] = sub i32 %[[VAL_25]], %[[VAL_24]]
 // CHECK:         %[[VAL_27:.*]] = add i32 %[[VAL_26]], 1
 // CHECK:         br label %[[VAL_28:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_22]]
-// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_22]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_17]]
+// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_17]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
 // CHECK:         br label %[[VAL_32:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_28]]
 // CHECK:         %[[VAL_33:.*]] = icmp ult i32 %[[VAL_30]], %[[VAL_27]]
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
index 8e28f0b85b259c..7f2424381e846e 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
@@ -50,7 +50,7 @@ module {
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -62,16 +62,16 @@ module {
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_23:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_24:.*]] = alloca [2 x ptr], align 8
+// CHECK:         br label %[[VAL_25:.*]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[INIT_LABEL:.*]]
 // CHECK: [[INIT_LABEL]]:
 // CHECK:         %[[VAL_20:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_13]], align 8
 // CHECK:         store ptr %[[VAL_13]], ptr %[[VAL_21]], align 8
 // CHECK:         %[[VAL_22:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_15]], align 8
 // CHECK:         store ptr %[[VAL_15]], ptr %[[VAL_23]], align 8
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_26:.*]]
 // CHECK:         br label %[[VAL_27:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_25]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[INIT_LABEL]]
 // CHECK:         br label %[[VAL_28:.*]]
 // CHECK:       omp.region.cont:                                  ; preds = %[[VAL_27]]
 // CHECK:         %[[VAL_29:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_24]], i64 0, i64 0
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
index ed7e9fada5fc44..05af32622246a6 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
@@ -36,7 +36,7 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 }
 
 // CHECK-LABEL: define internal void @sections_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_9:.*]] = getelementptr { ptr }, ptr %[[VAL_10:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_11:.*]] = load ptr, ptr %[[VAL_9]], align 8
 // CHECK:         %[[VAL_12:.*]] = alloca i32, align 4
@@ -50,14 +50,16 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_20:.*]] = alloca float, align 4
 // CHECK:         %[[VAL_21:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_23:.*]]
-// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
-// CHECK:         br label %[[VAL_24:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_22]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_24]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_22]]
 // CHECK:         br label %[[VAL_26:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_25]]
+
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
+// CHECK:         br label %[[VAL_24:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[RED_INIT]]
 // CHECK:         store i32 0, ptr %[[VAL_13]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_14]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_15]], align 4
@@ -68,8 +70,8 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_30:.*]] = sub i32 %[[VAL_29]], %[[VAL_28]]
 // CHECK:         %[[VAL_31:.*]] = add i32 %[[VAL_30]], 1
 // CHECK:         br label %[[VAL_32:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_26]]
-// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_26]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_24]]
+// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_24]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
 // CHECK:         br label %[[VAL_36:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_32]]
 // CHECK:         %[[VAL_37:.*]] = icmp ult i32 %[[VAL_34]], %[[VAL_31]]