[flang-commits] [flang] [flang] Restrict O0 hlfir.assign scalar-to-array inlining to OpenMP target device (PR #201774)
Sairudra More via flang-commits
flang-commits at lists.llvm.org
Fri Jun 5 01:06:10 PDT 2026
https://github.com/Saieiei created https://github.com/llvm/llvm-project/pull/201774
PR #197092 enabled `InlineHLFIRAssign{onlyScalarRHS=true}` at `-O0` to keep `_FortranAAssign` (which uses `malloc`/`free`) out of OpenMP target device code. However, the pass was scheduled for every `-O0` compilation, including host compilations, and this caused a debug regression. A line breakpoint on a scalar-to-array broadcast such as `arr = 11` is now hit once per array element instead of once, because the assignment is expanded into an inline loop.
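The following minimal C++ model makes the regression concrete by comparing the two lowerings of `arr = 11`. It is not flang code. `runtimeBroadcast` is a hypothetical stand-in for `_FortranAAssign`. Counting entries into the statement's code is only a rough proxy for how often a line breakpoint stops, and it relies on the assumption, described above, that every iteration of the inlined loop carries the statement's source location.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a runtime assignment entry point such as
// _FortranAAssign. The real routine works on array descriptors and may
// allocate temporaries with malloc/free; this model only broadcasts a value.
void runtimeBroadcast(int *dst, std::size_t n, int value) {
  std::fill_n(dst, n, value);
}

// Host -O0 after this patch: `arr = 11` is a single runtime call.
// Returns how many times code attributed to the statement's line is entered.
int lowerAsRuntimeCall(int *arr, std::size_t n) {
  runtimeBroadcast(arr, n, 11);
  return 1; // the call site is reached exactly once
}

// Target-device -O0: the broadcast becomes an element loop whose body is
// attributed to the same source line, so that code is re-entered per element.
int lowerAsInlineLoop(int *arr, std::size_t n) {
  int lineEntries = 0;
  for (std::size_t i = 0; i < n; ++i) {
    ++lineEntries; // one more entry into code for the same line
    arr[i] = 11;
  }
  return lineEntries;
}
```

Both lowerings leave the array in the same state. Only the number of times control passes through code attributed to the source line differs, and that difference is what a debugger user observes.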
This patch restricts the `-O0` scheduling of `InlineHLFIRAssign{onlyScalarRHS=true}` to OpenMP target-device compilations. Host `-O0` compilations fall back to `_FortranAAssign`, which restores the single breakpoint hit.
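The resulting policy can be summarized as a small decision function. This is a sketch, not the code in `Pipelines.cpp`: the enum, struct, and function names are hypothetical. The optimizing branch assumes that `-O1` and above continue to schedule `InlineHLFIRAssign` with its default options. That assumption is consistent with the `workdistribute` tests being pinned to `-O1` in this patch.

```cpp
#include <cassert>

// Which form of InlineHLFIRAssign the HLFIR-to-FIR pipeline schedules.
// Illustrative names only; these are not flang APIs.
enum class AssignInlining {
  None,          // hlfir.assign stays a runtime call (_FortranAAssign)
  ScalarRHSOnly, // InlineHLFIRAssign{onlyScalarRHS=true}
  Default,       // InlineHLFIRAssign with default options (assumed at -O1+)
};

// Minimal model of the two pipeline inputs that matter here; the real
// MLIRToLLVMPassPipelineConfig has many more fields.
struct PipelineInputs {
  int optLevel;                    // 0 means -O0
  bool enableOpenMPIsTargetDevice; // models EnableOpenMPIsTargetDevice
};

AssignInlining selectAssignInlining(const PipelineInputs &in) {
  if (in.optLevel > 0)
    return AssignInlining::Default;       // optimizing pipeline (assumption)
  if (in.enableOpenMPIsTargetDevice)
    return AssignInlining::ScalarRHSOnly; // keep malloc/free off the device
  return AssignInlining::None;            // host -O0: one call, one stop
}
```

The `-O0` branch is keyed on the target-device flag rather than on `EnableOpenMP`. As a result, ordinary host OpenMP compilations also stay on the runtime-call path, which is why the host OpenMP integration tests in this patch expect `_FortranAAssign` again.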
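`MLIRToLLVMPassPipelineConfig` gains an `EnableOpenMPIsTargetDevice` bool, alongside the existing `EnableOpenMP` / `EnableOpenMPSimd` flags. The flang frontend sets it from `LangOpts.OpenMPIsTargetDevice` in both `CodeGenAction::lowerHLFIRToFIR()` and `CodeGenAction::generateLLVMIR()`, and `bbc` sets it from its `enableOpenMPDevice` option.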
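`EnableOpenMPIsTargetDevice` defaults to `false`, so each driver must copy the device setting into its config explicitly. A driver that omits this step silently takes the host path. The sketch below models the two entry points touched by the patch. The `...Model` types and the function names are hypothetical simplifications, and only the fields relevant here are shown.

```cpp
#include <cassert>

// Hypothetical stand-in for the frontend's language options (getLangOpts()).
struct LangOptsModel {
  bool OpenMPIsTargetDevice = false;
  bool OpenMPSimd = false;
};

// Hypothetical stand-in for MLIRToLLVMPassPipelineConfig (subset of fields).
struct PipelineConfigModel {
  bool EnableOpenMP = false;
  bool EnableOpenMPIsTargetDevice = false;
  bool EnableOpenMPSimd = false;
};

// flang -fc1 path, modeled on the generateLLVMIR() site: `openMPEnabled`
// stands in for the OpenMP language-feature check done there.
void configureFromFrontend(const LangOptsModel &langOpts, bool openMPEnabled,
                           PipelineConfigModel &config) {
  if (openMPEnabled)
    config.EnableOpenMP = true;
  if (langOpts.OpenMPIsTargetDevice)
    config.EnableOpenMPIsTargetDevice = true;
  if (langOpts.OpenMPSimd)
    config.EnableOpenMPSimd = true;
}

// bbc path: modeled on the added lines in bbc.cpp, which read bbc's own
// enableOpenMP / enableOpenMPDevice options.
void configureFromBbc(bool enableOpenMP, bool enableOpenMPDevice,
                      PipelineConfigModel &config) {
  if (enableOpenMP)
    config.EnableOpenMP = true;
  if (enableOpenMPDevice)
    config.EnableOpenMPIsTargetDevice = true;
}
```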
Two new regression tests are added:
- `flang/test/Lower/HLFIR/scalar-to-array-assign-host-O0.f90`: verifies host `-O0` still emits `_FortranAAssign`.
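- `flang/test/Lower/OpenMP/scalar-to-array-assign-target-device-O0.f90`: verifies that at device `-O0`, the broadcast inside `omp.target` is still inlined as a loop.
Existing pass-pipeline and integration tests are updated for the host `-O0` pipeline. The two `workdistribute` lowering tests now pass `-O1` explicitly, because assignment inlining still applies at that level.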
Fixing the per-element debug locations on the device-side inlined loop is a separate concern and is left as follow-up work.
From 17910682d4b140b4a49daa2628a7b65500e4b110 Mon Sep 17 00:00:00 2001
From: Sairudra More <sairudra60 at gmail.com>
Date: Thu, 4 Jun 2026 23:23:16 -0500
Subject: [PATCH] [flang] Restrict O0 scalar assign inlining to device code
PR #197092 added O0 scalar-to-array assignment inlining to avoid _FortranAAssign in OpenMP target device code.
That also changed normal host -g -O0 debugging: a breakpoint on a scalar broadcast such as 'arr = 11' could be hit once per array element.
Restrict the O0 scalar-RHS-only path to OpenMP target-device compilation, keeping host O0 on the existing runtime-call path.
---
flang/include/flang/Tools/CrossToolHelpers.h | 2 +
flang/lib/Frontend/FrontendActions.cpp | 5 +
flang/lib/Optimizer/Passes/Pipelines.cpp | 12 +-
.../test/Driver/mlir-debug-pass-pipeline.f90 | 5 -
flang/test/Driver/mlir-pass-pipeline.f90 | 10 +-
.../parallel-private-reduction-worstcase.f90 | 128 ++++++------------
.../Integration/OpenMP/private-global.f90 | 39 ++++--
flang/test/Integration/prefetch.f90 | 1 +
.../HLFIR/scalar-to-array-assign-host-O0.f90 | 17 +++
...calar-to-array-assign-target-device-O0.f90 | 18 +++
...workdistribute-saxpy-and-scalar-assign.f90 | 2 +-
.../OpenMP/workdistribute-scalar-assign.f90 | 2 +-
flang/tools/bbc/bbc.cpp | 2 +
13 files changed, 128 insertions(+), 115 deletions(-)
create mode 100644 flang/test/Lower/HLFIR/scalar-to-array-assign-host-O0.f90
create mode 100644 flang/test/Lower/OpenMP/scalar-to-array-assign-target-device-O0.f90
diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h
index 6240354bd899a..90e159cc157bf 100644
--- a/flang/include/flang/Tools/CrossToolHelpers.h
+++ b/flang/include/flang/Tools/CrossToolHelpers.h
@@ -141,6 +141,8 @@ struct MLIRToLLVMPassPipelineConfig : public FlangEPCallBacks {
///< functions.
bool NSWOnLoopVarInc = true; ///< Add nsw flag to loop variable increments.
bool EnableOpenMP = false; ///< Enable OpenMP lowering.
+ bool EnableOpenMPIsTargetDevice =
+ false; ///< Compiling for an OpenMP target device.
bool UseSampleProfile = false; ///< Enable sample based profiling
bool DebugInfoForProfiling = false; ///< Enable extra debugging info
bool EnableOpenMPSimd = false; ///< Enable OpenMP simd-only mode.
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 0d154a7157867..66602ed52f6cd 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -633,6 +633,8 @@ void CodeGenAction::lowerHLFIRToFIR() {
MLIRToLLVMPassPipelineConfig config(level);
config.fpMaxminBehavior =
ci.getInvocation().getLoweringOpts().getFPMaxminBehavior();
+ if (ci.getInvocation().getLangOpts().OpenMPIsTargetDevice)
+ config.EnableOpenMPIsTargetDevice = true;
// Create the pass pipeline
fir::createHLFIRToFIRPassPipeline(pm, enableOpenMP, config);
(void)mlir::applyPassManagerCLOptions(pm);
@@ -763,6 +765,9 @@ void CodeGenAction::generateLLVMIR() {
Fortran::common::LanguageFeature::OpenMP))
config.EnableOpenMP = true;
+ if (ci.getInvocation().getLangOpts().OpenMPIsTargetDevice)
+ config.EnableOpenMPIsTargetDevice = true;
+
if (ci.getInvocation().getLangOpts().OpenMPSimd)
config.EnableOpenMPSimd = true;
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index 682e3e48e0a22..8e8521391885e 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -313,10 +313,14 @@ void createHLFIRToFIRPassPipeline(mlir::PassManager &pm,
addNestedPassToAllTopLevelOperations<PassConstructor>(
pm, hlfir::createInlineHLFIRCopyIn);
}
- } else {
- // At O0, only inline scalar-to-array broadcasts. This avoids emitting
- // Fortran runtime calls (e.g. _FortranAAssign) that use malloc/free in
- // device code generated by OpenMP target offloading.
+ } else if (config.EnableOpenMPIsTargetDevice) {
+ // At O0, only inline scalar-to-array broadcasts when compiling for an
+ // OpenMP target device. This avoids emitting Fortran runtime calls
+ // (e.g. _FortranAAssign) that use malloc/free in device code generated
+ // by OpenMP target offloading. Restricting this to target-device
+ // compilation preserves the runtime call on the host at -O0 so that a
+ // line breakpoint on a scalar-to-array assignment hits once instead of
+ // once per element.
addNestedPassToAllTopLevelOperations(pm, [&]() {
return hlfir::createInlineHLFIRAssign({/*onlyScalarRHS=*/true});
});
diff --git a/flang/test/Driver/mlir-debug-pass-pipeline.f90 b/flang/test/Driver/mlir-debug-pass-pipeline.f90
index c5e63fdbd9d2b..d5126012b6957 100644
--- a/flang/test/Driver/mlir-debug-pass-pipeline.f90
+++ b/flang/test/Driver/mlir-debug-pass-pipeline.f90
@@ -32,23 +32,18 @@
! ALL-NEXT: 'fir.global' Pipeline
! ALL-NEXT: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! ALL-NEXT: InlineHLFIRAssign
! ALL-NEXT: 'func.func' Pipeline
! ALL-NEXT: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! ALL-NEXT: InlineHLFIRAssign
! ALL-NEXT: 'omp.declare_mapper' Pipeline
! ALL-NEXT: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! ALL-NEXT: InlineHLFIRAssign
! ALL-NEXT: 'omp.declare_reduction' Pipeline
! ALL-NEXT: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! ALL-NEXT: InlineHLFIRAssign
! ALL-NEXT: 'omp.private' Pipeline
! ALL-NEXT: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! ALL-NEXT: InlineHLFIRAssign
! ALL-NEXT: LowerHLFIROrderedAssignments
! ALL-NEXT: LowerHLFIRIntrinsics
! ALL-NEXT: BufferizeHLFIR
diff --git a/flang/test/Driver/mlir-pass-pipeline.f90 b/flang/test/Driver/mlir-pass-pipeline.f90
index a7ea0a9de4867..b679564adff10 100644
--- a/flang/test/Driver/mlir-pass-pipeline.f90
+++ b/flang/test/Driver/mlir-pass-pipeline.f90
@@ -9,6 +9,11 @@
end program
+! At -O0 on the host (no OpenMP target-device compilation), InlineHLFIRAssign
+! is no longer scheduled. See PR #197092 follow-up restricting the -O0 pass
+! to OpenMP target-device compilation.
+! O0-NOT: InlineHLFIRAssign
+
! ALL: Pass statistics report
! ALL: Fortran::lower::VerifierPass
@@ -32,27 +37,22 @@
! O2-NEXT: SimplifyHLFIRIntrinsics
! ALL: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! O0-NEXT: InlineHLFIRAssign
! ALL-NEXT:'func.func' Pipeline
! O2-NEXT: SimplifyHLFIRIntrinsics
! ALL: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! O0-NEXT: InlineHLFIRAssign
! ALL-NEXT:'omp.declare_mapper' Pipeline
! O2-NEXT: SimplifyHLFIRIntrinsics
! ALL: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! O0-NEXT: InlineHLFIRAssign
! ALL-NEXT:'omp.declare_reduction' Pipeline
! O2-NEXT: SimplifyHLFIRIntrinsics
! ALL: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! O0-NEXT: InlineHLFIRAssign
! ALL-NEXT:'omp.private' Pipeline
! O2-NEXT: SimplifyHLFIRIntrinsics
! ALL: InlineElementals
! ALL-NEXT: SeparateAllocatableAssign
-! O0-NEXT: InlineHLFIRAssign
! O2-NEXT: Canonicalizer
! O2-NEXT: CSE
! O2-NEXT: (S) {{.*}} num-cse'd
diff --git a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
index c4688a6e8a192..c6a46691d58f5 100644
--- a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+++ b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
@@ -50,7 +50,7 @@ subroutine worst_case(a, b, c, d)
! CHECK: br i1 %{{.*}}, label %omp.private.init3, label %omp.private.init4
! CHECK: omp.private.init4: ; preds = %omp.private.init2
-! [finish private alloc for first var with zero extent]
+! [finish private alloc for second var with zero extent]
! CHECK: br label %omp.private.init5
! CHECK: omp.private.init5: ; preds = %omp.private.init3, %omp.private.init4
@@ -61,13 +61,13 @@ subroutine worst_case(a, b, c, d)
! CHECK-NEXT: br label %omp.private.init7
! CHECK: omp.private.init7:
-! [begin private alloc for second var]
+! [begin private alloc for first var]
! [read the length from the mold argument]
! [if it is non-zero...]
! CHECK: br i1 {{.*}}, label %omp.private.init8, label %omp.private.init9
! CHECK: omp.private.init9: ; preds = %omp.private.init7
-! [finish private alloc for second var with zero extent]
+! [finish private alloc for first var with zero extent]
! CHECK: br label %omp.private.init10
! CHECK: omp.private.init10: ; preds = %omp.private.init8, %omp.private.init9
@@ -105,64 +105,50 @@ subroutine worst_case(a, b, c, d)
! CHECK-NEXT: br label %omp.reduction.init
! CHECK: omp.reduction.init: ; preds = %omp.region.cont15
-! [deferred stores for results of reduction alloc regions]
+! [deferred stores for results of reduction alloc regions]
! CHECK: br label %[[VAL_96:.*]]
! CHECK: omp.reduction.neutral: ; preds = %omp.reduction.init
-! [start of reduction initialization region for first var]
+! [start of reduction initialization region]
! [null check:]
! CHECK: br i1 %{{.*}}, label %omp.reduction.neutral20, label %omp.reduction.neutral21
! CHECK: omp.reduction.neutral21: ; preds = %omp.reduction.neutral
-! [malloc the reduction variable]
+! [malloc and assign the default value to the reduction variable]
! CHECK: br label %omp.reduction.neutral22
-! CHECK: omp.reduction.neutral22: ; preds = %omp.reduction.neutral23, %omp.reduction.neutral21
-! [inlined scalar-to-array init loop header]
-! CHECK: br i1 %{{.*}}, label %omp.reduction.neutral23, label %omp.reduction.neutral24
-
-! CHECK: omp.reduction.neutral24: ; preds = %omp.reduction.neutral22
-! CHECK: br label %omp.reduction.neutral25
-
-! CHECK: omp.reduction.neutral25: ; preds = %omp.reduction.neutral20, %omp.reduction.neutral24
+! CHECK: omp.reduction.neutral22: ; preds = %omp.reduction.neutral20, %omp.reduction.neutral21
! CHECK-NEXT: br label %omp.region.cont19
-! CHECK: omp.region.cont19: ; preds = %omp.reduction.neutral25
+! CHECK: omp.region.cont19: ; preds = %omp.reduction.neutral22
! CHECK-NEXT: %{{.*}} = phi ptr
-! CHECK-NEXT: br label %omp.reduction.neutral27
+! CHECK-NEXT: br label %omp.reduction.neutral24
-! CHECK: omp.reduction.neutral27: ; preds = %omp.region.cont19
-! [start of reduction initialization region for second var]
+! CHECK: omp.reduction.neutral24: ; preds = %omp.region.cont19
+! [start of reduction initialization region]
! [null check:]
-! CHECK: br i1 %{{.*}}, label %omp.reduction.neutral28, label %omp.reduction.neutral29
-
-! CHECK: omp.reduction.neutral29: ; preds = %omp.reduction.neutral27
-! [malloc the reduction variable]
-! CHECK: br label %omp.reduction.neutral30
-
-! CHECK: omp.reduction.neutral30: ; preds = %omp.reduction.neutral31, %omp.reduction.neutral29
-! [inlined scalar-to-array init loop header]
-! CHECK: br i1 %{{.*}}, label %omp.reduction.neutral31, label %omp.reduction.neutral32
+! CHECK: br i1 %{{.*}}, label %omp.reduction.neutral25, label %omp.reduction.neutral26
-! CHECK: omp.reduction.neutral32: ; preds = %omp.reduction.neutral30
-! CHECK: br label %omp.reduction.neutral33
+! CHECK: omp.reduction.neutral26: ; preds = %omp.reduction.neutral24
+! [malloc and assign the default value to the reduction variable]
+! CHECK: br label %omp.reduction.neutral27
-! CHECK: omp.reduction.neutral33: ; preds = %omp.reduction.neutral28, %omp.reduction.neutral32
-! CHECK-NEXT: br label %omp.region.cont26
+! CHECK: omp.reduction.neutral27: ; preds = %omp.reduction.neutral25, %omp.reduction.neutral26
+! CHECK-NEXT: br label %omp.region.cont23
-! CHECK: omp.region.cont26: ; preds = %omp.reduction.neutral33
+! CHECK: omp.region.cont23: ; preds = %omp.reduction.neutral27
! CHECK-NEXT: %{{.*}} = phi ptr
-! CHECK-NEXT: br label %omp.par.region35
+! CHECK-NEXT: br label %omp.par.region29
-! CHECK: omp.par.region35: ; preds = %omp.region.cont26
+! CHECK: omp.par.region29: ; preds = %omp.region.cont23
! [call SUM runtime function]
! [if (sum(a) == 1)]
-! CHECK: br i1 %{{.*}}, label %omp.par.region36, label %omp.par.region37
+! CHECK: br i1 %{{.*}}, label %omp.par.region30, label %omp.par.region31
-! CHECK: omp.par.region37: ; preds = %omp.par.region35
-! CHECK-NEXT: br label %omp.region.cont34
+! CHECK: omp.par.region31: ; preds = %omp.par.region29
+! CHECK-NEXT: br label %omp.region.cont28
-! CHECK: omp.region.cont34: ; preds = %omp.par.region36, %omp.par.region37
+! CHECK: omp.region.cont28: ; preds = %omp.par.region30, %omp.par.region31
! [omp parallel region done, call into the runtime to complete reduction]
! CHECK: %[[VAL_233:.*]] = call i32 @__kmpc_reduce(
! CHECK: switch i32 %[[VAL_233]], label %reduce.finalize [
@@ -170,16 +156,16 @@ subroutine worst_case(a, b, c, d)
! CHECK-NEXT: i32 2, label %reduce.switch.atomic
! CHECK-NEXT: ]
-! CHECK: reduce.switch.atomic: ; preds = %omp.region.cont34
+! CHECK: reduce.switch.atomic: ; preds = %omp.region.cont28
! CHECK-NEXT: unreachable
-! CHECK: reduce.switch.nonatomic: ; preds = %omp.region.cont34
+! CHECK: reduce.switch.nonatomic: ; preds = %omp.region.cont28
! CHECK-NEXT: %[[red_private_value_0:.*]] = load ptr, ptr %{{.*}}, align 8
! CHECK-NEXT: br label %omp.reduction.nonatomic.body
! [various blocks implementing the reduction]
-! CHECK: omp.region.cont42: ; preds =
+! CHECK: omp.region.cont36: ; preds =
! CHECK-NEXT: %{{.*}} = phi ptr
! CHECK-NEXT: call void @__kmpc_end_reduce(
! CHECK-NEXT: br label %reduce.finalize
@@ -196,59 +182,29 @@ subroutine worst_case(a, b, c, d)
! CHECK: omp.reduction.cleanup: ; preds = %.fini
! [null check]
-! CHECK: br i1 %{{.*}}, label %omp.reduction.cleanup48, label %omp.reduction.cleanup49
+! CHECK: br i1 %{{.*}}, label %omp.reduction.cleanup42, label %omp.reduction.cleanup43
-! CHECK: omp.reduction.cleanup49: ; preds = %omp.reduction.cleanup48, %omp.reduction.cleanup
-! CHECK-NEXT: br label %omp.region.cont47
+! CHECK: omp.reduction.cleanup43: ; preds = %omp.reduction.cleanup42, %omp.reduction.cleanup
+! CHECK-NEXT: br label %omp.region.cont41
-! CHECK: omp.region.cont47: ; preds = %omp.reduction.cleanup49
-! CHECK: br label %omp.reduction.cleanup51
+! CHECK: omp.region.cont41: ; preds = %omp.reduction.cleanup43
+! CHECK-NEXT: %{{.*}} = load ptr, ptr
+! CHECK-NEXT: br label %omp.reduction.cleanup45
-! CHECK: omp.reduction.cleanup51: ; preds = %omp.region.cont47
+! CHECK: omp.reduction.cleanup45: ; preds = %omp.region.cont41
! [null check]
-! CHECK: br i1 %{{.*}}, label %omp.reduction.cleanup52, label %omp.reduction.cleanup53
-
-! CHECK: omp.reduction.cleanup53: ; preds = %omp.reduction.cleanup52, %omp.reduction.cleanup51
-! CHECK-NEXT: br label %omp.region.cont50
+! CHECK: br i1 %{{.*}}, label %omp.reduction.cleanup46, label %omp.reduction.cleanup47
-! CHECK: omp.region.cont50: ; preds = %omp.reduction.cleanup53
-! CHECK-NEXT: br label %omp.private.dealloc
-
-! CHECK: omp.private.dealloc: ; preds = %omp.region.cont50
-! [null check for first private var dealloc]
-! CHECK: br i1 %{{.*}}, label %omp.private.dealloc55, label %omp.private.dealloc56
-
-! CHECK: omp.private.dealloc56: ; preds = %omp.private.dealloc55, %omp.private.dealloc
-! CHECK-NEXT: br label %omp.region.cont54
-
-! CHECK: omp.region.cont54: ; preds = %omp.private.dealloc56
-! CHECK-NEXT: br label %omp.private.dealloc58
-
-! CHECK: omp.private.dealloc58: ; preds = %omp.region.cont54
-! [null check for second private var dealloc]
-! CHECK: br i1 %{{.*}}, label %omp.private.dealloc59, label %omp.private.dealloc60
-
-! CHECK: omp.private.dealloc60: ; preds = %omp.private.dealloc59, %omp.private.dealloc58
-! CHECK-NEXT: br label %omp.region.cont57
-
-! CHECK: omp.par.region36: ; preds = %omp.par.region35
+! CHECK: omp.par.region30: ; preds = %omp.par.region29
! CHECK-NEXT: call void @_FortranAStopStatement
-! CHECK: omp.reduction.neutral31: ; preds = %omp.reduction.neutral30
-! [inlined init loop body for second var]
-! CHECK: br label %omp.reduction.neutral30
-
-! CHECK: omp.reduction.neutral28: ; preds = %omp.reduction.neutral27
-! [source length was zero: finish initializing second var]
-! CHECK: br label %omp.reduction.neutral33
-
-! CHECK: omp.reduction.neutral23: ; preds = %omp.reduction.neutral22
-! [inlined init loop body for first var]
-! CHECK: br label %omp.reduction.neutral22
+! CHECK: omp.reduction.neutral25: ; preds = %omp.reduction.neutral24
+! [source length was zero: finish initializing array]
+! CHECK: br label %omp.reduction.neutral27
! CHECK: omp.reduction.neutral20: ; preds = %omp.reduction.neutral
-! [source length was zero: finish initializing first var]
-! CHECK: br label %omp.reduction.neutral25
+! [source length was zero: finish initializing array]
+! CHECK: br label %omp.reduction.neutral22
! CHECK: omp.private.copy17: ; preds = %omp.private.copy16
! [source length was non-zero: call assign runtime]
@@ -266,5 +222,5 @@ subroutine worst_case(a, b, c, d)
! [var extent was non-zero: malloc a private array]
! CHECK: br label %omp.private.init5
-! CHECK: omp.par.exit.exitStub: ; preds = %omp.region.cont57
+! CHECK: omp.par.exit.exitStub: ; preds = %omp.region.cont51
! CHECK-NEXT: ret void
diff --git a/flang/test/Integration/OpenMP/private-global.f90 b/flang/test/Integration/OpenMP/private-global.f90
index 4b27e6ddc79a4..ed11a95c4aeb1 100644
--- a/flang/test/Integration/OpenMP/private-global.f90
+++ b/flang/test/Integration/OpenMP/private-global.f90
@@ -17,21 +17,34 @@ program bug
! CHECK-LABEL: define internal void {{.*}}..omp_par(
! CHECK: omp.par.entry:
+! CHECK: %[[VAL_9:.*]] = alloca i32, align 4
+! CHECK: %[[VAL_10:.*]] = load i32, ptr %[[VAL_11:.*]], align 4
+! CHECK: store i32 %[[VAL_10]], ptr %[[VAL_9]], align 4
+! CHECK: %[[VAL_12:.*]] = load i32, ptr %[[VAL_9]], align 4
! CHECK: %[[PRIV_BOX_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
+! CHECK: %[[ELEMENTAL_TMP:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
+! CHECK: %[[ELEMENTAL_TMP_2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
+! CHECK: %[[TABLE_BOX_ADDR:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
+! CHECK: %[[BOXED_FIFTY:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, align 8
+! CHECK: %[[FIFTY:.*]] = alloca i32, i64 1, align 4
+! CHECK: %[[INTERMEDIATE:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
+! CHECK: %[[TABLE_BOX_ADDR2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
! ...
-! check that the private copy is allocated via malloc
-! CHECK: omp.private.init:
-! CHECK: %[[PRIV_TABLE:.*]] = call ptr @malloc(i64 40)
-! ...
-! check that we use the private copy of table for the assignment (table = 50)
-! The assignment is now inlined as a loop instead of calling _FortranAAssign.
+! check that we use the private copy of table for the assignment
! CHECK: omp.par.region1:
-! CHECK: call void @llvm.memcpy.p0.p0.i32(ptr{{.*}}%[[BOX_COPY:.*]], ptr{{.*}}%[[PRIV_BOX_ALLOC]], i32 48, i1 false)
-! ...
-! check that we use the private copy of table for table/=50 (inlined loop body)
-! CHECK: omp.par.region6:
-! CHECK: %[[VAL_44:.*]] = sub {{.*}} i64 %{{.*}}, 1
+! CHECK: call void @llvm.memcpy.p0.p0.i32(ptr{{.*}}%[[INTERMEDIATE]], ptr{{.*}}%[[PRIV_BOX_ALLOC]], i32 {{4[48]}}, i1 false)
+! CHECK: store i32 50, ptr %[[FIFTY]], align 4
+! CHECK: %[[FIFTY_BOX_VAL:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8 } { ptr undef, i64 4, i32 20240719, i8 0, i8 9, i8 0, i8 0 }, ptr %[[FIFTY]], 0
+! CHECK: store { ptr, i64, i32, i8, i8, i8, i8 } %[[FIFTY_BOX_VAL]], ptr %[[BOXED_FIFTY]], align {{[48]}}
+! CHECK: call void @llvm.memcpy.p0.p0.i32(ptr %[[TABLE_BOX_ADDR2]], ptr %[[INTERMEDIATE]], i32 {{4[48]}}, i1 false)
+! CHECK: call void @_FortranAAssign(ptr %[[TABLE_BOX_ADDR2]], ptr %[[BOXED_FIFTY]], ptr @{{.*}}, i32 9)
+! CHECK: call void @llvm.memcpy.p0.p0.i32(ptr{{.*}}%[[TABLE_BOX_ADDR]], ptr{{.*}}%[[PRIV_BOX_ALLOC]], i32 {{4[48]}}, i1 false)
+! CHECK: %[[PRIV_TABLE:.*]] = call ptr @malloc(i{{(32)|(64)}} 40)
! ...
-! check that we store 50 into the private table's elements (inlined loop body)
+! check that we use the private copy of table for table/=50
! CHECK: omp.par.region3:
-! CHECK: store i32 50, ptr %{{.*}}, align 4
+! CHECK: %[[VAL_44:.*]] = sub nuw nsw i64 %{{.*}}, 1
+! CHECK: %[[VAL_45:.*]] = mul nuw nsw i64 %[[VAL_44]], 1
+! CHECK: %[[VAL_46:.*]] = mul nuw nsw i64 %[[VAL_45]], 1
+! CHECK: %[[VAL_47:.*]] = add nuw nsw i64 %[[VAL_46]], 0
+! CHECK: %[[VAL_48:.*]] = getelementptr nusw nuw i32, ptr %[[PRIV_TABLE]], i64 %[[VAL_47]]
diff --git a/flang/test/Integration/prefetch.f90 b/flang/test/Integration/prefetch.f90
index 76227caf02b43..c015b6736972a 100644
--- a/flang/test/Integration/prefetch.f90
+++ b/flang/test/Integration/prefetch.f90
@@ -13,6 +13,7 @@
!===============================================================================
subroutine test_prefetch_01()
+ ! LLVM: {{.*}} = alloca i32, i64 1, align 4
! LLVM: %[[VAR_J:.*]] = alloca i32, i64 1, align 4
! LLVM: %[[VAR_I:.*]] = alloca i32, i64 1, align 4
! LLVM: %[[VAR_A:.*]] = alloca [256 x i32], i64 1, align 4
diff --git a/flang/test/Lower/HLFIR/scalar-to-array-assign-host-O0.f90 b/flang/test/Lower/HLFIR/scalar-to-array-assign-host-O0.f90
new file mode 100644
index 0000000000000..88d4344da6f2b
--- /dev/null
+++ b/flang/test/Lower/HLFIR/scalar-to-array-assign-host-O0.f90
@@ -0,0 +1,17 @@
+! Regression test for the follow-up to PR llvm/llvm-project#197092.
+!
+! At -O0 on the host (no OpenMP target-device compilation), a scalar-to-array
+! broadcast assignment must lower to a Fortran runtime call
+! (_FortranAAssign), not to an inline assignment loop. Lowering it inline
+! at -O0 caused -g line breakpoints to hit once per array element instead
+! of once.
+
+! RUN: %flang_fc1 -emit-fir -O0 %s -o - | FileCheck %s
+
+! CHECK-LABEL: func @_QPhost_scalar_broadcast
+subroutine host_scalar_broadcast(arr)
+ integer :: arr(4)
+ ! CHECK: fir.call @_FortranAAssign
+ ! CHECK-NOT: fir.do_loop
+ arr = 11
+end subroutine
diff --git a/flang/test/Lower/OpenMP/scalar-to-array-assign-target-device-O0.f90 b/flang/test/Lower/OpenMP/scalar-to-array-assign-target-device-O0.f90
new file mode 100644
index 0000000000000..db019a6a15ab1
--- /dev/null
+++ b/flang/test/Lower/OpenMP/scalar-to-array-assign-target-device-O0.f90
@@ -0,0 +1,18 @@
+! Regression test for PR llvm/llvm-project#197092 and its follow-up.
+!
+! When compiling for an OpenMP target device at -O0, a scalar-to-array
+! broadcast assignment inside a target region must still be inlined to
+! avoid emitting a _FortranAAssign runtime call (which internally uses
+! malloc/free) into GPU device code.
+
+! RUN: %flang_fc1 -emit-fir -O0 -fopenmp -fopenmp-is-target-device %s -o - \
+! RUN: | FileCheck %s --implicit-check-not="fir.call @_FortranAAssign"
+
+subroutine device_scalar_broadcast()
+ integer :: arr(4)
+ !$omp target map(tofrom: arr)
+ ! CHECK: omp.target
+ ! CHECK: fir.do_loop
+ arr = 11
+ !$omp end target
+end subroutine
diff --git a/flang/test/Lower/OpenMP/workdistribute-saxpy-and-scalar-assign.f90 b/flang/test/Lower/OpenMP/workdistribute-saxpy-and-scalar-assign.f90
index cbb4dfc3cdadc..3824847a7bcda 100644
--- a/flang/test/Lower/OpenMP/workdistribute-saxpy-and-scalar-assign.f90
+++ b/flang/test/Lower/OpenMP/workdistribute-saxpy-and-scalar-assign.f90
@@ -1,4 +1,4 @@
-! RUN: %flang_fc1 -emit-fir -fopenmp -fopenmp-version=60 %s -o - | FileCheck %s
+! RUN: %flang_fc1 -emit-fir -O1 -fopenmp -fopenmp-version=60 %s -o - | FileCheck %s
! CHECK-LABEL: func @_QPtarget_teams_workdistribute
subroutine target_teams_workdistribute()
diff --git a/flang/test/Lower/OpenMP/workdistribute-scalar-assign.f90 b/flang/test/Lower/OpenMP/workdistribute-scalar-assign.f90
index 217df8fb05176..3d7ef7abf6816 100644
--- a/flang/test/Lower/OpenMP/workdistribute-scalar-assign.f90
+++ b/flang/test/Lower/OpenMP/workdistribute-scalar-assign.f90
@@ -1,4 +1,4 @@
-! RUN: %flang_fc1 -emit-fir -fopenmp -fopenmp-version=60 %s -o - | FileCheck %s --implicit-check-not="fir.call @_FortranAAssign"
+! RUN: %flang_fc1 -emit-fir -O1 -fopenmp -fopenmp-version=60 %s -o - | FileCheck %s --implicit-check-not="fir.call @_FortranAAssign"
! CHECK-LABEL: func @_QPtarget_teams_workdistribute_scalar_assign
subroutine target_teams_workdistribute_scalar_assign()
diff --git a/flang/tools/bbc/bbc.cpp b/flang/tools/bbc/bbc.cpp
index 30b4a99c8f2d5..23e7af238198f 100644
--- a/flang/tools/bbc/bbc.cpp
+++ b/flang/tools/bbc/bbc.cpp
@@ -576,6 +576,8 @@ static llvm::LogicalResult convertFortranSourceToMLIR(
config.SkipConvertComplexPow = targetMachine.getTargetTriple().isAMDGCN();
if (enableOpenMP)
config.EnableOpenMP = true;
+ if (enableOpenMPDevice)
+ config.EnableOpenMPIsTargetDevice = true;
config.NSWOnLoopVarInc = !integerWrapAround;
fir::registerDefaultInlinerPass(config);
fir::createDefaultFIROptimizerPassPipeline(pm, config);