[clang] [llvm] [mlir] [Flang][OpenMP] Enable no-loop kernels (PR #155818)

Tue Sep 2 03:23:44 PDT 2025

================
@@ -2590,13 +2590,27 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder,
   }
 
   builder.SetInsertPoint(*regionBlock, (*regionBlock)->begin());
+
+  bool noLoopMode = false;
+  omp::TargetOp targetOp = wsloopOp->getParentOfType<mlir::omp::TargetOp>();
+  if (targetOp) {
+    Operation *targetCapturedOp = targetOp.getInnermostCapturedOmpOp();
----------------
DominikAdamski wrote:

Hi Sergio,
thanks for your feedback:

> Should we check here that the captured op is the omp.loop_nest wrapped by this omp.wsloop?

Yes. There is an assumption in `TargetRegionFlags TargetOp::getKernelExecFlags`, which checks that its argument is the result of calling `getInnermostCapturedOmpOp()`.

> I'm not actually sure about what is the expected behavior of this, but I imagine that no-loop would just refer to the outer loop, as it's the one for which the trip count can be evaluated in the host.

There are 2 issues:

1. Yes, your guess is right. No-loop should refer only to the outer loop. I modified the OMPIRBuilder code so that it generates no-loop code only for the distribute-for static loop: https://github.com/llvm/llvm-project/pull/155818/commits/564410d9930b9b838d4dadfbea566c222c265e87#diff-a6c8db9d350ec59f4eb93f27f29468b01c9590426a11c9cb79e499bc96b28adc
2. Currently, Flang does not generate valid code for the following OpenMP kernel:
```
!$omp target teams distribute parallel do
do i = 0, 15
  !$omp parallel do
  do j = 1, 64
    array(i * 64 + j) = i * 64 + j
  end do
  !$omp end parallel do
end do
```
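
For context on point 1, here is a minimal host-side C++ sketch of why no-loop mode depends on the outer trip count being known on the host. It is only an illustration and not the code OMPIRBuilder emits: the names `runGridStride`, `runNoLoop`, and `threadsForNoLoop` are hypothetical, GPU threads are simulated serially, and the sketch assumes that a no-loop kernel maps exactly one outer iteration to each thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a loop body: counts how often each
// outer iteration is executed.
using Visits = std::vector<int>;

// Generic mode: every simulated thread walks the iteration space with a
// stride of numThreads, so the kernel is correct for any grid size.
Visits runGridStride(std::size_t tripCount, std::size_t numThreads) {
  Visits visits(tripCount, 0);
  for (std::size_t tid = 0; tid < numThreads; ++tid)  // threads, simulated serially
    for (std::size_t i = tid; i < tripCount; i += numThreads)
      ++visits[i];
  return visits;
}

// No-loop mode (assumed mapping): each thread runs at most one iteration
// behind a bounds check. This is only correct if the host launched at
// least tripCount threads.
Visits runNoLoop(std::size_t tripCount, std::size_t numThreads) {
  assert(numThreads >= tripCount && "grid must cover the whole trip count");
  Visits visits(tripCount, 0);
  for (std::size_t tid = 0; tid < numThreads; ++tid)  // threads, simulated serially
    if (tid < tripCount)
      ++visits[tid];
  return visits;
}

// Host-side grid sizing: round the trip count up to a whole number of
// teams. Assumes teamSize > 0.
std::size_t threadsForNoLoop(std::size_t tripCount, std::size_t teamSize) {
  return (tripCount + teamSize - 1) / teamSize * teamSize;
}
```

With the kernel above, the outer loop has 16 iterations (`i = 0, 15`), so under these assumptions the host could size the grid with `threadsForNoLoop(16, teamSize)`. The trip count of the inner `j` loop plays no role in that calculation.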

https://github.com/llvm/llvm-project/pull/155818