[flang-commits] [flang] [flang][MLIR] Hoist `do concurrent` nest bounds/steps outside the nest (PR #114020)

Tue Oct 29 07:40:58 PDT 2024

================
@@ -2131,18 +2131,33 @@ class FirConverter : public Fortran::lower::AbstractConverter {
       llvm::SmallVectorImpl<const Fortran::parser::CompilerDirective *> &dirs) {
     assert(!incrementLoopNestInfo.empty() && "empty loop nest");
     mlir::Location loc = toLocation();
+
     for (IncrementLoopInfo &info : incrementLoopNestInfo) {
-      info.loopVariable =
-          genLoopVariableAddress(loc, *info.loopVariableSym, info.isUnordered);
-      mlir::Value lowerValue = genControlValue(info.lowerExpr, info);
-      mlir::Value upperValue = genControlValue(info.upperExpr, info);
-      bool isConst = true;
-      mlir::Value stepValue = genControlValue(
-          info.stepExpr, info, info.isStructured() ? nullptr : &isConst);
-      // Use a temp variable for unstructured loops with non-const step.
-      if (!isConst) {
-        info.stepVariable = builder->createTemporary(loc, stepValue.getType());
-        builder->create<fir::StoreOp>(loc, stepValue, info.stepVariable);
+      mlir::Value lowerValue;
+      mlir::Value upperValue;
+      mlir::Value stepValue;
+
+      {
+        mlir::OpBuilder::InsertionGuard guard(*builder);
+
+        // Set the IP before the first loop in the nest so that all nest bounds
+        // and step values are created outside the nest.
+        if (incrementLoopNestInfo[0].doLoop)
+          builder->setInsertionPoint(incrementLoopNestInfo[0].doLoop);
----------------
vdonaldson wrote:

Thanks @harishch4 and @ergawy for working on this improvement.

As is, this implementation applies to all structured increment loops, not just `do concurrent` loops. That should be OK, because non-`do concurrent` loops don't have multiple levels. Could you extend it to cover all increment loops, including unstructured loops (loops that contain branches)? Test `loops.f90` has several examples of unstructured `do concurrent` loops. You would need to cache the analog of the `doLoop` op for unstructured loops.
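
To illustrate the hoisting pattern being discussed, here is a minimal standalone sketch. It is not the flang or MLIR API: `ToyBuilder`, `ToyInsertionGuard`, and `lowerNest` are hypothetical names, and a flat list of strings stands in for the emitted ops (anything after a `loop_*` entry is treated as being inside that loop). The first loop op is cached, and the control values of every later loop are inserted before it, with an RAII guard restoring the insertion point afterwards.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Toy stand-in for an op builder; NOT mlir::OpBuilder.
struct ToyBuilder {
  using Iter = std::list<std::string>::iterator;
  std::list<std::string> ops;
  Iter ip = ops.end(); // new ops are inserted before this position

  // Insert an op before the insertion point and return its position.
  Iter create(const std::string &op) { return ops.insert(ip, op); }
};

// Restores the builder's insertion point on scope exit, playing the role
// that mlir::OpBuilder::InsertionGuard plays in the patch.
class ToyInsertionGuard {
  ToyBuilder &builder;
  ToyBuilder::Iter saved;

public:
  explicit ToyInsertionGuard(ToyBuilder &b) : builder(b), saved(b.ip) {}
  ~ToyInsertionGuard() { builder.ip = saved; }
};

// Emit one loop per variable, hoisting all bounds and steps so that they
// precede the first loop of the nest.
std::vector<std::string> lowerNest(const std::vector<std::string> &loopVars) {
  assert(!loopVars.empty() && "empty loop nest");
  ToyBuilder builder;
  ToyBuilder::Iter firstLoop = builder.ops.end();
  bool haveFirstLoop = false;

  for (const std::string &var : loopVars) {
    {
      ToyInsertionGuard guard(builder);
      // Once the first loop exists, emit control values before it.
      if (haveFirstLoop)
        builder.ip = firstLoop;
      builder.create("lb_" + var);
      builder.create("ub_" + var);
      builder.create("step_" + var);
    } // insertion point restored here

    ToyBuilder::Iter loop = builder.create("loop_" + var);
    // Cache the first loop; an unstructured loop would need an analogous
    // cached anchor (assumption: for example, its first emitted op).
    if (!haveFirstLoop) {
      firstLoop = loop;
      haveFirstLoop = true;
    }
  }
  return {builder.ops.begin(), builder.ops.end()};
}
```

For a two-level nest over `i` and `j`, all six control values end up ahead of `loop_i`, which is the property the structured case gets from caching `doLoop`.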

https://github.com/llvm/llvm-project/pull/114020