[flang-commits] [flang] [Flang] Add opt-in affine loop optimization pipeline (PR #191854)
via flang-commits
flang-commits at lists.llvm.org
Wed Apr 15 04:56:09 PDT 2026
================
@@ -0,0 +1,638 @@
+//===-- SimplifyDoLoop.cpp ------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// General-purpose FIR loop canonicalization pass.
+//
+// Transforms fir.do_loop nests into a canonical form suitable for affine
+// promotion and loop optimizations (tiling, fusion, interchange, etc.).
+//
+// The canonical form has:
+// - No iter_args (shadow induction variable copies removed)
+// - No memory-based IV tracking inside the loop body
+// - Final IV values computed and stored after the outermost loop
+//
+// === Design Overview ===
+//
+// Analysis phase (per loop nest):
+// 1. Collect perfectly nested fir.do_loop chain.
+// 2. For each loop, verify iter_arg is a shadow of the induction variable:
+// - init = fir.convert(lower_bound)
+// - yield = arith.addi(iter_arg_or_load_of_iv, fir.convert(step))
+// 3. Verify safety conditions:
+// a. Only one store to IV alloca inside loop (the init store of iter_arg)
+// b. No function/subroutine calls in the nest
+// c. IV alloca does not escape (only load/store/declare users)
+// d. Loop results are only used for final IV stores
+//
+// Transformation phase:
+// 1. For each loop (innermost first):
+// a. Remove the initial store (fir.store %iter_arg to %iv_alloca)
+// b. Forward all loads of IV alloca inside loop body to fir.convert(IV)
+// TODO: Forwarding loads of the IV alloca could instead be done by
+// another pass such as fir-memref-dataflow-opt, if it is available.
+// c. Strip iter_args and fir.result, rebuild as simple fir.do_loop
+// 2. After the outermost loop, compute and store final IV values
+// for all loops whose IV is live after the loop (outer to inner order).
+// Fortran final value: final_iv = lb + ((ub - lb + step) / step) * step
+// which equals the value of the iter_arg after the last increment.
+//
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Dialect/FIRDialect.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Transforms/Passes.h"
+#include "mlir/Dialect/Arith/IR/Arith.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/IR/Builders.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Support/Debug.h"
+
+namespace fir {
+#define GEN_PASS_DEF_SIMPLIFYDOLOOP
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
+
+#define DEBUG_TYPE "simplify-do-loop"
+
+using namespace fir;
+using namespace mlir;
+
+namespace {
+
+//===----------------------------------------------------------------------===//
+// Per-loop bookkeeping built during analysis
+//===----------------------------------------------------------------------===//
+
+struct LoopIVInfo {
+ fir::DoLoopOp loop;
+ Value ivAlloca; // fir.alloca for this loop's IV
+ SmallVector<Value, 2> ivAliases; // ivAlloca + any fir.declare alias
+ Value lowerBound; // index-typed lower bound
+ Value upperBound; // index-typed upper bound
+ Value step; // index-typed step
+ Type ivType; // Fortran IV type (e.g. i32)
+};
+
+//===----------------------------------------------------------------------===//
+// Helpers
+//===----------------------------------------------------------------------===//
+
+/// Collect the IV memory reference and all its aliases (the raw fir.alloca
+/// and any fir.declare results that alias it). `ivRef` may be either the
+/// alloca itself or a declare result — we normalise to the underlying alloca
+/// first, then collect all declare aliases from it.
+static SmallVector<Value, 2> collectAliases(Value ivRef) {
+ SmallVector<Value, 2> aliases;
+
+ // If ivRef is a declare result, trace back to the underlying alloca.
+ Value underlying = ivRef;
+ if (auto decl = ivRef.getDefiningOp<fir::DeclareOp>())
+ underlying = decl.getMemref();
+
+ aliases.push_back(underlying);
+ for (auto *user : underlying.getUsers())
+ if (auto decl = dyn_cast<fir::DeclareOp>(user))
+ aliases.push_back(decl.getResult());
+
+ return aliases;
+}
+
+/// Collect a perfectly nested chain of fir.do_loop ops starting from `outer`.
+/// A loop is considered perfectly nested if between each nesting level only
+/// IV-related operations (stores, converts) and the inner loop exist.
+static SmallVector<fir::DoLoopOp> collectNest(fir::DoLoopOp outer) {
+ SmallVector<fir::DoLoopOp> nest;
+ fir::DoLoopOp cur = outer;
+ while (cur) {
+ nest.push_back(cur);
+ fir::DoLoopOp inner;
+ unsigned loopCount = 0;
+ for (auto &op : cur.getBody()->getOperations())
+ if (auto nested = dyn_cast<fir::DoLoopOp>(op)) {
+ inner = nested;
+ ++loopCount;
+ }
+ if (loopCount != 1)
+ break;
+ cur = inner;
+ }
+ return nest;
+}
+
+/// Strip fir.convert chains to find the root SSA value.
+static Value stripConverts(Value val) {
+ while (auto conv = val.getDefiningOp<fir::ConvertOp>())
+ val = conv.getValue();
+ return val;
+}
+
+/// Check whether `val` originates from `target` (possibly through fir.convert).
+static bool originatesFrom(Value val, Value target) {
+ return stripConverts(val) == target;
+}
+
+/// Find IV alloca: the first fir.store in the loop body whose value
+/// originates from the iter_arg or the induction variable (possibly through
+/// fir.convert chains).
+/// We scan the entire top-level body rather than
+/// stopping at an inner fir.do_loop so that the pass remains robust if
+/// upstream passes reorder operations.
+static Value findIVAlloca(fir::DoLoopOp loop) {
+ if (!loop.hasIterOperands() || loop.getNumIterOperands() < 1)
+ return {};
+ auto iterArg = loop.getRegionIterArgs()[0];
+ auto iv = loop.getInductionVar();
+ for (auto &op : loop.getBody()->getOperations()) {
+ if (auto store = dyn_cast<fir::StoreOp>(op)) {
+ Value stored = store.getValue();
+ if (originatesFrom(stored, iterArg) || originatesFrom(stored, iv))
+ return store.getMemref();
+ }
+ }
+ return {};
+}
+
+//===----------------------------------------------------------------------===//
+// ANALYSIS PHASE
+//===----------------------------------------------------------------------===//
+
+// ---- Analysis 1: Confirm iter_arg is a shadow of the induction variable ----
+//
+// The iter_arg must mirror the index-typed induction variable:
+// init = fir.convert(lower_bound) : (index) -> i32
+// yield = arith.addi(iter_arg_or_load_of_iv, fir.convert(step))
+
+static bool isShadowIV(fir::DoLoopOp loop, Value ivAlloca) {
+ auto iterOperands = loop.getIterOperands();
+ auto iterArg = loop.getRegionIterArgs()[0];
+
+ auto initConvert = iterOperands[0].getDefiningOp<fir::ConvertOp>();
+ if (!initConvert || initConvert.getValue() != loop.getLowerBound()) {
+ LLVM_DEBUG(llvm::dbgs() << " [shadow] init is not fir.convert(lb)\n");
+ return false;
+ }
+
+ auto resultOp = cast<fir::ResultOp>(loop.getBody()->getTerminator());
+ auto addOp = resultOp.getOperand(0).getDefiningOp<arith::AddIOp>();
+ if (!addOp) {
+ LLVM_DEBUG(llvm::dbgs() << " [shadow] yield is not arith.addi\n");
+ return false;
+ }
+
+ auto isIVValue = [&](Value v) -> bool {
+ if (v == iterArg)
+ return true;
+ if (auto load = v.getDefiningOp<fir::LoadOp>()) {
+ if (load.getMemref() == ivAlloca)
+ return true;
+ if (auto decl = load.getMemref().getDefiningOp<fir::DeclareOp>())
+ if (decl.getMemref() == ivAlloca)
+ return true;
+ }
+ return false;
+ };
+
+ Value stepSide;
+ if (isIVValue(addOp.getLhs()))
+ stepSide = addOp.getRhs();
+ else if (isIVValue(addOp.getRhs()))
+ stepSide = addOp.getLhs();
+ else {
+ LLVM_DEBUG(llvm::dbgs() << " [shadow] addi doesn't use iter_arg/IV\n");
+ return false;
+ }
+
+ auto stepConvert = stepSide.getDefiningOp<fir::ConvertOp>();
+ if (!stepConvert || stepConvert.getValue() != loop.getStep()) {
+ LLVM_DEBUG(llvm::dbgs() << " [shadow] step operand mismatch\n");
+ return false;
+ }
+ return true;
+}
+
+// ---- Analysis 2: Only one store to IV alloca inside loop (the init store) --
+
+static bool singleStoreToIVAlloca(fir::DoLoopOp loop,
+ ArrayRef<Value> ivAliases) {
+ auto iterArg = loop.getRegionIterArgs()[0];
+ auto iv = loop.getInductionVar();
+ bool foundInit = false;
+ bool ok = true;
+
+ loop.walk([&](fir::StoreOp store) {
+ if (!llvm::is_contained(ivAliases, store.getMemref()))
+ return;
+ if (!foundInit && (originatesFrom(store.getValue(), iterArg) ||
+ originatesFrom(store.getValue(), iv))) {
+ foundInit = true;
+ return;
+ }
+ LLVM_DEBUG(llvm::dbgs()
+ << " [store] extra store to IV: " << store << "\n");
+ ok = false;
+ });
+ return ok;
+}
+
+// ---- Analysis 3: No function/subroutine calls in the nest -----------------
+
+static bool noCallsInNest(fir::DoLoopOp outermost) {
+ bool ok = true;
+ outermost.walk([&](Operation *op) {
+ if (isa<fir::CallOp>(op) || isa<func::CallOp>(op) ||
+ isa<fir::DispatchOp>(op)) {
+ LLVM_DEBUG(llvm::dbgs() << " [call] found: " << *op << "\n");
+ ok = false;
+ }
+ });
+ return ok;
+}
+
+// ---- Analysis 4: IV alloca must not escape --------------------------------
+
+static bool ivDoesNotEscape(ArrayRef<Value> ivAliases) {
+ for (auto alias : ivAliases)
+ for (auto *user : alias.getUsers())
+ if (!isa<fir::StoreOp, fir::LoadOp, fir::DeclareOp>(user)) {
+ LLVM_DEBUG(llvm::dbgs() << " [escape] IV escapes: " << *user << "\n");
+ return false;
+ }
+ return true;
+}
+
+// ---- Full nest analysis ---------------------------------------------------
+
+static bool analyzeNest(SmallVector<LoopIVInfo> &infos) {
+ // --- Per-loop: shadow-IV check, IV alloca discovery, single-store check ---
+ for (auto &info : infos) {
+ auto loop = info.loop;
+ if (!loop.hasIterOperands() || loop.getNumIterOperands() != 1) {
+ LLVM_DEBUG(llvm::dbgs() << " skip: loop has != 1 iter_args at "
+ << loop.getLoc() << "\n");
+ return false;
+ }
+
+ info.ivAlloca = findIVAlloca(loop);
+ if (!info.ivAlloca) {
+ LLVM_DEBUG(llvm::dbgs()
+ << " cannot find IV alloca at " << loop.getLoc() << "\n");
+ return false;
+ }
+
+ info.ivAliases = collectAliases(info.ivAlloca);
+
+ if (!isShadowIV(loop, info.ivAlloca)) {
+ LLVM_DEBUG(llvm::dbgs()
+ << " not shadow IV at " << loop.getLoc() << "\n");
+ return false;
+ }
+
+ if (!singleStoreToIVAlloca(loop, info.ivAliases)) {
+ LLVM_DEBUG(llvm::dbgs()
+ << " multiple stores at " << loop.getLoc() << "\n");
+ return false;
+ }
+
+ // Record loop bounds and IV type from the iter_arg init value.
+ info.lowerBound = loop.getLowerBound();
+ info.upperBound = loop.getUpperBound();
+ info.step = loop.getStep();
+ info.ivType = loop.getIterOperands()[0].getType();
+ }
+
+ // --- No function calls in the nest ---
+ if (!noCallsInNest(infos.front().loop))
+ return false;
+
+ // --- IV alloca must not escape ---
+ for (auto &info : infos) {
+ if (!ivDoesNotEscape(info.ivAliases))
+ return false;
+ }
+
+ // --- Loop results must only be used for final IV stores ---
+ for (auto &info : infos) {
+ for (auto result : info.loop.getResults()) {
+ for (auto *user : result.getUsers()) {
+ auto store = dyn_cast<fir::StoreOp>(user);
+ if (!store || !llvm::is_contained(info.ivAliases, store.getMemref())) {
+ LLVM_DEBUG(llvm::dbgs()
+ << " [result] loop result used outside IV store at "
+ << info.loop.getLoc() << ": " << *user << "\n");
+ return false;
+ }
+ }
+ }
+ }
+
+ return true;
+}
+
+//===----------------------------------------------------------------------===//
+// TRANSFORMATION PHASE
+//===----------------------------------------------------------------------===//
+
+/// Ensure a value is available (dominates) at the current insertion point.
+/// If the value is already defined outside `outermost`, return it directly.
+/// Otherwise, rematerialize the computation by cloning through simple ops
+/// (fir.convert, fir.load, arith constants).
+///
+/// `ivFinalMap` maps loop induction variables (block arguments) to their
+/// already-computed final index values. This allows inner loop bounds that
+/// depend on outer IVs (e.g. triangular loops) to be correctly resolved.
+static Value rematerializeOutside(Value val, fir::DoLoopOp outermost,
+ OpBuilder &builder, Location loc,
+ const DenseMap<Value, Value> &ivFinalMap) {
+ // Already defined outside the outermost loop — use directly.
+ if (auto blockArg = dyn_cast<BlockArgument>(val)) {
+ if (!outermost->isAncestor(blockArg.getOwner()->getParentOp()))
+ return val;
+ auto it = ivFinalMap.find(val);
+ if (it != ivFinalMap.end())
+ return it->second;
+ return val;
+ }
+ if (auto *defOp = val.getDefiningOp()) {
+ if (!outermost->isAncestor(defOp))
+ return val;
+ }
+
+ auto *defOp = val.getDefiningOp();
+ if (!defOp)
+ return val;
+
+ // fir.convert: rematerialize the input, then re-emit the convert.
+ if (auto conv = dyn_cast<fir::ConvertOp>(*defOp)) {
+ auto newInput = rematerializeOutside(conv.getValue(), outermost, builder,
+ loc, ivFinalMap);
+ return fir::ConvertOp::create(builder, loc, conv.getType(), newInput);
+ }
+
+ // fir.load: the address must already be outside (alloca/declare/etc).
+ if (auto load = dyn_cast<fir::LoadOp>(*defOp)) {
----------------
shuyadav-dev wrote:
Good catch, thank you. After further analysis, I found that this can cause a problem: if a `fir.load` inside the loop has an address that cannot be lifted out, the loop body may have modified the underlying memory, so rematerializing the load after the loop could produce a wrong or invalid result. I verified this with a test case and confirmed the issue.
I've addressed this by adding a `canSafelyRematerialize` analysis function that runs during the analysis phase (before any transformation). It recursively checks whether each loop bound can be safely duplicated after the outermost loop:
* Safe: values defined outside the loop, loop IVs (block args of fir.do_loop), fir.convert, arith.constant, and pure arithmetic ops over safe operands.
* Unsafe: `fir.load` of a non-IV address inside the loop — the memory may have been modified between the original load and the post-loop insertion point.
* IV loads (e.g., triangular bounds `do j = 1, i`) are explicitly recognized as safe because `transformOneLoop` forwards them to `fir.convert(IV)` before `rematerializeOutside` runs.
If any bound fails the check, the entire nest is rejected by `analyzeNest`. This eliminates the need for the `fir.load` handler in `rematerializeOutside` entirely — it has been removed.
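The classification above can be sketched as a standalone toy model. This is not the code in the PR and does not use the MLIR API: `Kind`, `Node`, `make`, and `nestIsSafe` are hypothetical names introduced for illustration, and the signature given to `canSafelyRematerialize` here is an assumption.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the kinds of values a loop bound can be built
// from. These are illustrative categories, not MLIR operation classes.
enum class Kind {
  DefinedOutside, // defined outside the outermost loop
  LoopIV,         // block argument of a fir.do_loop in the nest
  Constant,       // arith.constant
  Convert,        // fir.convert
  PureArith,      // side-effect-free arithmetic
  LoadIV,         // fir.load of a loop IV address (forwarded later)
  LoadOther       // fir.load of any other address inside the loop
};

struct Node {
  Kind kind;
  std::vector<std::shared_ptr<Node>> operands;
};

using NodePtr = std::shared_ptr<Node>;

// Convenience constructor for building small expression trees.
NodePtr make(Kind kind, std::vector<NodePtr> operands = {}) {
  return std::make_shared<Node>(Node{kind, std::move(operands)});
}

// Recursively decide whether a bound expression could be duplicated after
// the outermost loop without observing memory the loop body may have changed.
bool canSafelyRematerialize(const NodePtr &value) {
  assert(value && "expected a non-null value");
  switch (value->kind) {
  case Kind::DefinedOutside:
  case Kind::LoopIV:
  case Kind::Constant:
  case Kind::LoadIV:
    return true;
  case Kind::LoadOther:
    return false;
  case Kind::Convert:
  case Kind::PureArith:
    // Safe only if every operand is itself safe.
    for (const NodePtr &operand : value->operands)
      if (!canSafelyRematerialize(operand))
        return false;
    return true;
  }
  return false;
}

// A nest is accepted only if every recorded bound passes the check.
bool nestIsSafe(const std::vector<NodePtr> &bounds) {
  for (const NodePtr &bound : bounds)
    if (!canSafelyRematerialize(bound))
      return false;
  return true;
}
```

In this model, a triangular bound such as `make(Kind::Convert, {make(Kind::LoadIV)})` is accepted, while any expression containing a `Kind::LoadOther` operand causes `nestIsSafe` to reject the whole nest.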
I have updated the code in the recent commit and added an explanation of the change. Could you please check whether this addresses your concern?
https://github.com/llvm/llvm-project/pull/191854