[flang-commits] [flang] [flang][StackArrays] skip analysis of very large functions (PR #71047)

Thu Nov 2 04:00:39 PDT 2023

https://github.com/tblah created https://github.com/llvm/llvm-project/pull/71047

The stack arrays pass uses data flow analysis to determine whether heap allocations are freed on all paths out of the function.

`interp_domain_em_part2` in spec2017 wrf generates over 120k operations, including almost 5k fir.if operations and over 200 fir.do_loop operations, all in the same function. The MLIR data flow analysis framework cannot provide reasonable performance for such cases because there is a combinatorial explosion in the number of control flow paths through the function, all of which must be checked to determine if the heap allocations will be freed.
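As a rough illustration of how quickly the path count can grow, consider the simplest possible case: N independent two-way branches placed one after another, where each branch doubles the number of paths. The sketch below is a toy model in plain C++, not MLIR code, and the function name `pathsThroughSequentialIfs` is hypothetical. It does not describe how the data flow framework works internally.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: number of distinct control flow paths through `nIfs`
// independent two-way branches executed in sequence. Each branch doubles
// the count, so the result is 2^nIfs.
// Hypothetical helper for illustration only; not part of flang or MLIR.
std::uint64_t pathsThroughSequentialIfs(unsigned nIfs) {
  assert(nIfs < 64 && "result would overflow 64 bits");
  std::uint64_t paths = 1;
  for (unsigned i = 0; i < nIfs; ++i)
    paths *= 2; // taking or not taking this branch
  return paths;
}
```

Under this model 20 branches already give over a million paths, and the count overflows 64 bits at 64 branches, far short of the almost 5k fir.if operations reported above.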

This patch skips the stack arrays pass for ridiculously large functions (defined as having more than 1000 fir.allocmem operations). This threshold can be changed with the hidden `stack-arrays-max-allocs` command line option; setting it to 0 disables the limit.
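The skip condition can be sketched in isolation as follows. This is a standalone restatement of the comparison made in the patch, with no LLVM or MLIR dependencies; the function name `shouldSkipForSize` and its parameters are hypothetical and do not appear in the patch.

```cpp
#include <cstddef>

// Hypothetical stand-in for the size check added to analyseFunction.
// Returns true when the analysis should be skipped because the function
// has more heap allocations than the limit allows.
// A limit of 0 means "no limit", mirroring `maxAllocsPerFunc != 0` in the patch.
bool shouldSkipForSize(std::size_t nAllocs, std::size_t maxAllocs) {
  return maxAllocs != 0 && nAllocs > maxAllocs;
}
```

Note that the comparison is strict: with the default limit of 1000, a function with exactly 1000 allocations is still analysed.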

With this patch, compiling the file containing `interp_domain_em_part2` is more than 80% faster.

From cad53efdc49dcb753b7986809dc3bff044bc8437 Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Wed, 1 Nov 2023 20:26:33 +0000
Subject: [PATCH] [flang][StackArrays] skip analysis of very large functions

The stack arrays pass uses data flow analysis to determine whether heap
allocations are freed on all paths out of the function.

interp_domain_em_part2 in spec2017 wrf generates over 120k operations,
including almost 5k fir.if operations and over 200 fir.do_loop
operations, all in the same function. The MLIR data flow analysis
framework cannot provide reasonable performance for such cases because
there is a combinatorial explosion in the number of control flow paths
through the function, all of which must be checked to determine if the
heap allocations will be freed.

This patch skips the stack arrays pass for ridiculously large functions
(defined as having more than 1000 fir.allocmem operations). This
threshold is configurable at runtime with a command line argument.

With this patch, compiling this file is more than 80% faster.
---
 flang/lib/Optimizer/Transforms/StackArrays.cpp | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/flang/lib/Optimizer/Transforms/StackArrays.cpp b/flang/lib/Optimizer/Transforms/StackArrays.cpp
index 9b90aed5a17ae73..7b066ec7a2bfda6 100644
--- a/flang/lib/Optimizer/Transforms/StackArrays.cpp
+++ b/flang/lib/Optimizer/Transforms/StackArrays.cpp
@@ -42,6 +42,12 @@ namespace fir {
 
 #define DEBUG_TYPE "stack-arrays"
 
+static llvm::cl::opt<std::size_t> maxAllocsPerFunc(
+    "stack-arrays-max-allocs",
+    llvm::cl::desc("The maximum number of heap allocations to consider in one "
+                   "function before skipping (to save compilation time)"),
+    llvm::cl::init(1000), llvm::cl::Hidden);
+
 namespace {
 
 /// The state of an SSA value at each program point
@@ -411,6 +417,17 @@ void AllocationAnalysis::processOperation(mlir::Operation *op) {
 mlir::LogicalResult
 StackArraysAnalysisWrapper::analyseFunction(mlir::Operation *func) {
   assert(mlir::isa<mlir::func::FuncOp>(func));
+  size_t nAllocs = 0;
+  func->walk([&nAllocs](fir::AllocMemOp) { nAllocs++; });
+  // don't bother with the analysis if there are no heap allocations
+  if (nAllocs == 0)
+    return mlir::success();
+  if ((maxAllocsPerFunc != 0) && (nAllocs > maxAllocsPerFunc)) {
+    LLVM_DEBUG(llvm::dbgs() << "Skipping stack arrays for function with "
+                            << nAllocs << " heap allocations\n");
+    return mlir::success();
+  }
+
   mlir::DataFlowSolver solver;
   // constant propagation is required for dead code analysis, dead code analysis
   // is required to mark blocks live (required for mlir dense dfa)