[PATCH] D98063: [flang][fir] Add the pre-code gen rewrite pass and codegen ops.

Wed Mar 17 13:39:44 PDT 2021

mehdi_amini added inline comments.

================
Comment at: flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp:234
+      runOn(global, global.getRegion());
+  }
+
----------------
schweitz wrote:
> mehdi_amini wrote:
> > mehdi_amini wrote:
> > > schweitz wrote:
> > > > mehdi_amini wrote:
> > > > > By making this a module pass, we're losing on parallelism.
> > > > > Can you make is an operation pass and filter here instead?
> > > > > 
> > > > > ```
> > > > > void runOnOperation()  {
> > > > >   Operation *op = getOperation();
> > > > >   if (auto func = op->dyn_cast<mlir::FuncOp>()) runOn(func, func.getBody());
> > > > >   if (auto global : op->dyn_cast<fir::GlobalOp>()) runOn(global, global.getRegion());
> > > > > }
> > > > > ```
> > > > It might be possible, but a quick experiment showed a bunch of tests regressing.
> > > What kind of regressions?
> > Actually in case I wasn't clear, the snippet I wrote above requires to schedule the pass twice in a pipeline, the test would look like this: `fir-opt --pass-pipeline="func(cg-rewrite),fir.global(cg-rewrite)"`
> > 
> > If you can provide an .mlir test where this regresses, I'd be happy to take a look.
> I spent some time working to make this pass multithreaded, including converting data structures to be thread local, etc. We have approximately 800 basic tests and many thousands of Fortran tests. The attempted changes caused many of these tests to regress and start failing. The existing pass accomplishes its task as is. There is no data-based argument that the pass is a bottleneck. The code is being upstreamed in a series of many patches, and there will be ample opportunity to write and debug a parallel algorithm in subsequent patches. Finally, anyone that wants to can contribute improvements on fir-dev.
> he code is being upstreamed in a series of many patches, and there will be ample opportunity to write and debug a parallel algorithm in subsequent patches

The problem is that I could your pass right after you land it, but that will create issues for you downstream, and right now it seems you can't show upstream why it is an issue with a test for this pass. That seems problematic to me for upstream development right now.

The "FuncOp Pass" vs  "Module Pass" is more than just "waiting for a bottleneck: it is about system design as whole. The fact that you bring thread_local here seems fishy and makes me worried about what kind of skeletons we'll find down the road.
It is also in our experience almost impossible to come back and fix this later after it creeps everywhere in the compiler. We have a good example of this: LLVM itself. I tried for a while various approaches to revisit some of the LLVM internals but this is engrained too far by now. MLIR was designed with more care to avoid any global state and produce "crash reproducers" that can be as hermetic as possible for example. It'll be quite sad to me to give on all this so early and without very strong justifications.

================
Comment at: flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp:248
+      op->erase();
+    opsToErase.clear();
+  }
----------------
schweitz wrote:
> mehdi_amini wrote:
> > schweitz wrote:
> > > mehdi_amini wrote:
> > > > I don't quite get the delayed erasing, this whole simplification could be done without keeping state in a vector:
> > > > 
> > > > region.walk([] (Operation *op) {
> > > >   if (auto embox = dyn_cast<EmboxOp>(op)) {
> > > >       if (embox.getShape()) {
> > > >         op->erase(); 
> > > >         return;
> > > >      }
> > > >   }
> > > > }
> > > > ```
> > > > 
> > > > (note that `op->erase()` should already assert for `op->use_empty()`, no need to duplicate here)
> > > > 
> > > > But also, could you do it more simply:
> > > > 
> > > > ```
> > > > region.walk([] (Operation *op) {
> > > >   if (isOpTriviallyDead(op)) op->erase();
> > > > }
> > > > ```
> > > > ?
> > > > 
> > > This should be a very general DCE like in mlir Transforms/.CSE.cpp. That seems to have been mangled but can be brought back.
> > Can you just call this instead? https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Transforms/RegionUtils.h#L57
> > 
> > In general generic algorithm in passes like this ends up being replicated everywhere when they can be exposed a general utilities, if the `simplifyRegions` does not fit the bill here can we introduce another utility there?
> Possibly. For the time being, we prefer to keep the simple dead-code cleanup.
I'm fine if you don't want to call `simplifyRegions`, but then can you refactor this as a pre-patch into a utility in a common place to avoid code duplication then?

I'm not enthusiastic to see each pass reimplementing generic utility / algorithm, and I don't see really a reason why we should at all.

This has community/project wise implications as well, the CIRCT folks were reporting performance issues with this kind of code path just a few days ago: https://llvm.discourse.group/t/speeding-up-canonicalize/3015 ; we were able to work together to improve the general tooling and infra around this.

================
Comment at: flang/tools/fir-opt/fir-opt.cpp:22
   fir::support::registerFIRPasses();
+  fir::registerOptPasses();
   DialectRegistry registry;
----------------
schweitz wrote:
> mehdi_amini wrote:
> > schweitz wrote:
> > > schweitz wrote:
> > > > mehdi_amini wrote:
> > > > > mehdi_amini wrote:
> > > > > > Why this change?
> > > > > What is the intended difference between `registerFIRPasses` and `registerOptPasses` ? 
> > > > Fixed.
> > > A synch problem with the source.
> > It is still unclear to me what is the intent here: `registerOptimizerPasses` is not documented, and `registerMLIRPassesForFortranTools` says "Register the standard passes we use" which seems like it should have all the required passes.
> > 
> > Why do you introduce `registerOptimizerPasses` at all?
> Upstreaming the code is being done in a series of many patches. I have chosen to group code in ways that make the upstreaming process more manageable. For example, the MLIR registration interfaces have been changing, and we want those calls isolated until they can be upstreamed. Furthermore, link times are already expensive, so we have made efforts for reduce that impact by selecting specific required libraries from MLIR. For these reasons there is no reason to change this now. There will be opportunities to regroup registration calls in subsequent patches
Can you clarify how this is impacting link time here?

Can you introduce the APIs when they make sense in-tree please! I have no way to figure if there is a good reason or not for what you send for review otherwise. I don't know in which state is your downstream project, but I'm concerned about the issue it is causing right now on the restriction it puts on the code structure and organization. For example the regressions you mentioned that you spot downstream but that can't be reproduced upstream with a lit tests.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98063/new/

https://reviews.llvm.org/D98063