[Mlir-commits] [mlir] [mlir] Retain original identifier names for debugging (PR #79704)

Tue Jan 30 03:42:58 PST 2024

Wheest wrote:

> I would want to see at least a design for another more principled approach before possibly considering this one.
> 
> At the moment I'm more on the side of "this feature isn't worth the drawback of polluting the IR".

Excluding adding a new member variable to the Value class (which adds an unacceptable constant overhead), I think there are two approaches that are worth considering.

### 1. Location based approach

Would not require changes to the IR, however it would require re-parsing, which could add a lot more complexity (e.g., how do we reassociate a newly parsed Value and its name with the original Value?)

There are also clients of location info, for whom this feature [could cause problems]( https://discourse.llvm.org/t/retain-original-identifier-names-for-debugging/76417/18).

### 2. Use of external data structure

We could create an external data structure, e.g., a `DenseMap<Value, StringRef>`.  I did this in [an initial implementation](https://discourse.llvm.org/t/retain-original-identifier-names-for-debugging/76417/4?u=wheest), but made the owner MLIRContext, which is not desirable since that's not really what MLIRContext is for.

However, typically (e.g., in `mlir-opt`) the parser and printer are in the same scope, so we could make the map user-controlled, and (optionally) pass it to the Parser (which populates the map), and then to the Printer (which reads the map and updates the names accordingly).

We have something similar to this already, the `FallbackAsmResourceMap`, which is passed to both the parser and printer.  I'm not sure if this would be the best place to put this map, or just to handle it separately.  It seems `FallbackAsmResourceMap` is used in `BytecodeWriter` (to leverage its printers).  Even adding an empty map to the Bytecode system is not attractive to me since I believe it is more performance critical.

A direct Value mapping approach also has the advantage compared to the attribute dict approach of reducing ambiguity in the case of IR transformation.  If we introduce new arguments or operations, unless the Value is the same, we don't touch the name.

____

Personally, I think option 2. is a more straightforward and less intrusive solution.  If we made the map a user-managed object (e.g., by `mlir-opt`'s `performActions` function, then it keeps our requirement of only adding overhead to the AsmParser and AsmPrinter.

Regarding the attribute dictionary approach, the only situation when the IR would be polluted is when this flag is enabled, which limits its impact.

https://github.com/llvm/llvm-project/pull/79704