[Mlir-commits] [mlir] [mlir] Retain original identifier names for debugging (PR #79704)

Sun Jan 28 03:05:08 PST 2024

Wheest wrote:

Thanks for the review!  I've tried to enumerate some requirements gathered from the [discourse discussion](https://discourse.llvm.org/t/retain-original-identifier-names-for-debugging/76417/16) and my own opinions, see below.

First though, regarding:

> The main purpose of the textual IR is to be faithful to the in-memory representation, and this is breaking this contract.

This is a valid concern.  I'll try and advocate for this being okay and/or not unprecedented:
- The in-memory representation also includes the location information of the operation, but the textual IR also does not print this by default
- Right now I am setting the names as "discardable attributes", which may be consumed by passes, so discarding the attributes as a "final pass" at print time is not entirely new behaviour.  
- Additionally, since the default anonymous names are generated at print time, using a different naming scheme doesn't misrepresent the IR 
- Although a given dialect or transformation could in principle do anything with the attribute dict, my intuition is that a well-formed dialect should either ignore or drop attributes it is not familiar with. Chris Lattner said [here](https://discourse.llvm.org/t/retain-original-identifier-names-for-debugging/76417/18?u=wheest) "While it may be imperfect MLIR officially sanctions installing “other dialect attributes” on ops."

I could add an additional flag such as `--no-attr-dict-cleanup`, so that users could optionally print the unmodified attr dicts (i.e., make my `op.removeDiscardableAttr("mlir.resultNames");` lines optional).  This could also be helpful for diagnosing unexpected namings generated by this pass.

This would generate IR which looks like:

<details><summary>IR with no attr dict cleanup</summary>
<p>

```mlir
module {
  func.func @add_one(%my_input: f64, %arg1: f64) -> f64 {
    %my_constant = arith.constant {mlir.resultNames = ["my_constant"]} 1.000000e+00 : f64
    %my_output = arith.addf %my_input, %my_constant {mlir.opArgNames = ["my_input", "my_constant"], mlir.resultNames = ["my_output"]} : f64
    return {mlir.opArgNames = ["my_output"]} %my_output : f64
  }
}
```

</p>
</details> 

Secondly, regarding:

> the interactions with the AsmOpInterface can be subtle

Yes, although the function that actually sets the names is a lambda which is passed by the caller to the operation (e.g., [the lambda in AsmPrinter we care about](https://github.com/llvm/llvm-project/blob/03cf0e9354e7e56ff794e9efb682ed2971bc91ec/mlir/lib/IR/AsmPrinter.cpp#L1539-L1547)).  This means that although we can't control what the operation does in its implementation, we can control what we do with the string it gives us.

Your comment makes me realise that I should add test cases where our chosen names might clash with the names chosen by a `AsmOpInterface`.  I'll add that in a future commit.

## Requirements

**Legend**
✅: I think we cover this
🟠: there could be an issue or improvement made.  Not a blocker to merging
⛔: not currently covered, blocker to merging

**Functionality**
1. 🟠 Identifier names: SSA names, basic blocks names, etc, should be retained when IR is printed, rather than giving anonymous names
2. ✅ Optional feature: this feature should not be default behaviour (it is enabled with a ParserConfig flag)
3. 🟠 Stable to errors: passes mean that operations and blocks can be mutated/added/deleted and the name list will drift. Misnaming may be acceptable, but crashes are not

**Performance**
1. ✅ No additional overhead introduced to critical path: since we use an attribute dict, no overhead introduced in core IR.  Textual parsing and printing has small cost but this is not performance critical (advocated [here](https://discourse.llvm.org/t/retain-original-identifier-names-for-debugging/76417/10))

**Maintainability**
1. ✅ Low-code footprint: this solution has a small code footprint, thus the logic is quite easy to read, update, or replace
2. 🟠 Tested: the solution should be incorporated into the main test harness (e.g., the FileCheck MLIR tests)

### 🟠: Issues to discuss or address

**Functionality**
- 1] Identifier names - we currently don't cover function arguments that are not used (since their names are stored in Operations which use them).  This is only marginally useful - it an arg isn't used, who cares?  Still, if I can support it I will
- 3] Stable to errors - I try to cover this with my compound condition for loops in `SSANameState::setRetainedIdentifierNames` (i.e., we take the minimum of the arguments and names we have).  However perhaps this could be better tested by applying some passes which change the number of arguments. I'm unsure of a good pass to apply.

**Maintainability**
- 2] Expand test coverage: Development of more test cases, ideally to test system behaviour under various transformation passes. Open to suggestions for specific scenarios that should be covered.

https://github.com/llvm/llvm-project/pull/79704