[llvm] Make a tablegen test match-table.td more robust. (PR #106508)

Mon Oct 7 07:08:41 PDT 2024

MaggieYingYi wrote:

Hi @Pierre-vh,

> 
> I understand if it's out of scope here. I guess we can land this and open an issue directly to come up with a more sustainable approach.
> 
> One easy approach would be to just read the file for the relevant function signatures, then read lines until we find the matching } and put all of that in a CHECK line. I don't think we'd need to base it off of an existing script like update_llc_test_checks
>

After studying/reading the existing python scripts for updating the tests, I agree that we could write a new python script to update all TableGen related tests as you suggested. 

We could not only check the output lines, but also make test more robust. Just like the changes made in this PR. 

Main changes in this PR for each target opcode using 
1. changed repeated `GIMT_Encode4(0)` to a simple `// CHECK-COUNT-40: GIMT_Encode4(0)`
For example, changed
```
// CHECK-NEXT:     /*TargetOpcode::G_AND*//*Label 1*/ GIMT_Encode4(506), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0), GIMT_Encode4(0),
// CHECK-NEXT:     /*TargetOpcode::G_STORE*/ ...
```
to:
```
// CHECK-NEXT:     /*TargetOpcode::G_AND*//*Label 1*/ GIMT_Encode4(506),
// CHECK-COUNT-34: GIMT_Encode4(0),    <--- changed here
// CHECK-NEXT:     /*TargetOpcode::G_STORE*/ ...
```

The number of "GIMT_Encode4(0)" is decided by the difference of instruction values between two intructions (e.g. `TargetOpcode::G_AND` and `TargetOpcode::G_STORE`). Instruction value refers to their enum value declared in TargetOpcodes.def, for exmple, enum value of `TargetOpcode::G_AND` is 62 and enum value of `TargetOpcode::G_STORE` is 97. The different value is 97-63=34, therefore, 34 GIMT_Encode4(0) are emitted between two instructions as shown above. To simple the test, we could use "CHECK-COUNT-34: GIMT_Encode4(0)".

Some organizations have added opcodes downstream, therefore the number of `GIMT_Encode4(0)` might be different from upsteam. Use "CHECK-COUNT-n" check will increase the test robust and reduce the impact of merge process.

2. Changed `/*Label n*/ GIMT_Encode4(value)` to `/*Label n*/ GIMT_Encode4([[Ln:[0-9]+]])`
   and changed `// Label n: @value` to `// Label n: @[[Ln]]`

In the generated "MatchTable0[]", Labels are emitted as a label commnet (e.g. /\*Label 1\*/) following by the current size (or position) of MatchTableRecord::NumElements as the table is built (e.g. `GIMT_Encode4(506)`). The label value (e.g. `506`) needs to match the label value mentioned in the later table content `// Label 1: @506`.

Again, since some organizations have added operators downstream in TargetOpcodes.def, the label value might be different from the upstream with off-by-n errors (with n being the number of added operators). The test is more robust by checking the labels match instead of checking the exact value of label.

The impact of merge process will be reduced based on the two aboved changes.

Could we firstly land this change and then improve the whole TableGen test by using test update python script?

Thanks,

https://github.com/llvm/llvm-project/pull/106508