[Mlir-commits] [mlir] [mlir][arith] Add more canonicalization and integration tests coverage (PR #92272)
Andrzej Warzyński
llvmlistbot at llvm.org
Fri May 17 04:24:00 PDT 2024
banach-space wrote:
Thank you all for replying to my comments!
### 1. Wider discussion on testing
> In LLVM it used to be much more common to check in tests taken verbatim from Bugzilla in individual regression test files.
I would prefer to avoid that. Instead, for every bug I'd rather identify the actual edge case that's:
* causing the issue, and
* is not covered by our testing.
In general, I don't quite follow the split into "regression" and regular tests (I also feel that folks interpret the split differently). From https://llvm.org/docs/TestingGuide.html#regression-tests
> The regression tests are small pieces of code that test a specific feature of LLVM or trigger a specific bug in LLVM.
To me that sounds like a test for a specific (edge) case that "could go wrong". For completeness, that same document defines "unit test" as:
> Unit tests are written using [Google Test](https://github.com/google/googletest/blob/master/docs/primer.md) and [Google Mock](https://github.com/google/googletest/blob/master/docs/gmock_for_dummies.md) and are located in the llvm/unittests directory.
But let's not diverge too much - we all agree that we need more testing and patches like this one 😅
Btw ...
> There's no in-tree frontend compiler (except for flang?) to truly exercise the conversions on real hardware
We do have e2e tests for Linalg (and Vector and SparseTensor) that target SVE and SME. These tests are run by LLVM buildbots (there are more tests for other targets too, but I don't track those as closely). Glad to hear that people find these valuable :)
### 2. This patch
#### Test files
Now, looking at the files added in this PR:
* mlir/test/Integration/Dialect/Arith/CPU/test-arithmetic.mlir
* mlir/test/Integration/Dialect/Arith/CPU/test-i1-operations.mlir
* mlir/test/Integration/Dialect/Arith/CPU/test-indices.mlir
* mlir/test/Integration/Dialect/Arith/CPU/test-int-trunc-ext.mlir
* mlir/test/Integration/Dialect/Arith/CPU/test-shifts.mlir
I see two conflicting splits:
1. one test file per operation (e.g. "test-shifts.mlir" and "test-int-trunc-ext.mlir"), vs
2. one test file per data type (e.g. "test-i1-operations.mlir" and "test-indices.mlir").
For consistency, could you pick one convention and stick with it? My personal preference would be one file per operation (or group of similar operations), with every file exercising the "most interesting" data types (`index`, `i1`, `i8` and `f32`) - see the sketch below.
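To make the idea concrete, here's a rough sketch of what a per-op file (e.g. "shifts.mlir") could look like. The function names are hypothetical and the RUN lines/lowering pipeline are omitted - this is just to illustrate the layout:

```mlir
// Hypothetical shifts.mlir - one op group per file, several types per file.
// (RUN lines and lowering pipeline omitted for brevity.)

func.func @shrsi_i8() {
  %lhs = arith.constant -64 : i8
  %rhs = arith.constant 2 : i8
  %res = arith.shrsi %lhs, %rhs : i8
  vector.print %res : i8
  // CHECK: -16
  return
}

func.func @shrsi_index() {
  %lhs = arith.constant 64 : index
  %rhs = arith.constant 2 : index
  %res = arith.shrsi %lhs, %rhs : index
  vector.print %res : index
  // CHECK: 16
  return
}
```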
Also, IMHO you can safely drop the "test" prefix from files that already live in a "test" sub-directory (to reduce noise). That's a nit ;-)
#### Review process - multiple PRs
Given that you are testing a lot of edge cases, I'd like to go over every one of them and make sure they match my expectations. I am raising this especially in the context of the following PR, in which we discovered that our folding logic for `arith::CeilDivSIOp` "overflows" in a case where the Op itself shouldn't (illustrated below):
* https://github.com/llvm/llvm-project/pull/90947
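For illustration, here's the flavor of edge case I have in mind - note that this is a hypothetical example in the same spirit, not necessarily the exact inputs from that PR. `ceil(127 / 2) == 64` fits comfortably in `i8`, but a naive fold computing `(lhs + rhs - 1) / rhs` would need the intermediate value `128`, which doesn't:

```mlir
// ceildivsi itself shouldn't overflow here, even though a naive
// constant-folding strategy would (128 is not representable in i8).
func.func @ceildivsi_no_overflow() {
  %lhs = arith.constant 127 : i8
  %rhs = arith.constant 2 : i8
  %res = arith.ceildivsi %lhs, %rhs : i8
  vector.print %res : i8
  // CHECK: 64
  return
}
```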
As in, I think it's good to review every case for such tests. I'm happy to do that, but I'd require this PR to be split into smaller PRs (one per Op?) - this one has grown quite quickly (btw, absolutely no harm in having multiple small PRs). I would definitely extract the canonicalization tests into a separate PR.
Importantly, if other reviewers are happy with the content then I'm not going to block it. You already have +1 from @kuhar who is an expert in this area.
#### Cases to test
I see three major cases being exercised in this PR (and in arith in general):
1. "sane" inputs (e.g. `10 + 10 = 20`),
2. "overflow" inputs (e.g. `-MIN_INT`),
3. "poison/non-poison" inputs,
4. ... and you could also add `nan` inputs to this list.
Why not split the tests along these axes (i.e. check 1., 2. and 3.) and either:
* write a single function per op and pass different arguments depending on the case being checked (e.g. "sane" vs "overflow" vs "poison"), or
* write a separate function for each of 1., 2. and 3. (see the sketch after this list).
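Here's a sketch of the second option (function names are made up; the `vector.print` + `CHECK` pattern follows what these tests already use):

```mlir
// One function per (op, case) pair - a failure points straight at the
// case being exercised.
func.func @addi_sane() {
  %a = arith.constant 10 : i8
  %b = arith.constant 10 : i8
  %r = arith.addi %a, %b : i8
  vector.print %r : i8
  // CHECK: 20
  return
}

func.func @addi_overflow() {
  %a = arith.constant 127 : i8
  %b = arith.constant 1 : i8
  %r = arith.addi %a, %b : i8
  vector.print %r : i8
  // i8 addition wraps around on overflow.
  // CHECK: -128
  return
}
```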
Btw,
> The current tests don't cover all cases, although I think it wouldn't be realistic to do so either - that would be a combinatorial explosion and is perhaps better explored using property-based testing :)
Agreed. I'm only suggesting to identify the key options/cases and to make sure that every test makes it clear which case is being exercised. In my view, the coverage proposed here is a good start, but we should structure these tests so that folks who'd like to add more at some later time can do so easily.
#### Running integration tests on different architectures
In your integration tests you don't specify the architecture on which to run the tests. In an ideal world, these would work regardless of the CPU target. Sadly, this is not an ideal world :) A while back, @Lewuathe found a rather unpleasant corner case highlighting a very subtle difference in how `nan` values are treated:
* https://github.com/llvm/llvm-project/issues/58531#issuecomment-1990652429
I doubt that ^^^ would impact the tests in this PR, but just to clarify: given that no triple is specified, these tests are intended to be target-agnostic, right? That's worth documenting somewhere (see the sketch below for one way to keep `nan` checks target-agnostic).
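As an aside, one way to keep a `nan` check target-agnostic is to test a predicate rather than print the value itself (the bit pattern and sign of a `nan` may differ across targets). A minimal sketch:

```mlir
// Check that 0.0 / 0.0 produces a nan via the "unordered" predicate,
// instead of printing the nan itself.
func.func @nan_is_unordered() {
  %zero = arith.constant 0.0 : f32
  %nan = arith.divf %zero, %zero : f32
  %isnan = arith.cmpf uno, %nan, %nan : f32
  vector.print %isnan : i1
  // CHECK: 1
  return
}
```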
### Final note
As I mentioned above, I won't block this PR. I wanted to share my ideas on the possible approach (I've approached this PR as an RFC). If you agree with my suggestions and decide to follow them, I'm happy to help with reviews. Most importantly, thank you for raising this very important topic, @pingshiyu !
Btw, I am OOO next week, so apologies if there's more delay from my side.
https://github.com/llvm/llvm-project/pull/92272