[mlir] [llvm] [CI] Add check-mlir-python to MLIR pre-merge checks (PR #72847)

Thu Nov 23 17:36:35 PST 2023

joker-eph wrote:

> The problem is that we don't run "the necessary tests". We run check-<whatever> and that's it

This is at least how llvm, clang, flang, bolt, lldb, lld, and mlir are tested. That's not just isolated cases...

> for some projects that may be insufficient. 

I don't disagree, but you seem to be starting from the "exception" and proposing to generalize it. I don't quite understand why this is better...

> For example this patch mentions that we need to also check check-mlir-python,

This was debunked as wrong during the review and removed from the diff already. 
But even if it had been the case:
1. I would rather have added it to the CMake dependencies
2. So what if we need to call `check-mlir` **and** `check-mlir-python`? Would you create one bot that builds MLIR+LLVM and runs `check-mlir` and another bot that builds MLIR+LLVM and runs `check-mlir-python`? Why?
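
To illustrate point 2: a single bot can run both targets from one build tree, since ninja accepts several targets in one invocation. A minimal sketch, where the helper name and the `build` directory are hypothetical and not taken from the actual premerge scripts:

```python
def ninja_check_command(build_dir, targets):
    """Return one ninja invocation that runs every given check target
    from a single build tree (so MLIR+LLVM is built only once)."""
    # `ninja -C <dir> <target>...` is standard ninja usage; the directory
    # name passed in below is an assumption for illustration.
    return ["ninja", "-C", build_dir, *targets]


cmd = ninja_check_command("build", ["check-mlir", "check-mlir-python"])
print(" ".join(cmd))
```

Splitting this across two bots would mean paying for the MLIR+LLVM build twice to run the same two targets.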

>  the runtimes we have much larger needs like checking different configurations (exceptions enabled or not, etc). 

Sure, the runtimes are "different" from the monolithic build, but I don't see how that applies to the testing of changes to "llvm, clang, flang, bolt, lldb, lld, and mlir"?

> like checking different configurations (exceptions enabled or not, etc).

For the other projects this is more a question of whether there are "project specific configs": for example, I'm sure there are many configs for LLVM (say, with/without MLGO configured, or with/without the Z3 solver) that won't affect any of the other projects. Enabling such a flag deserves dedicated testing (whether that testing deserves to run in premerge is another story).

But if we look at the testing matrix of the "standard" config of the mono-repo to run all the unit-tests (excluding the runtimes), you have:
- with/without sanitizers
- various host compilers (gcc-7.5 -> latest, clang, MSVC)
- various host environments (multiple Linux distributions, Windows, macOS, *BSD)
- various HW (X86, Arm64, PowerPC, ...)

So without changing any of the project settings, we already have a matrix of >100 configurations to sweep over here.
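
To make the arithmetic concrete, here is a rough sketch that enumerates such a matrix. The axis values are illustrative placeholders, not the real bot fleet, and in practice not every combination is valid (MSVC only runs on Windows, for instance) while the gcc axis alone has many more versions than shown:

```python
from itertools import product

# Hypothetical axis values, chosen only to mirror the four bullets above.
SANITIZERS = ["none", "sanitized"]
COMPILERS = ["gcc-7.5", "gcc-latest", "clang", "msvc"]
HOSTS = ["linux-a", "linux-b", "windows", "macos", "bsd"]
HARDWARE = ["x86", "arm64", "powerpc"]


def build_matrix():
    """Return every (sanitizer, compiler, host, hardware) combination."""
    return list(product(SANITIZERS, COMPILERS, HOSTS, HARDWARE))


matrix = build_matrix()
# The size is the product of the axis lengths: 2 * 4 * 5 * 3.
print(len(matrix))
```

Even with only a handful of values per axis the product passes 100, and none of these axes is specific to any one sub-project.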

Having every single project set up the full matrix does not seem like a good path forward to me: this is highly redundant and there is no "project-specific value add" in any of these.
Seems much more productive for us to:
- Have a strong monolithic setup (for all but runtimes)
- Have this monolithic setup be able to sweep as many of the >100 configs as possible (likely can't be exhaustive)
- Have a clang-toolchain-specific setup (3-stage bootstrap, LTO / ThinLTO, etc.)
- Have runtime-specific pipelines.

If I'm working on MLIR, lld or Bolt, it's much better for me to benefit from the monolithic setup taking care of the two steps above instead of every sub-project having to redo everything. (On top of being more human-resource efficient, it's also much much more efficient in terms of cloud resources).
I understand you live on an island with libc++, but this just does not apply to the other projects, which are much more similar to each other (there is a reason why we have a mono repo; by all accounts, and seeing how you operate, libc++ may just not belong here).

https://github.com/llvm/llvm-project/pull/72847