[PATCH] D83924: [mlir] Add update_mlir_test_checks.py

Mon Jul 20 21:16:26 PDT 2020

silvas added inline comments.

================
Comment at: llvm/utils/update_mlir_test_checks.py:29
+The script is designed to make adding checks to a test case fast, it is *not*
+designed to be authoratitive about what constitutes a good test!
+"""
----------------
mehdi_amini wrote:
> stephenneuendorffer wrote:
> > silvas wrote:
> > > rriddle wrote:
> > > > stephenneuendorffer wrote:
> > > > > mehdi_amini wrote:
> > > > > > stephenneuendorffer wrote:
> > > > > > > mehdi_amini wrote:
> > > > > > > > Something that I believe is important is to not just have blindly generated CHECKs but have a human editing the tests (documenting, minimizing the number of checks, etc.). 
> > > > > > > > 
> > > > > > > > Can we have the script adding a paragraph mentioning in the actual generated test? (with a link to the testing guide, etc.)
> > > > > > > Conceivably...but I'm guessing that we'd end up with the same paragraph in lots of files.  I tend to use this to auto-update tests in a mechanical way which is easy to verify, but hard to do by hand.  I'm specifically trying to avoid opening every file when updating lots of tests, so I think adding a bunch of text in the file that is expected to be removed would be a bad idea.  What if we just added a boilerplate message about usage and norms that was generated when the script is run?
> > > > > > > Conceivably...but I'm guessing that we'd end up with the same paragraph in lots of files.
> > > > > > 
> > > > > > I mean: since the user is supposed to manually edit significantly the output of the tool, removing the paragraph doesn't seem much.
> > > > > > 
> > > > > > The challenge is not get the output of the script just good enough to be checked in as-is but leading to actual bad tests
> > > > > Personally, I think we'll get better tests if we get people to focus on making good tests, rather than tedious manual editing.  I think that this goes more into the input of tests, than the format of the check lines.  I think forcing people to 'manually edit significantly' should definitely not be the goal.
> > > > Forcing lots of manual edits is not the goal, but a more important goal is not to have "diff" tests. IMO having a large amount of users manually edit is much better than having one person have to deal with the fallout of having to update hundreds of tests that broke because they were testing something unnecessary. It spreads out the maintenance efforts and makes evolution easier. (Speaking from having had to deal with this a lot in practice.)
> > > +1 to what River says. Encouraging some testing thoughtfulness would be really great.
> > > 
> > > If we really want to be evil we can insert some random garbage characters in the generated test checks 😈 so that people will have to manually run the test / look through the checks / test failures to remove them. And of course, the easiest way to fix such deliberately mangled tests is to delete whole CHECK lines, types, and other stuff, which is exactly what we want them to be doing; "the more I delete, the less I have to audit for random garbage inserted by the update_mlir_test_checks script".
> > > 
> > > Less evilly, I think that Mehdi's idea of generating a "test best practices summary + link to complete best practices doc" sounds good. It could look like:
> > > 
> > > ```
> > > PLEASE make sure that you trim down what is being checked by this test to the minimum that gives decent coverage. That will make life easier for people maintaining the test in the future.
> > > - Does your test need to check all operations? If not, then remove the unnecessary CHECK lines! 
> > > - Does your test need to check the types? If not, then remove the types from the CHECK lines!
> > > - Does your test need to check all attributes of the ops? If not, then use {{.*}} to skip them!
> > > - Does your test really need to check that SSA values are passed correctly? (rather than the mere presence of an op). If not, then remove the %[[FOO:.*]] from CHECK lines!
> > > 
> > > For more info on testing best practices see <link to more thorough docs> 
> > > ```
> > > 
> > > Btw, does this script throw away any user edits? (haven't looked at the code very much). If it does, then users will probably get bored of re-trimming-down the autogenerated stuff no matter how much encouragement we give them.
> > > 
> > > 
> > Can you help me out, because I don't really buy the "making it hard" part.  If people aren't looking at their tests, and reviewers are looking at the tests either, then I don't think we can help them.  
> > 
> > It also seems to me that minimal CHECK lines come from minimal input.  I'd rather have consistent tests (i.e. automated with good formatting) that are easy to read and update, but perhaps more verbose, than CHECK lines which are absolutely minimal, but are poorly documented about what they are testing, or missing so much context that they are impossible to debug when they fail.  Can you suggest some good 'heavily minimized' tests, so we can see how much difference there is between what is automatically generated and what is human-minimal?  I'd like to encourage thought be put into tests, but temper it with some practicality.
> > 
> > Yes, the script throws away user edits.  I think to not do that we need something significantly smarter than a bunch of REGEXes (along the lines of an MLIR dialect for CHECK lines.)
> > It also seems to me that minimal CHECK lines come from minimal input. 
> 
> I think that is only half of it. Minimizing the input isn't enough to minimize the CHECK.
> What would the script generate here for instance: https://github.com/llvm/llvm-project/blob/master/mlir/test/Transforms/sccp.mlir ?
There's a couple reasons why we might want to build in a bit of hand-holding into the script:

1. Not all MLIR development happens in MLIR core upstream and is subjected to the same level of review
2. Not all MLIR development is being done by people with extensive experience with FileCheck tests/best practices

I definitely sympathize with the idea that minimized check lines and result in difficult to update/understand tests (and tests that require a lot more staring-at to absorb their documentation value). While it is easier for test maintainers to update tests if they just need to run the script, it also loses a lot of value of the test (what if the test is now actually "failing"?). Except for crashes, simply updating the checks blindly with a script will let bugs slip through.

I think the idea is that a properly minimized test (both the IR and the set of things it is checking) won't be broken by practically any changes other than ones that truly break it (or are merely cosmetic, like op syntax changes) and need manual attention anyway.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83924/new/

https://reviews.llvm.org/D83924