[PATCH] D148216: Add support for annotations in UpdateTestChecks (NFC)

Thu May 25 12:59:39 PDT 2023

nikic added inline comments.

================
Comment at: clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected:12
+// CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[X]], align 4
+// CHECK-NEXT:    ret i32 [[TMP1]]
+//
----------------
hnrklssn wrote:
> nikic wrote:
> > hnrklssn wrote:
> > > nikic wrote:
> > > > nikic wrote:
> > > > > hnrklssn wrote:
> > > > > > hnrklssn wrote:
> > > > > > > nikic wrote:
> > > > > > > > hnrklssn wrote:
> > > > > > > > > delcypher wrote:
> > > > > > > > > > @hnrklssn I just noticed we don't have a `CHECK` for what `META2` actually refers to. Should we?
> > > > > > > > > > 
> > > > > > > > > > Not something that has to be fixed in this patch, more just an observation.
> > > > > > > > > Indeed this is true for metadata in general, presumably because the RHS often refer to things like other metadata identifiers. In the case of annotations they seem to always refer to simple strings however, so it would be feasible to do a straight match without having to do recursive matching or complex regexes to determine which part of the metadata to match against.
> > > > > > > > > 
> > > > > > > > > In many cases with metadata attached to IR nodes, multiple nodes refer to the same metadata node, so at least you verify that they still are consistent. But I agree that verifying the content would be a great future addition.
> > > > > > > > You need to pass `--check-globals` to check the actual metadata.
> > > > > > > When I add that to this test case it adds
> > > > > > > 
> > > > > > > ```
> > > > > > > //.
> > > > > > > // CHECK: attributes #0 = { noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
> > > > > > > //.
> > > > > > > // CHECK: !0 = !{i32 1, !"wchar_size", i32 4}
> > > > > > > // CHECK: !1 = !{!"clang version 17.0.0 (git at github.com:llvm/llvm-project.git 684914f47cf59e9ab6d8b0f73c58ca6272ea28d4)"}
> > > > > > > // CHECK: !2 = !{!"auto-init"}
> > > > > > > //.
> > > > > > > ```
> > > > > > > 
> > > > > > > So it seems to just be doing a simple literal matching on all metadata, regardless of whether we captured that metadata in any filecheck variable. And it still doesn't use the META2 variable to match the definition. Am I missing something? If we use the literal metadata names instead of variable matching for the definitions, there isn't much point in doing variable matching for the metadata uses either, since the test still very much relies on the metadata numbering being stable.
> > > > > > @nikic Do you have more information to add about how metadata definition matchers can be generated without hardcoding everything (which is kind of the opposite of what this patch is trying to do), or in general if you're happy with the state of the PR?
> > > > > This works fine with update_test_checks, so it must be some bug in update_cc_test_checks in particular. From a quick look, I suspect it's because https://github.com/llvm/llvm-project/blob/3d05ab6d3e24e76ff53b8d7d623c436b4be5b809/llvm/utils/update_cc_test_checks.py#L447 hardcodes a True value, while update_test_checks makes this dependent on `--preserve-names`, which is disabled by default there.
> > > > Or maybe just test this with update_test_checks? This change is valuable for that script as well, and it doesn't have this extra issue.
> > > I couldn't find any uses of 
> > > `--preserve-names` in the entire repo (not even in tests for update_test_checks), so I went ahead and changed update_cc_test_checks to pass True instead.
> > > 
> > > While at it I added 2 new options to match some, but not necessarily all, globals. I set the least verbose one (other than "none") as the default, emitting checks for the definitions of all globals that we emit matches for in the IR. Do you agree that this is reasonable?
> > That sounds like a nice approach. My main question would be whether `--check-globals seen` is the right default: My gut feeling here is that this will introduce additional checks for attributes/metadata in tests that don't actually care about it, but I'm not particularly sure about that.
> > 
> > You might want to do some mass test regeneration and take a look over the diff to see how useful the changes would be.
> > 
> > If we do change the default, the default change likely need to be part of `--version 3` to avoid substantial test churn.
> I've regenerated checks before and after my patch and compared them. Some findings:
>  - there's a surprising number of custom metadata types that didn't get variable checkers but instead checked hardcoded metadata IDs (!sloc, !pcsections, !alias.scope, !noalias, !nonnull, !noundef, !dereferenceable, !nontemporal, !fpmath etc.). The tests often have checkers, but were removed when regenerating them before my patch, so something seems to have broken.
>  - I'm split on whether global variables should match the exact name or get a matcher. Right now the patch handles them like any other global, but I'm assuming that the name actually carries significance for linking?
>  - matching attributes seems useful in some cases, not matter too much in others, but often there aren't that many attributes so it does not pollute much. Tests where attribute matchers pollute should probably use --scrub-attributes in update_check_tests (does not exist in update_cc_check_tests) anyways
>  - tests that include debug info generally do so deliberately, so matching against that seems reasonable
>  - TBAA metadata doesn't give too much insight unless transitive/all. That said not all TBAA tests check metadata at all today (at least not llvm/test/Transforms/ArgumentPromotion/reserve-tbaa.ll)
>  - Similar goes for !llvm.loop, the useful info is pointed to indirectly
>  - Profile info doesn't seem to be included very often. It's probably reasonable to match against it when it's there. Found one test that was specifically created to check profile info metadata but lost the check during conversion to update_check_tests (llvm/test/Transforms/ArgumentPromotion/profile.ll)
> 
> Since the `transitive` setting provides substantially more insight than the `seen` setting for loop and tbaa, and only really explodes for debug info (which is rarely included, and deliberately so when done) I'm open to making it the default instead. For `opt` based tests I kind of think that `all` should be the default since any metadata in the output was either created by the tested passes, or explicitly included in the input.
> I've regenerated checks before and after my patch and compared them. Some findings:
>  - there's a surprising number of custom metadata types that didn't get variable checkers but instead checked hardcoded metadata IDs (!sloc, !pcsections, !alias.scope, !noalias, !nonnull, !noundef, !dereferenceable, !nontemporal, !fpmath etc.). The tests often have checkers, but were removed when regenerating them before my patch, so something seems to have broken.

Yes, which is why I'm happy to have the generic implementation here -- many of these were handled by manually adjusting the update_test_checks output...

>  - I'm split on whether global variables should match the exact name or get a matcher. Right now the patch handles them like any other global, but I'm assuming that the name actually carries significance for linking?

I'm thinking we should probably retain the name of globals, the same way we do for functions. Unlike things like local variable names, and attribute/metadata IDs, these tend to be stable and semantically meaningful. (Also the wildcard generated for them breaks syntax highlighting in vim. Grumble, grumble.)

>  - matching attributes seems useful in some cases, not matter too much in others, but often there aren't that many attributes so it does not pollute much. Tests where attribute matchers pollute should probably use --scrub-attributes in update_check_tests (does not exist in update_cc_check_tests) anyways
>  - tests that include debug info generally do so deliberately, so matching against that seems reasonable
>  - TBAA metadata doesn't give too much insight unless transitive/all. That said not all TBAA tests check metadata at all today (at least not llvm/test/Transforms/ArgumentPromotion/reserve-tbaa.ll)
>  - Similar goes for !llvm.loop, the useful info is pointed to indirectly
>  - Profile info doesn't seem to be included very often. It's probably reasonable to match against it when it's there. Found one test that was specifically created to check profile info metadata but lost the check during conversion to update_check_tests (llvm/test/Transforms/ArgumentPromotion/profile.ll)
> 
> Since the `transitive` setting provides substantially more insight than the `seen` setting for loop and tbaa, and only really explodes for debug info (which is rarely included, and deliberately so when done) I'm open to making it the default instead. For `opt` based tests I kind of think that `all` should be the default since any metadata in the output was either created by the tested passes, or explicitly included in the input.

I think there's broadly two cases:

1. There are globals/attributes/metadata in the input, which are needed to make the test work, but not important for the test output, because they will just be preserved. E.g. an InstSimplify test might use globals/attributes/metadata, but won't ever change them, and checking that fact is not particularly useful.

2. Tests for passes that do modify/add attributes/metadata, such as IPO inference passes. It's worth noting that there is a separate `--check-attributes` flag that covers some of these cases, which will match the attributes printed before the function definition, which is generally more convenient than matching the attributes at the end of the module.

The first the use case we generally do not want to match any globals, and for the latter we probably want either `--check-attributes` (if testing function attributes in particular) or `--check-globals transitive` (if testing something metadata or call-site attributes).

I'm still not really sure what the right default would be. I guess one could argue that for the former case, having a few extra attributes at the test isn't a big deal -- one can just ignore them. But that does also mean that whenever we for example change attributes on an intrinsic, this will now suddenly affect hundreds of tests that encode those attributes.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148216/new/

https://reviews.llvm.org/D148216