[llvm-dev] [RFC] Compiled regression tests.

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Wed Jul 1 07:33:21 PDT 2020


On 7/1/20 12:40 AM, Michael Kruse via llvm-dev wrote:
> To illustrate some difficulties with FileCheck, lets make a 
> non-semantic change in LLVM:
>
>     --- a/llvm/lib/Analysis/VectorUtils.cpp
>     +++ b/llvm/lib/Analysis/VectorUtils.cpp
>     @@ -642,8 +642,8 @@ MDNode *llvm::uniteAccessGroups(MDNode 
> *AccGroups1, MDNode *AccGroups2) {
>          return AccGroups1;
>
>        SmallSetVector<Metadata *, 4> Union;
>     -  addToAccessGroupList(Union, AccGroups1);
>        addToAccessGroupList(Union, AccGroups2);
>     +  addToAccessGroupList(Union, AccGroups1);
>
>        if (Union.size() == 0)
>          return nullptr;
>
> This changes the order of access groups when merging them.
> Same background: Access groups are used to mark that a side-effect 
> (e.g. memory access) does not limit the parallelization of a loop, 
> added by e.g. #pragma clang loop vectorize(assume_safety). Therefore a 
> loop can be assumed to be parallelizable without further dependency 
> analysis if all its side-effect instructions are marked this way (and 
> has no loop-carried scalar dependencies).
> An access group represents the instructions marked for a specific 
> loop. A single instruction can be parallel with regrads to multiple 
> loop, i.e. belong to multiple access groups. The order of the access 
> groups is insignificant, i.e. whether either `AccGroups1` or 
> `AccGroups2` comes first, the instruction is marked parallel for both 
> loops.
>
> Even the change should have no effect on correctness, `ninja 
> check-llvm` fails:
>
>     test/Transforms/Inline/parallel-loop-md-merge.ll
>
>      string not found in input
>     ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", 
> ![[ACCESS_GROUP_INNER]]}
>              ^
>     <stdin>:45:1: note: scanning from here
>     !7 = !{!"llvm.loop.parallel_accesses", !5}
>     ^
>     <stdin>:45:1: note: with "ACCESSES_INNER" equal to "7"
>     !7 = !{!"llvm.loop.parallel_accesses", !5}
>     ^
>     <stdin>:45:1: note: with "ACCESS_GROUP_INNER" equal to "4"
>     !7 = !{!"llvm.loop.parallel_accesses", !5}
>     ^
>
> The line FileCheck points to has nothing to do with the changed order 
> of access groups.
> (what's the point of annotating the `!7 = ...` line repeatedly? Why 
> not pointing to the lines where ACCESSES_INNER/ACCESS_GROUP_INNER have 
> their values from?)
>
> The CHECK lines, annotated with their line numbers from 
> parallel-loop-md-merge.ll, are:
>
>     68 ; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group 
> ![[ACCESS_GROUP_LIST_3:[0-9]+]]
>     69 ; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
>     70 ; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]
>     71
>     72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = 
> !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
>     73 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
>     74 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
>     75 ; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]], 
> ![[ACCESSES_INNER:[0-9]+]]}
>     76 ; CHECK: ![[ACCESSES_INNER]] = 
> !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
>     77 ; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]], 
> ![[ACCESSES_OUTER:[0-9]+]]}
>     78 ; CHECK: ![[ACCESSES_OUTER]] = 
> !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}
>
> And the output of `FileCheck --dump-input -v` is:
>
>          38: !0 = !{!1}
>          39: !1 = distinct !{!1, !2, !"callee: %A"}
>          40: !2 = distinct !{!2, !"callee"}
>          41: !3 = !{!4, !5}
> check:72     ^~~~~~~~~~~~~~
>          42: !4 = distinct !{}
> check:73     ^~~~~~~~~~~~~~~~~
>          43: !5 = distinct !{}
> check:74     ^~~~~~~~~~~~~~~~~
>          44: !6 = distinct !{!6, !7}
> check:75     ^~~~~~~~~~~~~~~~~~~~~~~
>          45: !7 = !{!"llvm.loop.parallel_accesses", !5}
> check:76     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no 
> match found
>          46: !8 = distinct !{!8, !9}
> check:76     ~~~~~~~~~~~~~~~~~~~~~~~
>          47: !9 = !{!"llvm.loop.parallel_accesses", !4}
> check:76     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> To fix this regression test, one needs to solve the following questions:
>
>   *  Is `!0 = !{!1}` supposed to match with anything?
>   *  Which of the CHECK lines are the ones that are important for the
>     regression tests?
>   *  The `![[SOMETHING]] = distinct !{}` line appears twice. Which one
>     is ACCESS_GROUP_INNER/ACCESS_GROUP_OUTER?
>   *  There are two lines with `llvm.loop.parallel_accesses` as well.
>     The author of this regression test was kind enough to give the
>     placeholders descriptive names, but do they match with what
>     FileCheck matched?
>   *  LOOP_INNER and LOOP_OUTER are only distinguished by position and
>     `.i`. Why does the inner loop come first?
>   *  Before the patch that modifies the order, the output of
>     --dump-input=always was:
>
>      38: !0 = !{!1}
>      39: !1 = distinct !{!1, !2, !"callee: %A"}
>      40: !2 = distinct !{!2, !"callee"}
>      41: !3 = !{!4, !5}
>      42: !4 = distinct !{}
>      43: !5 = distinct !{}
>      44: !6 = distinct !{!6, !7}
>      45: !7 = !{!"llvm.loop.parallel_accesses", !4}
>      46: !8 = distinct !{!8, !9}
>      47: !9 = !{!"llvm.loop.parallel_accesses", !5}
>
> This seems to suggest that our change caused !4 and !5 in lines 45 and 
> 47 to swap. This is not what the patch did, which changes to order of 
> two MDNode arguments. The only MDNode that has two other MDNodes as 
> operands is line (line !6 and !8 are representing loops as they 
> reference themselves as first arguments; noalias scopes do this as 
> well, but there is no noalias metadata in this example).
>
> An hasty fix is to change CHECK lines 76 and 77 to
>
>     76 ; CHECK: ![[ACCESSES_INNER]] = 
> !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}
>     78 ; CHECK: ![[ACCESSES_OUTER]] = 
> !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
>
> However, ACCESS_GROUP_OUTER belongs to ACCESSES_INNER and 
> ACCESS_GROUP_INNER to ACCESSES_OUTER? This cannot be true.
>
>
> The reason is that MDNodes are emitted in topological order of a 
> depth-first search. The patch changed the order of ACCESS_GROUP_INNER 
> and ACCESS_GROUP_OUTER in line 41 but because in the mitted IR file, 
> the first argument of !3 is emitted before the second argument, 
> ACCESS_GROUP_OUTER is on line 42 and ACCESS_GROUP_INNER on line 43 
> instead.
>
> Even though this regression test uses proper regex-ing of metadata 
> node numbers, it is still fragile as it fails on irrelevant changes. 
> Once it fails, it requires time to fix properly because it is 
> non-obvious which lines are intended to match which output lines. The 
> hasty fix would have mixed up ACCESS_GROUP_OUTER/ACCESS_GROUP_INNER 
> and I wouldn't expect someone who is working on the inliner to take 
> the time to understand all the loop metadata. Set-semantics of 
> metadata occurs often, such as noalias metadata, loop properties, tbaa 
> metadata, etc.
>
> The difficulty gets amplified when there are multiple functions in the 
> regression tests; metadata of all functions all appended to the end 
> and MDNodes with same content uniqued, making it impossible to 
> associate particular MDNodes with specific functions.
>
>
> Ideally the regression test would be robust and understandable, 
> achievable with two asserts in a unittest:
>
>     Loop &OuterLoop = *LI->begin();
>     ASSERT_TRUE(OuterLoop.isAnnotatedParallel());
>     Loop &InnerLoop = *OuterLoop.begin();
>     ASSERT_TRUE(InnerLoop.isAnnotatedParallel());


I definitely agree that we should not be trying to do this kind of 
checking using textual metadata-node matching in FileCheck. The 
alternative already available is to add an analysis pass with some kind 
of verifier output. This output, not the raw metadata itself, can be 
checked by FileCheck. We also need to check the verification code, but 
at least that's something we can keep just in one place. For parallel 
annotations, we already have such a thing (we can run opt -loops 
-analyze; e.g., in 
test/Analysis/LoopInfo/annotated-parallel-complex.ll). We also do this 
kind of thing for the cost model (by running with -cost-model -analyze). 
To what extent would making more-extensive use of this technique address 
the use cases you're trying to address?

Maybe we should, also, just make more-extensive use of unit tests?

Thanks again,

Hal



>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200701/09eb3530/attachment.html>


More information about the llvm-dev mailing list