[llvm-dev] [RFC] Compiled regression tests.

Michael Kruse via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 30 22:40:10 PDT 2020


To illustrate some difficulties with FileCheck, lets make a non-semantic
change in LLVM:

    --- a/llvm/lib/Analysis/VectorUtils.cpp
    +++ b/llvm/lib/Analysis/VectorUtils.cpp
    @@ -642,8 +642,8 @@ MDNode *llvm::uniteAccessGroups(MDNode *AccGroups1,
MDNode *AccGroups2) {
         return AccGroups1;

       SmallSetVector<Metadata *, 4> Union;
    -  addToAccessGroupList(Union, AccGroups1);
       addToAccessGroupList(Union, AccGroups2);
    +  addToAccessGroupList(Union, AccGroups1);

       if (Union.size() == 0)
         return nullptr;

This changes the order of access groups when merging them.
Same background: Access groups are used to mark that a side-effect (e.g.
memory access) does not limit the parallelization of a loop, added by e.g.
#pragma clang loop vectorize(assume_safety). Therefore a loop can be
assumed to be parallelizable without further dependency analysis if all its
side-effect instructions are marked this way (and has no loop-carried
scalar dependencies).
An access group represents the instructions marked for a specific loop. A
single instruction can be parallel with regrads to multiple loop, i.e.
belong to multiple access groups. The order of the access groups is
insignificant, i.e. whether either `AccGroups1` or `AccGroups2` comes
first, the instruction is marked parallel for both loops.

Even the change should have no effect on correctness, `ninja check-llvm`
fails:

    test/Transforms/Inline/parallel-loop-md-merge.ll

     string not found in input
    ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_INNER]]}
             ^
    <stdin>:45:1: note: scanning from here
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^
    <stdin>:45:1: note: with "ACCESSES_INNER" equal to "7"
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^
    <stdin>:45:1: note: with "ACCESS_GROUP_INNER" equal to "4"
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^

The line FileCheck points to has nothing to do with the changed order of
access groups.
(what's the point of annotating the `!7 = ...` line repeatedly? Why not
pointing to the lines where ACCESSES_INNER/ACCESS_GROUP_INNER have their
values from?)

The CHECK lines, annotated with their line numbers from
parallel-loop-md-merge.ll, are:

    68 ; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group
![[ACCESS_GROUP_LIST_3:[0-9]+]]
    69 ; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
    70 ; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]
    71
    72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] =
!{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
    73 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
    74 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
    75 ; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]],
![[ACCESSES_INNER:[0-9]+]]}
    76 ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_INNER]]}
    77 ; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]],
![[ACCESSES_OUTER:[0-9]+]]}
    78 ; CHECK: ![[ACCESSES_OUTER]] = !{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_OUTER]]}

And the output of `FileCheck --dump-input -v` is:

         38: !0 = !{!1}
         39: !1 = distinct !{!1, !2, !"callee: %A"}
         40: !2 = distinct !{!2, !"callee"}
         41: !3 = !{!4, !5}
check:72     ^~~~~~~~~~~~~~
         42: !4 = distinct !{}
check:73     ^~~~~~~~~~~~~~~~~
         43: !5 = distinct !{}
check:74     ^~~~~~~~~~~~~~~~~
         44: !6 = distinct !{!6, !7}
check:75     ^~~~~~~~~~~~~~~~~~~~~~~
         45: !7 = !{!"llvm.loop.parallel_accesses", !5}
check:76     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match
found
         46: !8 = distinct !{!8, !9}
check:76     ~~~~~~~~~~~~~~~~~~~~~~~
         47: !9 = !{!"llvm.loop.parallel_accesses", !4}
check:76     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To fix this regression test, one needs to solve the following questions:

   -  Is `!0 = !{!1}` supposed to match with anything?
   -  Which of the CHECK lines are the ones that are important for the
   regression tests?
   -  The `![[SOMETHING]] = distinct !{}` line appears twice. Which one is
   ACCESS_GROUP_INNER/ACCESS_GROUP_OUTER?
   -  There are two lines with `llvm.loop.parallel_accesses` as well. The
   author of this regression test was kind enough to give the placeholders
   descriptive names, but do they match with what FileCheck matched?
   -  LOOP_INNER and LOOP_OUTER are only distinguished by position and
   `.i`. Why does the inner loop come first?
   -  Before the patch that modifies the order, the output of
   --dump-input=always was:

     38: !0 = !{!1}
     39: !1 = distinct !{!1, !2, !"callee: %A"}
     40: !2 = distinct !{!2, !"callee"}
     41: !3 = !{!4, !5}
     42: !4 = distinct !{}
     43: !5 = distinct !{}
     44: !6 = distinct !{!6, !7}
     45: !7 = !{!"llvm.loop.parallel_accesses", !4}
     46: !8 = distinct !{!8, !9}
     47: !9 = !{!"llvm.loop.parallel_accesses", !5}

This seems to suggest that our change caused !4 and !5 in lines 45 and 47
to swap. This is not what the patch did, which changes to order of two
MDNode arguments. The only MDNode that has two other MDNodes as operands is
line (line !6 and !8 are representing loops as they reference themselves as
first arguments; noalias scopes do this as well, but there is no noalias
metadata in this example).

An hasty fix is to change CHECK lines 76 and 77 to

    76 ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_OUTER]]}
    78 ; CHECK: ![[ACCESSES_OUTER]] = !{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_INNER]]}

However, ACCESS_GROUP_OUTER belongs to ACCESSES_INNER and
ACCESS_GROUP_INNER to ACCESSES_OUTER? This cannot be true.


The reason is that MDNodes are emitted in topological order of a
depth-first search. The patch changed the order of ACCESS_GROUP_INNER and
ACCESS_GROUP_OUTER in line 41 but because in the mitted IR file, the first
argument of !3 is emitted before the second argument, ACCESS_GROUP_OUTER is
on line 42 and ACCESS_GROUP_INNER on line 43 instead.

Even though this regression test uses proper regex-ing of metadata node
numbers, it is still fragile as it fails on irrelevant changes. Once it
fails, it requires time to fix properly because it is non-obvious which
lines are intended to match which output lines. The hasty fix would have
mixed up ACCESS_GROUP_OUTER/ACCESS_GROUP_INNER and I wouldn't expect
someone who is working on the inliner to take the time to understand all
the loop metadata. Set-semantics of metadata occurs often, such as noalias
metadata, loop properties, tbaa metadata, etc.

The difficulty gets amplified when there are multiple functions in the
regression tests; metadata of all functions all appended to the end and
MDNodes with same content uniqued, making it impossible to associate
particular MDNodes with specific functions.


Ideally the regression test would be robust and understandable, achievable
with two asserts in a unittest:

    Loop &OuterLoop = *LI->begin();
    ASSERT_TRUE(OuterLoop.isAnnotatedParallel());
    Loop &InnerLoop = *OuterLoop.begin();
    ASSERT_TRUE(InnerLoop.isAnnotatedParallel());
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200701/0aae6e4f/attachment.html>


More information about the llvm-dev mailing list