[llvm-dev] RFC: Are auto-generated assertions a good practice?

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Fri May 4 10:57:08 PDT 2018


On Fri, May 4, 2018 at 10:53 AM Sanjay Patel <spatel at rotateright.com> wrote:

> On Fri, May 4, 2018 at 11:30 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Fri, May 4, 2018 at 10:16 AM Sanjay Patel <spatel at rotateright.com>
>> wrote:
>>
>>> I understand the overfit argument (but in most cases it just shows that
>>> a unit test isn't minimized)...
>>>
>>
>> Even minimized tests sometimes need a few other things to set up the
>> circumstance. Many DWARF tests, for example, produce the full DWARF
>> output, but maybe you only care about one part of it - say, the
>> encoding of instruction ranges but not the way names are rendered -
>> and there's no option to produce only the ranges without the names
>> (that wouldn't really make sense).
>>
>>
>>> but I don't see how the complete auto-generated assertions could be
>>> worse at detecting a miscompile than incomplete manually-generated
>>> assertions?
>>>
>>
>> Generally overfitting wouldn't make tests bad at detecting failures,
>> but it would produce excess false positives - if things in the output
>> unrelated to the issue change (in intentional/benign ways), they cause
>> the tests to fail often & become a burden for the project.
>>
>> Not suggesting that's the case with these particular instances - but it's
>> a tradeoff/concern to keep in mind.
>>
>
> Yes, that's definitely something we've acknowledged for x86 codegen tests.
> If someone wants to make a big isel, scheduler, or RA change, it's going to
> result in a *lot* of regression test diffs. On the plus side, it forces the
> author (I've definitely been there) to consider a lot of patterns that they
> probably didn't think about initially.
>

*nod* Though if we had good ways of isolating tests further to avoid this
churn, I think we would do so - even at the loss of that consideration.


>  Note that for IR assertions, the scripts reg-ex-ify value names:
>

*nod* That's one example of the sort of good thing that avoids tests being
noisier than they need to be, etc.


>
> define i1 @trueval_is_true(i1 %C, i1 %X) {
> ; CHECK-LABEL: @trueval_is_true(
> ; CHECK-NEXT:    [[R:%.*]] = or i1 [[C:%.*]], [[X:%.*]]
> ; CHECK-NEXT:    ret i1 [[R]]
> ;
>   %R = select i1 %C, i1 true, i1 %X
>   ret i1 %R
> }
>
> So this actually insulates the tests from cosmetic changes in value
> naming. That's something that's frequently overlooked in hand-written tests
> because it takes a lot of effort to type all that out. :)
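>
> For contrast, a hand-written version of the same test often hardcodes
> the value names (a hypothetical sketch for illustration):
>
> ; CHECK-LABEL: @trueval_is_true(
> ; CHECK-NEXT:    %R = or i1 %C, %X
> ; CHECK-NEXT:    ret i1 %R
>
> Those checks still verify the transform, but any benign change in how
> values get named would make them fail.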
>

That's not been my experience - though it is a bit laborious, most/all of
the hand-written tests I see these days tend to do a good job of using
matches for value names rather than hardcoding them. But I don't write/work
with optimization tests so much - maybe they're often small enough that the
names are rarely perturbed & so hardcoding can be reasonable. In debug info
and clang tests it's pretty necessary because there's lots of moving parts.

In any case I'm not suggesting the autogenerated checks are bad/broken/need
to be fixed - just that there are tradeoffs to keep in mind when working
with these things.

- Dave


>
>> - Dave
>>
>>
>>> The whole point of auto-generating complete checks is to catch
>>> miscompiles/regressions sooner. I.e., before they get committed and result in
>>> bug reports. I don't know how to mine the bugzilla stats on that, but it
>>> feels like we have made good progress on that front with better regression
>>> tests. Some history of the motivation and scripts here:
>>> https://bugs.llvm.org/show_bug.cgi?id=22897
>>>
>>>
>>>
>>> On Fri, May 4, 2018 at 10:31 AM, David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Yep - all about balance.
>>>>
>>>> The main risks are tests that overfit (golden files being the worst case
>>>> - checking that the entire output matches /exactly/ - this is what
>>>> FileCheck is intended to help avoid) and maintainability. In the case of
>>>> the autogenerated FileCheck lines I've seen so far - they seem like they
>>>> still walk a fairly good line of checking exactly what's intended. Though I
>>>> sometimes wonder if they're checking full combinatorial expansions that may
>>>> not be required/desirable - always a tradeoff of just how black-box or
>>>> white-box the tests are.
>>>>
>>>> On Fri, May 4, 2018 at 2:56 AM Alex Bradbury via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> On 4 May 2018 at 10:25, Alexandros Lamprineas via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> > Hello llvm-dev,
>>>>> >
>>>>> >
>>>>> > On a recent code review I was asked to auto-generate assertion
>>>>> > checks for my unit test. I wasn't aware that this was even
>>>>> > possible. I am referring to the python `update` scripts under the
>>>>> > `utils` directory. My first reaction was wow! I found it very
>>>>> > practical and useful. It saves you a significant amount of time
>>>>> > when writing a regression test. So I gave it a try. The generated
>>>>> > checks were satisfying enough, almost exactly what I wanted. Then
>>>>> > I got a bit sceptical about them. I am worried that auto-generating
>>>>> > tests based on the compiler output can be quite dangerous. The
>>>>> > tests will always pass regardless of whether the compiler emits
>>>>> > right or wrong code, therefore you have to be certain that they
>>>>> > impose the desired compiler behaviour. I guess the question here
>>>>> > is how often we should be using those scripts.
>>>>>
>>>>> Like many test-related issues, it comes down to personal judgement. It
>>>>> is of course easy to create test/CodeGen/*/* tests that pass even when
>>>>> the compiler produces broken code, regardless of whether the test CHECK
>>>>> lines are generated by update_llc_test_checks.py or not.
>>>>>
>>>>> I find it very helpful to have auto-generated CHECK lines that pick up
>>>>> any codegen change, but this can of course be problematic for very
>>>>> large test cases that are likely to see churn due to scheduling or
>>>>> regalloc changes. Being able to regenerate the CHECK lines and view
>>>>> the diff is also incredibly helpful when rebasing or moving a patch
>>>>> between different branches.
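>>>>>
>>>>> For example, a typical regenerate-and-diff pass looks something like
>>>>> this (a sketch; the binary and test paths are assumptions, adjust to
>>>>> your own setup):
>>>>>
>>>>>   $ llvm/utils/update_llc_test_checks.py --llc-binary=build/bin/llc \
>>>>>       llvm/test/CodeGen/RISCV/*.ll
>>>>>   $ git diff llvm/test/CodeGen/RISCV
>>>>>
>>>>> Any hunks you didn't expect point straight at the codegen changes the
>>>>> rebase introduced.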
>>>>>
>>>>> My policy for test/CodeGen/RISCV is to use update_llc_test_checks.py
>>>>> wherever possible, except in cases where there are so many CHECK lines
>>>>> on the output that they obscure the property being tested, indicating
>>>>> that a more limited hand-crafted pattern would be superior.
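>>>>>
>>>>> For instance, if the property being tested is just that a multiply
>>>>> by 8 is lowered to a shift, a limited hand-crafted pattern might
>>>>> check only that (a hypothetical sketch, not a test from the tree):
>>>>>
>>>>> ; CHECK-LABEL: mul8:
>>>>> ; CHECK: slli {{.*}}, 3
>>>>> ; CHECK-NOT: mul
>>>>>
>>>>> rather than pinning down every instruction in the function body.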
>>>>>
>>>>> Best,
>>>>>
>>>>> Alex
>>>