[llvm-dev] RFC: Are auto-generated assertions a good practice?

Sanjay Patel via llvm-dev llvm-dev at lists.llvm.org
Fri May 4 10:53:16 PDT 2018


On Fri, May 4, 2018 at 11:30 AM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Fri, May 4, 2018 at 10:16 AM Sanjay Patel <spatel at rotateright.com>
> wrote:
>
>> I understand the overfit argument (but in most cases it just shows that a
>> unit test isn't minimized)...
>>
>
> Even minimized tests sometimes need a few other things to set up the
> circumstance. Many DWARF tests, for example, produce the full DWARF
> output, but maybe you only care about one part of it: perhaps you care
> about the encoding of instruction ranges but not the way names are
> rendered, yet there's no option to produce only ranges without names
> (that wouldn't really make sense).
>
>
>> but I don't see how the complete auto-generated assertions could be worse
>> at detecting a miscompile than incomplete manually-generated assertions?
>>
>
> Generally, overfitting wouldn't result in being bad at detecting failures,
> but in excess false positives: if things in the output unrelated to the
> issue change (in intentional/benign ways), they cause the tests to fail
> often and become a burden for the project.
>
> Not suggesting that's the case with these particular instances - but it's
> a tradeoff/concern to keep in mind.
>

Yes, that's definitely something we've acknowledged for x86 codegen tests.
If someone wants to make a big isel, scheduler, or RA change, it's going to
result in a *lot* of regression test diffs. On the plus side, it forces the
author (I've definitely been there) to consider a lot of patterns that they
probably didn't think about initially.

Note that for IR assertions, the scripts reg-ex-ify value names:

define i1 @trueval_is_true(i1 %C, i1 %X) {
; CHECK-LABEL: @trueval_is_true(
; CHECK-NEXT:    [[R:%.*]] = or i1 [[C:%.*]], [[X:%.*]]
; CHECK-NEXT:    ret i1 [[R]]
;
  %R = select i1 %C, i1 true, i1 %X
  ret i1 %R
}

So this actually insulates the tests from cosmetic changes in value naming.
That's something that's frequently overlooked in hand-written tests because
it takes a lot of effort to type all that out. :)
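
For contrast, here's what a hand-written version of the same test often looks
like, with the value names hard-coded and only the one interesting instruction
checked (a hypothetical sketch, not taken from any actual test file):

define i1 @trueval_is_true(i1 %C, i1 %X) {
; CHECK-LABEL: @trueval_is_true(
; CHECK: %R = or i1 %C, %X
  %R = select i1 %C, i1 true, i1 %X
  ret i1 %R
}

That version would break on a purely cosmetic rename of %R, and it would keep
passing if extra instructions showed up after the 'or', since nothing past
that line is checked.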




> - Dave
>
>
>> The whole point of auto-generating complete checks is to catch
>> miscompiles/regressions sooner. I.e., before they get committed and result in
>> bug reports. I don't know how to mine the bugzilla stats on that, but it
>> feels like we have made good progress on that front with better regression
>> tests. Some history of the motivation and scripts here:
>> https://bugs.llvm.org/show_bug.cgi?id=22897
>>
>>
>>
>> On Fri, May 4, 2018 at 10:31 AM, David Blaikie via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Yep - all about balance.
>>>
>>> The main risks are tests that overfit (golden files being the worst case
>>> - checking that the entire output matches /exactly/ - this is what
>>> FileCheck is intended to help avoid) and maintainability. In the case of
>>> the autogenerated FileCheck lines I've seen so far - they seem like they
>>> still walk a fairly good line of checking exactly what's intended. Though I
>>> sometimes wonder if they're checking full combinatorial expansions that may
>>> not be required/desirable - always a tradeoff of just how black/white box
>>> tests are.
>>>
>>> On Fri, May 4, 2018 at 2:56 AM Alex Bradbury via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> On 4 May 2018 at 10:25, Alexandros Lamprineas via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> > Hello llvm-dev,
>>>> >
>>>> >
>>>> > On a recent code review I was asked to auto-generate assertion checks
>>>> > for my unit test. I wasn't aware that this was even possible. I am
>>>> > referring to the python `update` scripts under the `utils` directory.
>>>> > My first reaction was wow! I found it very practical and useful. It
>>>> > saves you a significant amount of time when writing a regression test.
>>>> > So I gave it a try. The generated checks were satisfying enough, almost
>>>> > exactly what I wanted. Then I got a bit sceptical about them. I am
>>>> > worried that auto-generating tests based on the compiler output can be
>>>> > quite dangerous. The tests will always pass regardless of whether the
>>>> > compiler emits right or wrong code; therefore you have to be certain
>>>> > that they impose the desired compiler behaviour. I guess the question
>>>> > here is how often we should be using those scripts.
>>>>
>>>> Like many test-related issues, it comes down to personal judgement. It
>>>> is of course easy to create test/CodeGen/*/* tests that pass even when
>>>> the compiler emits broken code, regardless of whether the test CHECK
>>>> lines are generated by update_llc_test_checks.py or not.
>>>>
>>>> I find it very helpful to have auto-generated CHECK lines that pick up
>>>> any codegen change, but this can of course be problematic for very
>>>> large test cases that are likely to see churn due to scheduling or
>>>> regalloc changes. Being able to regenerate the CHECK lines and view
>>>> the diff is also incredibly helpful when rebasing or moving a patch
>>>> between different branches.
>>>>
>>>> My policy for test/CodeGen/RISCV is to use update_llc_test_checks.py
>>>> wherever possible, except in cases where there are so many CHECK lines
>>>> on the output that they obscure the property being tested, indicating
>>>> that a more limited hand-crafted pattern would be superior.
>>>>
>>>> Best,
>>>>
>>>> Alex
>>>
>>
