[LLVMdev] regression testing

Thu Aug 20 14:23:50 PDT 2009

<moving from commits to llvmdev>

On Aug 20, 2009, at 10:09 AM, Dale Johannesen wrote:
> On Aug 19, 2009, at 8:16 PMPDT, Chris Lattner wrote:
> On Aug 19, 2009, at 5:40 PM, Dale Johannesen wrote:
>>
>>>> If regalloc picks a register other than eax, it's a bug that  
>>>> would be good to catch because that would be increasing number of  
>>>> used registers.
>>>
>>> No, using ecx or edx would not make the code quality any worse.
>>
>> They would clobber an extra register.  If/when we start doing  
>> interprocedural regalloc, that is bad.  Forcing the test to check  
>> for eax does not make the test any more fragile.
>
> Yes it does.  The choice of register has nothing to do with what the  
> test is testing.

Checking the register does not have anything to do with this bug, but  
it is a useful thing for the testsuite to verify.

>
> The test is run twice, and one run is appropriate for FileCheck.  It  
> occurred to me a mix might work, and it seems to, so I've checked  
> that in.
>
> You may be looking at this the wrong way; using FileCheck shouldn't  
> be a goal in itself.
>  If it's adequate to express what you're trying to test for, there  
> are good reasons to use it, but here that's not the case.

I agree that "using FileCheck shouldn't be a goal in itself ", but I  
completely disagree with your assessment of the motivation.  The  
motivation of doing this sort of test refactoring is:

1. Reduce the number of run lines.  There is no reason for this test  
to have two run lines with multiple completely redundant invocations  
of llvm-as and llc.  If there is one llvm-as/llc/filecheck line then  
this just disappears.

2. Reduce the number of test files.  I think that a large number of  
tests should be merged together into fewer but more rich testcases.   
I've started doing this with CodeGen/X86/sse[1234]*.  As LLVM  
continues to grow, we'll continue to get more and more files, and  
grouping them by "ancient bug fixed" is a lot less interesting than  
grouping by functional area.

3. Make the checks more precise.  Even in the test as you checked it  
today, there is no verification that two strings you're searching for  
occur in the right functions.  If you inverted some condition and they  
started coming out wrong, then the test wouldn't notice it.  This is  
something that can have tradeoffs: making tests more specific makes  
them more fragile to unrelated changes.  However, this is just a  
matter of crafting the test right, and FileCheck allows tests to be  
fuzzy also when needed.

Overall, I am strongly motivated to reduce the amount of time to run  
tests.  As llvm continues to grow, the number of things tested will  
obviously continue to grow.  The problem is that this will make it  
harder for people to run tests.  The harder it is for people to run  
tests, the less likely they are to do so.  There are many solutions to  
this:

1. Improve the harness, e.g. to run tests in parallel.
2. Reduce gross redundancy in tests (e.g. running llvm-as and llc  
twice in your example)
3. Merge together tests into larger units that have the same runtime  
conditionals, reducing redundancy across tests.
4. Use more efficient testing methodologies.

Another thing to keep in mind is that we want to make it really easy  
to *write* tests as well.  If it is hard to write tests, people won't  
do it, which is also a big problem.  Forcing people to manually factor  
run lines to avoid invocations of llvm-as (for example) doesn't seem  
worth it.

I think that FileCheck is one important piece that helps address a  
subset of the problems above.  Yes, I do expect it to support regex's  
someday, but that doesn't impact anything in the discussion or in your  
particular test.  Please change the test to use one run line that uses  
FileCheck for both tests.

-Chris