[llvm-dev] How does LLVM guarantee the quality of the product from its limited test suite?

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Tue Nov 10 01:50:12 PST 2015


On 10 November 2015 at 01:49, Daniel Berlin <dberlin at dberlin.org> wrote:
> https://gcc.gnu.org/gcc-4.4/criteria.html
>
> It is a zero regression policy for primary platforms.
>
> Look, I love LLVM as much as the next guy, but in the 15 years i worked on
> GCC, through lots of major and minor releases, i can't remember a single
> release with "thousands" of failures.

Hi Daniel,

This was not meant as a GCC vs LLVM rant. I have no affiliation nor an agenda.

I was merely commenting on the quality of compiler test suites, and
how valuable it would be to use the GCC tests in LLVM (or vice-versa).
I agree with Paul that the LLVM tests are little more than a smoke
screen, and from what I've seen, the GCC tests are just a bigger smoke
screen. I would first try to understand what in the GCC suite is
complementary to ours, and what's redundant, before dumping it in.

I may be wrong, and my experience is largely around Linaro (kernel,
toolchain, android), so it may very well be biased. These are the data
points I have for my statements:

1. GCC trunk is less stable than LLVM's because of the lack of general buildbots.
 * Testing a new patch means comparing the test results (including
breakages) against those of the previous commit and checking the
differences. This is a poor definition of "pass", especially when the
number of failures is large.
 * On ARM and AArch64, the number of failures is around a couple of
thousand (I don't know the exact figure). AFAIK, these are not marked
XFAIL in any way, but are known to be broken for one reason or
another.
 * The set of failures differs between sub-architectures, and ARM
developers have to know what's good and what's not on that basis. If
XFAILs were used more consistently, they wouldn't have this problem. I
hear some people don't like to XFAIL because they want to "one day fix
the bug", but that's a personal opinion on the validity of XFAILs.
 * Linaro monthly releases go out with those failures, and the fact
that they keep coming means the FSF releases do, too. This is a huge
cost on the release process, since it needs complicated diff tools and
often requires manual analysis.
 * Comparing the previous release against the new one won't account
for newly introduced features/bugs, and not all bugs make it to
bugzilla. We have the same problem in LLVM, but our developers know
more or less what's being done. Not all of us track every new feature
introduced by GCC, so tracking their new bugs would be a major task.
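For comparison, LLVM's lit-based suite marks a known-bad test with an
XFAIL directive in the test file itself, so the harness reports it as
an expected failure rather than noise. A minimal sketch (the RUN line
and target are illustrative):

```
; This test is known broken on ARM; lit reports it as XFAIL there
; instead of FAIL, keeping the failure count meaningful.
; XFAIL: arm
; RUN: llc < %s | FileCheck %s
```

Annotating failures at the test level like this is what keeps the
expected baseline at (near) zero, instead of carrying it in external
diff tooling.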
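The result-diffing workflow described above can be sketched as a small
script. The file format (DejaGnu-style "PASS: test" / "FAIL: test"
summary lines) and the sample test names are illustrative assumptions,
not the actual GCC or Linaro release tooling:

```python
# Sketch: diff two DejaGnu-style result summaries to find regressions.
# Assumes one "STATUS: test-name" line per test; this is a toy model
# of the comparison, not the real GCC/Linaro release scripts.

def parse_results(lines):
    """Map test name -> status for PASS/FAIL/XFAIL/XPASS lines."""
    results = {}
    for line in lines:
        status, _, name = line.partition(": ")
        if status in {"PASS", "FAIL", "XFAIL", "XPASS"}:
            results[name.strip()] = status
    return results

def new_failures(baseline_lines, current_lines):
    """Tests failing now that did not fail in the baseline run."""
    base = parse_results(baseline_lines)
    cur = parse_results(current_lines)
    return sorted(name for name, status in cur.items()
                  if status == "FAIL" and base.get(name) != "FAIL")

baseline = ["PASS: gcc.dg/a.c", "FAIL: gcc.dg/b.c"]
current = ["FAIL: gcc.dg/a.c", "FAIL: gcc.dg/b.c", "PASS: gcc.dg/c.c"]
print(new_failures(baseline, current))  # -> ['gcc.dg/a.c']
```

Note that b.c is ignored even though it still fails: with a large
standing failure count, only the delta is actionable, which is exactly
why "pass" becomes so poorly defined.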

2. Linux kernel and Android builds run into increasing trouble with new GCC releases.
 * I heard from both kernel and Android engineers that every new GCC
release surfaces more failures in their code than the previous one
did, i.e. GCC 4.8->4.9 had a bigger delta than 4.7->4.8.
 * The LLVMLinux group reported more trouble moving between two GCC
releases than porting to LLVM.
 * Most problems are due to new warnings and errors, but some are
bugs that were caught by neither the regression testing nor the
release process.

I understand it's impossible to catch all bugs, and that both the
Linux kernel and Android are large projects, but this demonstrates
that the GCC release process is as good (or bad) as our own, just with
a different mindset (focused on release validation rather than trunk)
and run by a different community (which most of us don't track).

My conclusion is that, if we're ever going to incorporate the GCC
test-suite, it'll take a lot of time to fudge it into a clean
pass/fail state, and every new version of it will require the same
amount of work. Reiterating Paul's point, I don't believe those tests
have sufficient value to be worth the continuous effort. That means
we'll have to rely on companies to do secondary screening for LLVM,
something that I believe GCC would rather not do, but we seem to be OK
with.

Then again, I may be completely wrong.

cheers,
--renato
