[LLVMdev] Validating LLVM

Mon Nov 10 12:59:04 PST 2008

Back during the LLVM developer's meeting, I talked with some of you about a
proposal to "validate" llvm.  Now that 2.4 is almost out the door, it seems a
good time to start that discussion.

I've written up a detailed proposal and attached it to this message.  The goal
is to ease LLVM use by third parties.  We've got considerable experience with
LLVM and the community development model here and see its benefits as well as 
challenges.  This proposal attempts to address what I feel is the main 
challenge: testing and stability of LLVM between releases.

Please take a look and send feedback to the list.  I'd like to get the
process moving early in the 2.4 cycle.

Thanks for your input and support.

                                         -Dave
-------------- next part --------------
LLVM Validation Proposal
------------------------

*Motivation*

LLVM Top of Trunk (ToT) is fairly unstable.  It is common for tests to
break on a daily basis.  This makes tracking the LLVM trunk difficult
and often undesirable for those needing stability.  Such needs come
from a variety of situations: integrating LLVM with other components
requires a stable base to isolate bugs; researchers want a stable
platform on which to run experiments; developers want to know when
they've broken something and don't want the testing noise random
breakage introduces; some users keep private LLVM repositories where
they do project-specific development and want to know when it is "safe"
to merge from upstream so as to introduce as few new bugs as possible.

Often those in the situations described above can't limit themselves to
the latest stable release.  LLVM gains important new features rapidly
and users want to stay up-to-date, both to access those features and to
stay current so as to make merges from upstream simpler.  Six months
between releases is a long time to wait and patches from release to
release are extremely large and likely to conflict with local changes.

*Solution*

One way to meet the needs identified above is to regularly "validate"
LLVM as passing all of its tests.  A "validation run" is a testing run
of LLVM at some revision.  A "validation result" is the outcome of the
testing run (pass/fail, etc.).  A "validation" is a validation run that
meets the stability requirements (e.g. passing all tests).
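
To make the three terms concrete, here is a rough sketch of how they
relate.  The class and field names are illustrative assumptions and not
part of the proposal, and the second revision number and the test name
are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationRun:
    """One testing run of LLVM at a given trunk revision (hypothetical model)."""
    revision: int              # svn revision that was tested
    target: str                # e.g. "X86", or "ALL" for a comprehensive run
    failing_tests: tuple = ()  # names of failing tests, empty if none failed

    @property
    def result(self) -> str:
        """The validation result: the outcome of the testing run."""
        return "pass" if not self.failing_tests else "fail"

    @property
    def is_validation(self) -> bool:
        """A run is a validation only if it meets the stability requirement."""
        return self.result == "pass"


good = ValidationRun(revision=54582, target="X86")
bad = ValidationRun(revision=54590, target="X86",
                    failing_tests=("some/made-up/test.ll",))
print(good.result, good.is_validation)
print(bad.result, bad.is_validation)
```

Here the stability requirement is taken to be "no failing tests", which
matches the example given above; a looser requirement would only change
the is_validation property.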

Validations can be expressed as tags of trunk, and users can check out
the tag or svn switch over to it as desired.

Validations should be kept around to maintain history.  For example,
users may not want to update to the latest-and-greatest validated LLVM
but want a safe spot to advance further than they are at currently.
Keeping all validation tags allows this.  Since svn tags are cheap, this
should not impose a repository burden.

*Implementation*

The biggest issue to deal with is testing.  LLVM testing resources are
already thin and validation requires that the testing process be
somewhat more formalized.

Some user interested in a particular target (say, x86 or PPC) should
claim responsibility for validating LLVM on that target.  This would
involve running all of the LLVM target-independent tests as well as the
tests for the specific target desired.  The identified tester for each
target will perform the validation by tagging the revision tested if
validation is successful.

This setup allows testing to be distributed among those most interested
in validations.  It also relaxes the validation requirements some by
allowing LLVM to be validated against one target even though tests of
another target may fail.  This also relieves the tester from having to
provide working platforms on which to run all of the LLVM
target-specific tests.

Those doing validations are collectively referred to as "validators."

All validation results should be announced on llvmdev and
llvm-testresults.  It is also useful to announce validation run failures
on these lists to keep the community informed of its progress.  A list
of failing tests would be most helpful in such messages.
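
As a sketch of what such an announcement might contain, the function
below builds a subject line and body from a run's outcome.  The
function name and the message layout are assumptions made for
illustration; the proposal does not prescribe a format.

```python
def format_report(target, revision, failing_tests):
    """Build the subject and body of a validation-run announcement."""
    status = "PASSED" if not failing_tests else "FAILED"
    subject = f"[validation] {target} r{revision}: {status}"
    lines = [f"Validation run for {target} at r{revision}: {status}"]
    if failing_tests:
        # Listing the failures is the part most useful to readers.
        lines.append(f"{len(failing_tests)} failing test(s):")
        lines.extend(f"  {name}" for name in sorted(failing_tests))
    return subject, "\n".join(lines)


subject, body = format_report("X86", 54582, [])
print(subject)
print(body)
```

Sorting the failing tests keeps successive reports easy to compare by
eye or with diff.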

In addition to the independent target validators, there should be at
least one user responsible for validating all of LLVM (a "comprehensive
validation"), which requires every single LLVM test (including all
target-specific tests) to pass.  This is a much higher bar, and such a
validation gives end users more confidence in the stability of the
validated revision.  It is also a much higher testing burden, so users
undertaking such validations should have access to all of the necessary
machine resources, including hardware for all supported target
platforms.  It is not clear if any LLVM user currently possesses this
capacity.

Alternatively, we could schedule per-target validation runs such that
they occur on the same revision (e.g. every 200 commits).  If all
targets pass validation then we can consider LLVM comprehensively
validated.  This would eliminate the need for one "super-validator" with
access to all the necessary computing resources.  These regular
validation runs will often fail.  Such failures should be noted in
messages to llvmdev and llvm-testresults.

For the purposes of this proposal, the second scheme ("distributed
comprehensive validation") should be preferred, as it seems more
practical and less resource-intensive on any one individual or
organization.
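
A minimal sketch of the distributed scheme follows.  It assumes that
"every 200 commits" means revisions that are multiples of 200, which is
only one possible reading, and the function names are hypothetical.
The target list is the set of targets named elsewhere in this proposal.

```python
TARGETS = ["X86", "PowerPC", "Alpha", "Sparc", "Mips", "ARM", "CellSPU", "IA64"]


def last_scheduled_revision(revision, interval=200):
    """Latest revision at or below `revision` that all validators should test.

    Assumes scheduled revisions are the multiples of `interval`.
    """
    return revision - (revision % interval)


def comprehensively_validated(passed_by_target, targets=TARGETS):
    """True only if every target reported a pass for the same revision.

    passed_by_target maps a target name to True/False; a missing entry
    (no report yet) counts as not validated.
    """
    return all(passed_by_target.get(t, False) for t in targets)


results = {t: True for t in TARGETS}
print(last_scheduled_revision(54582))
print(comprehensively_validated(results))
results["Alpha"] = False
print(comprehensively_validated(results))
```

Treating a missing report as a failure is a deliberately conservative
choice: a revision is never called comprehensively validated just
because one validator was late.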

The various targets as well as the comprehensive validation are
collectively referred to as "validation targets."  There should be one
official validator for each validation target, though multiple users may
contribute testing for a particular validation target.  It is up to the
validator to decide which testing runs are suitable for validation.

The LLVM community must agree on a set of tests that constitutes a
validation.  This proposal suggests all publicly available LLVM tests
(for each validation target) should be required to pass.  This includes:

* Tests in llvm/test
* Tests in llvm-test, excluding external tests such as SPEC

If the LLVM community can identify validators with access to the
external tests, those should be included as well.

Note that this testing regimen requires validators to build and test
llvm-gcc as well as the native LLVM tools.

The tags themselves should live under the tags tree.  One possible
tagging scheme looks like this:

trunk
tags
  ...
  RELEASE_21
  RELEASE_22
  RELEASE_23
  validated
    ALL
      development_24
        r54582
    X86
      development_24
        r53215
        r54100
        r54582
    X86-64 (? maybe covered by x86)
      ...
    PowerPC
      ...
    Alpha
      ...
    Sparc
      ...
    Mips
      ...
    ARM
      ...
    CellSPU
      ...
    IA64
      ...
branches
  ...
  release_21
  release_22
  release_23

This scheme organizes validations by release cycle so users can more
quickly find what they're looking for.  Note that a validated revision
could appear under more than one validation target.
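
The layout above can be sketched as a small helper that builds the tag
path and the corresponding svn copy command.  The repository URL is a
placeholder, and the function names and commit message are assumptions
for illustration only.  The command is built as a list and not run.

```python
REPO_ROOT = "https://svn.example.org/llvm"  # hypothetical placeholder URL


def validation_tag_path(target, release_cycle, revision):
    """Path of a validation tag under the proposed layout."""
    return f"tags/validated/{target}/development_{release_cycle}/r{revision}"


def tag_command(target, release_cycle, revision, repo_root=REPO_ROOT):
    """Argument list for an svn copy that tags trunk at `revision`."""
    dest = f"{repo_root}/{validation_tag_path(target, release_cycle, revision)}"
    return ["svn", "copy", "--parents", "-r", str(revision),
            f"{repo_root}/trunk", dest,
            "-m", f"Validate {target} at r{revision}"]


print(validation_tag_path("X86", 24, 54582))
print(" ".join(tag_command("ALL", 24, 54582)[:-2]))
```

Because a revision can validate for several targets, the same revision
number may simply be tagged once per target directory.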

Validation tags are read-only.  Thus they live under the tags directory.
The RO nature of these tags could and probably should be enforced using
svn hooks.  This would require help from the LLVM repository
maintainers.  It is not a requirement for this proposal to move forward.
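
As one possible shape for such a hook, the sketch below checks a list
of changed paths in the format printed by "svnlook changed" (status in
the first columns, path starting at the fifth column) and reports any
change under the validated tags that is not an addition.  The function
name and the protected prefix are assumptions.  A real pre-commit hook
would feed it the output of "svnlook changed -t TXN REPOS" and reject
the commit if the list is non-empty.  The file names in the example are
made up.

```python
PROTECTED_PREFIX = "tags/validated/"  # assumed location of validation tags


def violations(changed_lines, prefix=PROTECTED_PREFIX):
    """Return protected paths that a commit modifies or deletes.

    Additions (status "A") are allowed so that new tags can be created.
    Note this sketch also allows adding files inside an existing tag;
    a stricter hook would need to check for that as well.
    """
    bad = []
    for line in changed_lines:
        status, path = line[:2].strip(), line[4:].strip()
        if path.startswith(prefix) and status != "A":
            bad.append(path)
    return bad


changed = [
    "A   tags/validated/X86/development_24/r54582/",
    "U   tags/validated/X86/development_24/r53215/Makefile",
    "U   trunk/Makefile",
]
print(violations(changed))
```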

Some testing infrastructure enhancements can make validation easier
and/or more precise.  For example, all X86 and X86-64 tests are lumped
under X86.  But some users are not interested in 32-bit X86.  Separating
the running of 32-bit and 64-bit X86 tests allows validation to be
distributed to a greater extent.  Again, such enhancements are not
required for this proposal to move forward.

Validations should occur at least weekly.  As a release approaches,
validations should occur more frequently, perhaps twice a week.  This
will ensure that the ToT remains stable.  More frequent validations
should begin one month before release.  If the "validation run every N
commits" approach to comprehensive validation is taken, validators must
do validation runs according to those requirements as well.  Such
validation runs need not always result in a validation (i.e. the testing
can fail).  Validators are free to validate even more often than these
requirements.
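
The frequency rule can be written down as a short sketch.  It assumes
"one month" means 30 days and that the function name is hypothetical;
the release date used in the example is made up.

```python
from datetime import date, timedelta


def min_validations_per_week(today, release_date):
    """Minimum validations per week under the proposed schedule.

    Weekly by default, twice a week in the (assumed 30-day) window
    leading up to a release.
    """
    until_release = release_date - today
    if timedelta(0) <= until_release <= timedelta(days=30):
        return 2
    return 1


print(min_validations_per_week(date(2008, 11, 10), date(2008, 11, 20)))
print(min_validations_per_week(date(2008, 11, 10), date(2009, 3, 1)))
```

These are minimums only; a validator who validates more often still
satisfies the rule.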

*Next Steps*

A validation coordinator should identify a validator for each validation
target and keep the list current as responsibilities change.  Users
should begin signing up to be validators as soon as this or another
proposal is accepted.  It is not necessary to have a validator for each
validation target before beginning validations.

Tools should be developed to ease the validation process.  Such tools
should be set up to run regularly under cron or some similar scheduling
system.  The tools should send validation run results to llvmdev and
llvm-testresults.

*Outstanding Questions*

We need to address the following detailed questions about the process.
Some of these will be answered through trial-and-error.

1. When doing a distributed comprehensive validation (scheme 2), how
   often should those tests occur?  The proposal throws out "every 200
   commits" as an example, but is that the right interval?

2. Who can be the validation coordinator?  I will offer my services for
   this role at least starting out.

3. Who will be responsible for each validation?  Cray commits to taking
   responsibility for validating x86.

4. Are there testers beyond the official validators that want to
   contribute resources?  Who are they and what's the process of
   plugging them in?

5. Is validating weekly the right frequency?  What about when a release
   approaches?