[llvm-dev] distributed lit testing

Fri Mar 12 06:56:07 PST 2021

Hi James,

We run lit tests at Google using a custom runner on a distributed build
system similar to Bazel.
In particular we run most of the llvm-project tests both when pulling in
upstream revisions, and for any change to our internal repository that
touches nearby files.

I wanted to share some of our experiences in case they're useful, and in
the hope that this project may result in something we can use too :-)
I'm being brief here, but happy to provide more details.

Our build system wants to run each test in isolation (separate process,
sandboxed).
Making each test hermetic separates concerns nicely (the same distributed
runner is used for all kinds of testing, not just lit).
This model is also easier to fit into other containers (e.g. I imagine
Ninja could make a good local test driver).
By contrast, with e.g. a custom driver that talks to a custom worker server
that runs many tests per subprocess, there's not very much we would be able
to reuse.
I know there are OSS Bazel projects that want to run lit tests that would
struggle with that kind of model too.

The biggest problem with using the standard lit tool for hermetic tests is
that it was too slow to start up just to run a single test.
Fundamentally the slow parts are the config system and the initialization of
Python programs.

We had a much easier time with the config system, because we test (mostly)
in a single config, so we could flatten it out into a list of features and
substitutions.
But in a more general system, if we can produce the config data from config
logic as a *build* step, then it can be cached in the usual way and simply
fed into each test.
You'll need to untangle config specific to the machine running the test
from config specific to the machine driving the tests.
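
A minimal sketch of the "config as a build step" idea, under stated
assumptions: the function names and the JSON layout are hypothetical, the
features and substitutions are taken as already evaluated, and substitution
is plain string replacement, which is a simplification of what lit does.

```python
import json


def flatten_config(features, substitutions):
    """Build step: serialize already-evaluated config into a cacheable blob.

    The output is deterministic (sorted features, ordered substitutions),
    so an unchanged config produces byte-identical data for the cache.
    """
    return json.dumps(
        {
            "features": sorted(features),
            "substitutions": [list(pair) for pair in substitutions],
        },
        sort_keys=True,
    )


def load_config(blob):
    """Test step: read the flattened data without running any config logic."""
    data = json.loads(blob)
    return set(data["features"]), [tuple(pair) for pair in data["substitutions"]]


def expand(line, substitutions):
    """Apply each (pattern, replacement) pair in order."""
    for pattern, replacement in substitutions:
        line = line.replace(pattern, replacement)
    return line
```

Each test then only pays for one JSON parse rather than for evaluating the
config logic again.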

I wrote a hermetic test runner in Go - not my favorite language but it
starts up fast and has good subprocess support.
It's greatly simplifying to be able to assume you can fork a real shell and
that only limited state (CWD, exported vars) can leak from one RUN line to
the next. This works fine for us in practice (but we don't test on Windows).
It has some nice features like printing a transcript of the test run,
highlighting directives and stderr output, showing pre/post expansion
lines, annotating each line with the result.
I should be able to share the code of this, it's nothing terribly
surprising.
It's less than 1000 LOC and runs almost all LLVM tests - IMO it would be
worthwhile to keep the lit spec very simple and remove some of the
marginal features that have crept in over the years. We chose to simply
drop some tests rather than deal with all the corners.
(Before this existed, we ran sed over the lit tests to turn them into shell
scripts, which worked but was hard to maintain and to read the output on
failure... actually the upstream lit runner has the latter problem too!)
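
To make the shape of such a runner concrete, here is a rough sketch in
Python. It is not the Go runner described above: all names are hypothetical,
it assumes a POSIX `sh` is available, and it runs every RUN line in one shell,
so unexported shell variables would leak between lines as well.

```python
import re
import subprocess

RUN_RE = re.compile(r"RUN:\s*(.*)$")


def parse_run_lines(text):
    """Collect RUN: commands, joining lines that end in a backslash."""
    commands, pending = [], ""
    for line in text.splitlines():
        match = RUN_RE.search(line)
        if not match:
            continue
        part = match.group(1).rstrip()
        if part.endswith("\\"):
            pending += part[:-1]  # drop the backslash, keep accumulating
            continue
        commands.append(pending + part)
        pending = ""
    return commands


def build_script(commands, substitutions):
    """Expand substitutions and emit a script that stops at the first failure."""
    expanded = []
    for command in commands:
        for pattern, replacement in substitutions:
            command = command.replace(pattern, replacement)
        expanded.append(command)
    return "set -e\n" + "\n".join(expanded) + "\n"


def run_script(script, cwd=None):
    """Run the script in a real shell, capturing stdout and stderr as text."""
    return subprocess.run(
        ["sh", "-c", script], cwd=cwd, capture_output=True, text=True
    )
```

A driver would call `parse_run_lines` on the test file, pass the result to
`build_script`, and treat a nonzero `returncode` from `run_script` as a test
failure.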

I'm sure I've forgotten things, but I think those were my biggest
takeaways. Needing to solve the config problem + the Go dependency were the
main reasons I didn't push to make these changes upstream :-(
Hope this is useful or maybe at least interesting :-)

Cheers, Sam

On Wed, Feb 24, 2021 at 9:54 AM James Henderson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi Victor,
>
> The lit test framework is the main testing framework used by LLVM. You can
> find the source code for it in the LLVM github repository (see in
> particular https://github.com/llvm/llvm-project/tree/main/llvm/utils/lit),
> and there is documentation available for it on the LLVM website -
> https://llvm.org/docs/TestingGuide.html gives the high-level picture of
> how LLVM is tested, whilst https://llvm.org/docs/CommandGuide/lit.html is
> more focused on lit specifically.
>
> Examples of where lit is used include the individual test files located in
> places like llvm/test, clang/test and lld/test within the github tree.
> These test directories include additional configuration files, some of
> which are configured when CMake is used to generate the build files for the
> LLVM project. If you aren't already familiar with LLVM, I highly recommend
> reading up on https://llvm.org/docs/GettingStarted.html, and following
> the steps to make sure you can build and run LLVM components locally.
>
> Lit works as a python process which spawns many child processes, each of
> which runs one or more of the tests located in the directory under test.
> These tests typically are a sequence of commands that use components of
> LLVM that have already been built. You can build the test dependencies and
> run the tests by building one of the CMake-generated targets called check-*
> (where * might be llvm, lld, clang, etc. to run a test subset, or "check-all"
> to run all known tests). Currently, the tests run in parallel on the user's
> machine, using the python multiprocessing library to do this. There also
> exist the --num-shards and related options, which allow multiple computers
> to each run a subset of the tests. I am not too familiar with how this option
> is used in practice, but I believe it requires the computers to all have
> access to some shared filesystem which contains the tests and build
> artifacts, or to each have the same version checked out and to have been
> sent the full set of build artifacts to use. Others on this list might be
> able to clarify further.
>
> The project goal is to provide a framework for distributing these tests
> across multiple computers in a more flexible manner than the existing
> sharding mechanism. I can think of two different high-level options -
> either a layer on top of lit which uses the existing sharding mechanism
> somehow, or something built into the existing lit code that goes wide with
> the tests across the machines. It would be up to you to identify and
> implement a way forward doing this. The hope would be that this framework
> could be used for multiple different distributed systems, as described in
> the original project description on the Open Projects page.
>
> This project is intended to be a possible Google Summer of Code project.
> As such, to participate in it, you'd need to sign up on the GSOC website,
> and provide a project proposal there which details how you plan to solve
> the challenge. It would help your proposal get accepted if you can show
> some understanding of the lit testsuite, and some evidence of contributions
> to LLVM (perhaps in the form of additional testing you might identify that
> is missing in some tests, or by fixing one or more bugs from the LLVM
> bugzilla page, perhaps labelled with the "beginner" keyword). I am happy to
> work with you on your proposal if you are uncertain about anything, but the
> core of the proposal needs to come from you.
>
> I hope that gives you the information you are looking for. Please feel
> free to ask any further questions that you may have.
>
> James
>
> On Tue, 23 Feb 2021 at 17:28, Victor Kukshiev via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hello, I am Victor Kukshiev (cetjs2 in IRC), a 2nd-year student at PetrSU
>> university.
>> The distributed lit testing idea is interesting and feasible for me, I think.
>> Could you tell us more about this project?
>> What is the lit test suite?
>> I know the Python language.
>> How do I participate in this project?
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>