[llvm-dev] [RFC] Compiled regression tests.

Tue Jun 23 18:33:11 PDT 2020

Hello LLVM community,

For testing IR passes, LLVM currently has two kinds of tests:
 1. regression tests (in llvm/test); .ll files invoking opt, and
matching its text output using FileCheck.
 2. unittests (in llvm/unittests); Google tests containing the IR as a
string, constructing a pass pipeline, and inspecting the output using
code.

I propose adding a third kind of test, which I call a "compiled
regression test", combining the advantages of the two. A test is a
single .cxx file with the general structure below that can simply be
dropped into the llvm/test directory. I am not proposing to replace
FileCheck, but in many cases domain-specific verifiers can be more
powerful (e.g. verify-uselistorder or `clang -verify`).

    #ifdef IR
      define void @func() {
      entry:
        ret void
      }
    #else /* IR */
      #include "compiledtestboilerplate.h"
      TEST(TestSuiteName, TestName) {
        std::unique_ptr<Module> Output =
            run_opt(__FILE__, "IR", "-passes=loop-vectorize");
        /* Check Output */
      }
    #endif /* IR */

That is, the input IR and the checking code live in the same file. The
run_opt function is a replica of main() from the opt tool, so any
command-line arguments (passes for the legacy or new pass manager,
cl::opt options, etc.) can be passed. This also makes converting
existing tests simpler, since the opt arguments of an existing RUN line
can be passed unchanged.
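As an illustration, here is a minimal sketch of how run_opt might turn its option string into the argument vector that a copy of opt's main() receives. The helpers buildOptArgv and toCArgv are hypothetical, and the sketch assumes options are separated by whitespace with no quoting:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits run_opt's option string on whitespace and
// prepends the program name, since argv[0] is the program name for main().
// Quoted arguments are NOT supported in this sketch.
std::vector<std::string> buildOptArgv(const std::string &Options) {
  std::vector<std::string> Argv{"opt"};
  std::istringstream In(Options);
  for (std::string Arg; In >> Arg;)
    Argv.push_back(Arg);
  return Argv;
}

// Hypothetical helper: exposes the strings as a null-terminated char* array
// for a main()-style (int argc, char **argv) entry point. The pointers stay
// valid only while Argv is alive and unmodified.
std::vector<char *> toCArgv(std::vector<std::string> &Argv) {
  std::vector<char *> Ptrs;
  for (std::string &S : Argv)
    Ptrs.push_back(S.data());
  Ptrs.push_back(nullptr); // argv[argc] is a null pointer, as for main()
  return Ptrs;
}
```

A real implementation would more likely reuse LLVM's cl::TokenizeGNUCommandLine, which also handles quoting.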

The top-level structure is C++ (i.e. the LLVM IR is removed by the
preprocessor) and is built with CMake. This allows a
compile_commands.json to be generated, so that refactoring tools,
clang-tidy, and clang-format can easily be applied to the code. The
second argument to run_opt is the name of the preprocessor macro that
guards the IR, so that multiple IR modules can be embedded in one file.
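For illustration, a sketch of how run_opt could locate an embedded module: read the file named by __FILE__ and keep only the lines guarded by the given macro. The helpers readFile, opensBlock and extractIR are hypothetical, and the sketch assumes each module is opened by `#ifdef NAME`, `#if defined(NAME)` or `#elif defined(NAME)` and ends at the next preprocessor directive:

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the whole test source, e.g. the file that
// __FILE__ names inside the test.
std::string readFile(const std::string &Path) {
  std::ifstream F(Path);
  if (!F)
    throw std::runtime_error("cannot open " + Path);
  std::ostringstream SS;
  SS << F.rdbuf();
  return SS.str();
}

// True if Directive opens the block guarded by Macro (assumed spellings).
bool opensBlock(const std::string &Directive, const std::string &Macro) {
  return Directive == "#ifdef " + Macro ||
         Directive == "#if defined(" + Macro + ")" ||
         Directive == "#elif defined(" + Macro + ")";
}

// Hypothetical helper: returns the IR lines guarded by Macro, up to the
// next preprocessor directive (#elif, #else or #endif).
std::string extractIR(const std::string &Source, const std::string &Macro) {
  std::istringstream In(Source);
  std::string Line, IR;
  bool Inside = false;
  while (std::getline(In, Line)) {
    // Ignore indentation so that indented directives are recognized.
    std::string Trimmed =
        Line.substr(std::min(Line.find_first_not_of(" \t"), Line.size()));
    if (!Trimmed.empty() && Trimmed[0] == '#') {
      if (Inside)
        break; // the guarded module ends at the next directive
      Inside = opensBlock(Trimmed, Macro);
      continue;
    }
    if (Inside)
      IR += Line + "\n"; // keep the original line, including indentation
  }
  if (!Inside)
    throw std::runtime_error("no IR guarded by macro " + Macro);
  return IR;
}
```

Comments in textual IR start with `;`, so treating lines that begin with `#` as preprocessor directives should be safe for typical test inputs.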

Such tests can be compiled in two modes: either within the LLVM
project, or as an external subproject using llvm_ExternalProject_Add.
The former has the disadvantage that new .cxx files dropped into the
test folder are not recognized until the next cmake run, unless the
CONFIGURE_DEPENDS option is used. I found that this adds seconds to
every ninja invocation, which I consider a dealbreaker. The external
project searches for tests every time, but is only invoked during a
check-llvm run, just like llvm-lit. It uses CMake's find_package to
build against the main project's results (a use case we currently have
no tests for) and could also be compiled in debug mode while LLVM
itself is compiled in release mode.

The checks themselves can be any of gtest's ASSERT/EXPECT macros, but
for common test idioms I suggest adding custom macros, such as
    ASSERT_ALL_OF(InstList, !isa<VectorType>(I->getType()));

which, on failure, prints the offending instruction, i.e. the one that
does return a vector. Try that with FileCheck. The matchers from
llvm/IR/PatternMatch.h, as used by InstCombine, can be used as well.
Structural comparison against a reference output would also be
possible (like clang-diff,
[llvm-canon](http://llvm.org/devmtg/2019-10/talk-abstracts.html#tech12),
https://reviews.llvm.org/D80916).
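A sketch of how a macro like ASSERT_ALL_OF might work. To keep it compilable without LLVM or gtest, the hypothetical FakeInst stands in for llvm::Instruction, and the macro returns false from the enclosing function instead of recording a gtest failure; FakeInst, describe and checkNoVectorResults are all illustrative names, not existing APIs:

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Instruction.
struct FakeInst {
  std::string Text;   // printed form of the instruction
  bool ReturnsVector; // stands in for isa<VectorType>(I->getType())
};

// Hypothetical printer used in the failure message.
std::string describe(const FakeInst *I) { return I->Text; }

// Binds each element of List to `I` and evaluates Pred. On the first
// failure it prints the predicate text and the offending element, then
// returns false from the enclosing function (gtest's ASSERT_* macros also
// return from the enclosing function).
#define ASSERT_ALL_OF(List, Pred)                                             \
  do {                                                                        \
    for (const auto &I : (List)) {                                            \
      if (!(Pred)) {                                                          \
        std::cerr << "ASSERT_ALL_OF(" #List ", " #Pred ") failed for:\n  "    \
                  << describe(I) << "\n";                                     \
        return false;                                                         \
      }                                                                       \
    }                                                                         \
  } while (0)

// Example check: no instruction in InstList may produce a vector.
bool checkNoVectorResults(const std::vector<const FakeInst *> &InstList) {
  ASSERT_ALL_OF(InstList, !I->ReturnsVector);
  return true;
}
```

Stringizing the predicate with `#Pred` lets the failure message show the condition text next to the element that violated it.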

Some additional tooling could be helpful:

 * A test file creator, taking some IR, wrapping it into the above
structure, and writing it into the test directory.
 * A tool for extracting and updating (by running opt) the IR inside the
#ifdef, or even adding this functionality to opt itself. This is the
main reason for not simply putting the IR inside a string.
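The test file creator could start out as simple as the following sketch. wrapAsCompiledTest is a hypothetical name, the generated skeleton mirrors the structure shown earlier, and the sketch assumes the opt arguments contain no double quotes (they are not escaped):

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: wraps textual IR into the #ifdef IR / #else layout
// of a compiled regression test and returns the .cxx file contents.
std::string wrapAsCompiledTest(const std::string &IR, const std::string &Suite,
                               const std::string &Name,
                               const std::string &OptArgs) {
  std::string Out = "#ifdef IR\n";
  Out += IR;
  if (!IR.empty() && IR.back() != '\n')
    Out += '\n'; // keep the directive below on its own line
  Out += "#else /* IR */\n";
  Out += "#include \"compiledtestboilerplate.h\"\n";
  Out += "TEST(" + Suite + ", " + Name + ") {\n";
  // Assumes OptArgs contains no '"'; it is pasted into a string literal.
  Out += "  std::unique_ptr<Module> Output = run_opt(__FILE__, \"IR\", \"" +
         OptArgs + "\");\n";
  Out += "  /* Check Output */\n";
  Out += "}\n";
  Out += "#endif /* IR */\n";
  return Out;
}
```

Writing the returned string into llvm/test with std::ofstream is all that remains for the tool.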

A work-in-progress differential, together with a description of what it
improves over FileCheck and unittests, is available here:
https://reviews.llvm.org/D82426

Any kind of feedback welcome.