[LLVMdev] Adding ClamAV to the llvm testsuite (long)

Chris Lattner sabre at nondot.org
Mon Dec 17 17:43:31 PST 2007


On Mon, 17 Dec 2007, Evan Cheng wrote:
> We always welcome more tests. But it looks like there are two issues
> here.
>
> 1. The autoconf requirement. Is it possible to get one configuration
> working without the need for autoconf?

One way to do this is to add a "cut down" version of the app to the test 
suite.

> 2. GPL license. Chris?

Any open source license that allows unrestricted redistribution is fine in 
llvm-test.

-Chris

> Evan
>
> On Dec 14, 2007, at 12:30 PM, Török Edwin wrote:
>
>> Hi,
>>
>> I see that you are looking for new programs for the testsuite, as
>> described in "Compile programs with the LLVM compiler" and "Adding
>> programs to the llvm testsuite" on llvm.org/OpenProjects.
>>
>> My favourite "C source code" is ClamAV (www.clamav.net), and I would
>> like to get it included in the testsuite.
>>
>> This mail is kind of long, but please bear with me, as I want to
>> clarify how to best integrate ClamAV into the LLVM testsuite's build
>> system.
>>
>> Why include it?
>>
>> It can be useful for finding regressions and new bugs; it has already
>> uncovered a few bugs in llvm's cbe, llc, and optimizers that I've
>> reported through bugzilla (and they've mostly been fixed very fast!
>> Thanks!). ClamAV was also the "victim" of a bug in gcc 4.1.0's
>> optimizer (see point 9).
>>
>> It can be useful for testing new and existing optimizations. There
>> aren't any significant differences in its performance when compiled by
>> different compilers (gcc, icc, llvm-gcc), so I hope LLVM's optimizers
>> can (in the future) make it faster ;)
>>
>> I had a quick look at the build infrastructure, and there are some
>> issues (listed below) with getting it to work for programs that use
>> autoconf (such as ClamAV), since AFAICT testsuite programs aren't
>> allowed to run configure.
>>
>> Building issues aside, there are some more questions:
>> * ClamAV is GPL (but it includes BSD and LGPL parts); is that ok for
>> the testsuite?
>> * What version should be used? Latest stable, or latest svn?
>> [In any case I'll wait till the next stable is published; it should be
>> happening *very soon*]
>> * What happens if you find bugs that also cause it to fail under gcc
>> (unlikely)? [I would prefer to get an entry on clamav's bugzilla then,
>> with something in its subject saying it is llvm-testsuite related]
>> * What happens if it only fails under llvm-gcc/llc/clang, and it is
>> not due to a bug in llvm but to portability issues in the source code
>> (unlikely)? I would prefer a clamav bugzilla entry here too; clamav is
>> meant to be "portable" :)
>>
>> Also, after I have set it up in the llvm testsuite, is there an easy
>> way to run clang on it? Currently I have to hack autoconf-generated
>> makefiles if I want to test clang on it.
>>
>> 1. I've manually run configure and generated a clamav-config.h.
>> This mostly just contains HAVE_* macros for headers, which should all
>> be available on a POSIX system, so from this perspective it shouldn't
>> be a problem for llvm's build farm.
>> However, there are some target-specific macros:
>> #define C_LINUX 1
>> #define FPU_WORDS_BIGENDIAN 0
>> #define WORDS_BIGENDIAN 0
>> There are also SIZEOF_INT, SIZEOF_LONG, ..., but they are only used if
>> the system doesn't have a proper <stdint.h>.
>> I'm also not sure about this one:
>> /* ctime_r takes 2 arguments */
>> #define HAVE_CTIME_R_2 1
>>
>> What OS and CPU do the machines on llvm's buildfarm have? We could
>> write a config.h that works on Linux (or MacOSX) and try to apply it
>> everywhere, though there might be (non-obvious) failures.
>>
>> Are there any solutions for getting these macros defined in the LLVM
>> testsuite build (especially the bigendian macro)?
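One possible way to get those few target-specific macros without running configure is a small generator script run at build time. A minimal sketch, assuming a Python helper is acceptable in the testsuite build; the `emit_config_h` name and the macro selection are my illustration, not ClamAV's actual configure logic, and it assumes the FPU byte order matches the CPU's:

```python
import sys

def emit_config_h():
    """Emit the target-specific clamav-config.h macros by probing the
    host at build time instead of running configure (illustrative)."""
    big_endian = 1 if sys.byteorder == "big" else 0
    lines = [
        "#define WORDS_BIGENDIAN %d" % big_endian,
        # Assumption: FPU byte order matches the CPU byte order.
        "#define FPU_WORDS_BIGENDIAN %d" % big_endian,
    ]
    if sys.platform.startswith("linux"):
        lines.append("#define C_LINUX 1")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    sys.stdout.write(emit_config_h())
```

The SIZEOF_* macros could be skipped entirely on systems with a proper <stdint.h>, per the paragraph above.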
>>
>> 2. AFAICT the llvm-testsuite build doesn't support a program that is
>> built from multiple subdirectories.
>> libclamav has its source split across multiple subdirectories;
>> gathering the files into one directory also requires changing
>> #include directives that use relative paths.
>> I also get files with the same name in different subdirs, so I have to
>> rename them to subdir_filename and update the #includes accordingly.
>>
>> I have done this manually, and it works (native, llc, and cbe all
>> work). I could hack together some perl script to do this
>> automatically, or is there a better solution?
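The flattening step described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it handles .c/.h files, rewrites `#include "..."` lines by basename, and does not yet resolve the case where two subdirs ship the same filename (the rewrite map would need the including file's directory for that):

```python
import os
import re
import shutil

def flatten(srcroot, destdir):
    """Copy sources from srcroot's subdirectories into destdir,
    renaming subdir/file.c to subdir_file.c (illustrative sketch)."""
    renames = {}  # basename -> flattened name (clobbers on duplicates!)
    for dirpath, _, files in os.walk(srcroot):
        sub = os.path.relpath(dirpath, srcroot)
        for name in files:
            if not name.endswith((".c", ".h")):
                continue
            flat = name if sub == "." else sub.replace(os.sep, "_") + "_" + name
            renames[name] = flat
            shutil.copy(os.path.join(dirpath, name),
                        os.path.join(destdir, flat))
    # Second pass: point relative #include "..." lines at the new names.
    pattern = re.compile(r'#include\s+"([^"]+)"')
    for flat in renames.values():
        path = os.path.join(destdir, flat)
        with open(path) as f:
            text = f.read()
        text = pattern.sub(
            lambda m: '#include "%s"' % renames.get(
                os.path.basename(m.group(1)), m.group(1)),
            text)
        with open(path, "w") as f:
            f.write(text)
```

A perl one-liner per file would work equally well; the two-pass structure (copy first, then rewrite includes) is the part that matters.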
>>
>> 3. Comparing output: I've written a small script that compares the
>> --debug output. It needs to adjust the output first, because the
>> --debug output contains memory addresses that obviously don't match up
>> between runs.
>> There isn't much else to compare besides the --debug output (other
>> than ClamAV saying no virus was found), but that can be a fairly good
>> test.
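The address-stripping part of such a comparison script could be as simple as the sketch below. The `0x`-prefixed hex pattern is an assumption about how the addresses appear in the --debug output, not ClamAV's documented format:

```python
import re

# Replace anything that looks like a hex pointer with a fixed token so
# that the --debug output of two runs can be diffed directly.
ADDR = re.compile(r"0x[0-9a-fA-F]+")

def normalize(debug_output):
    return ADDR.sub("0xADDR", debug_output)
```

Each run's output would be passed through `normalize` before diffing, so only real behavioral differences survive the comparison.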
>>
>> 4. What is the input data?
>> ClamAV is fast :)
>> It needs a lot of input data if you want to get reasonable timings out
>> of it (tens or hundreds of MB).
>> Scanning multiple small files would be I/O bound, and mostly useless
>> as a benchmark (though still useful for testing compiler/optimization
>> correctness).
>>
>> So I was thinking of using some large files already available in the
>> testsuite (oggenc has one), and then maybe pointing it at the last
>> *stable* build of LLVM. Or finding some files that are scanned slowly
>> but don't require lots of disk I/O (an archive of highly compressible
>> data, with the ratio/size limits disabled).
>> You won't be able to benchmark clamav in a "real world" scenario,
>> though, since that would involve making it scan malware, and I'm sure
>> you don't want that on your build farm.
>>
>> You could give it random data to scan, but you'll need it to be
>> reproducible, so scanning /dev/random or the bin/ directory of the
>> current LLVM tree is not a good choice ;)
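One reproducible option might be a fixed-seed PRNG, so every buildfarm machine scans byte-identical data. A sketch; the `make_test_data` helper, the seed, and the chunk size are arbitrary illustrative choices:

```python
import random

def make_test_data(path, size, seed=20071214):
    """Write `size` bytes of deterministic pseudo-random data to
    `path`.  The same seed always yields the same bytes, unlike
    /dev/random, so the benchmark input is reproducible everywhere."""
    rng = random.Random(seed)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            chunk = min(remaining, 1 << 16)  # 64 KB at a time
            f.write(bytes(rng.getrandbits(8) for _ in range(chunk)))
            remaining -= chunk
```

Note that seeded random bytes are incompressible; for the compressible-archive idea above, a repeated fixed pattern would serve better.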
>>
>> There's also the problem of eliminating the initial disk I/O time from
>> the benchmark; maybe rerun it 3 times automatically, or something like
>> that?
>>
>> 5. Library dependencies
>> It needs zlib; everything else is optional (bzip2, gmp, ...). I think
>> I can reasonably assume zlib is available on all systems where the
>> testsuite is run.
>>
>> 6. Sample output using 126 MB of data as input:
>>
>> $ make TEST=nightly report
>> ....
>> Metric           | clamscan
>> -----------------+---------
>> GCCAS            | 7.0729
>> Bytecode         | 2074308
>> LLC compile      | *
>> LLC-BETA compile | *
>> JIT codegen      | *
>> GCC              | 17.48
>> CBE              | 17.55
>> LLC              | 18.81
>> LLC-BETA         | *
>> JIT              | *
>> GCC/CBE          | 1.00
>> GCC/LLC          | 0.93
>> GCC/LLC-BETA     | n/a
>> LLC/LLC-BETA     | n/a
>>
>> 7. ClamAV is multithreaded
>> If you're interested in testing whether llvm-generated code works when
>> multithreaded (I don't see why it wouldn't, but we're talking about a
>> testsuite), you'd need to start the daemon (running it as an
>> unprivileged user is just fine) and then connect to it.
>> Is it possible to tell the testsuite build system to do this?
>>
>> 8. Code coverage
>> Testing all of ClamAV's code with llvm is ... problematic. Unless you
>> create files for every packer/archiver known to clamav, it is likely
>> there will be files that are compiled in but never exercised during
>> the testsuite run. You can still test that those files compile, but
>> that's it.
>>
>> 9. Configure tests
>> Configure has 3 tests that check for gcc bugs known to break ClamAV (2
>> of which you already have, since those are in gcc's testsuite too).
>> Should these be added as separate "programs" to run in the llvm
>> testsuite?
>>
>> Thoughts?
>>
>> Best regards,
>> Edwin
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

