[LLVMdev] Adding ClamAV to the llvm testsuite (long)

Mon Dec 17 10:52:11 PST 2007

We always welcome more tests. But it looks like there are two issues here.

1. The autoconf requirement. Is it possible to get one configuration  
working without the need for autoconf?
2. GPL license. Chris?

Evan

On Dec 14, 2007, at 12:30 PM, Török Edwin wrote:

> Hi,
>
> I see that you are looking for new programs for the testsuite, as
> described in 'Compile programs with the LLVM compiler' and 'Adding
> programs to the llvm testsuite' on llvm.org/OpenProjects.
>
> My favourite "C source code" is ClamAV (www.clamav.net), and I would
> like to get it included in the testsuite.
>
> This mail is kind of long, but please bear with me, as I want to clarify
> how best to integrate ClamAV into the LLVM testsuite's build system.
>
> Why include it?
>
> It can be useful for finding regressions or new bugs; it has already
> uncovered a few bugs in llvm's cbe, llc, and optimizers that I've
> reported through bugzilla (and they've mostly been fixed very fast!
> Thanks!). ClamAV was also the "victim" of a bug in gcc 4.1.0's
> optimizer [see 9)].
>
> It can be useful for testing new and existing optimizations. There aren't
> any significant differences in its performance when compiled by different
> compilers (gcc, icc, llvm-gcc), so I hope LLVM's optimizers can (in the
> future) make it faster ;)
>
> I had a quick look at the build infrastructure, and there are some
> issues (listed below) with getting it to work with programs that use
> autoconf (such as ClamAV), since AFAICT testsuites aren't allowed to
> run configure.
>
> Building issues aside there are some more questions:
> * ClamAV is GPL (but it includes BSD, LGPL parts), ok for testsuite?
> * what version to use? Latest stable, or latest svn?
> [In any case I'll wait till the next stable is published, it should be
> happening *very soon*]
> * what happens if you find bugs that also cause it to fail under gcc
> (unlikely)? [I would prefer to get an entry on clamav's bugzilla then,
> with something in its subject saying it is llvm-testsuite related]
> * what happens if it only fails under llvm-gcc/llc/clang, ... and it is
> not due to a bug in llvm, but because of portability issues in the
> source code (unlikely)?
> I would prefer a clamav bugzilla here too, clamav is meant to be
> "portable" :)
>
> Also, after I have set it up in the llvm testsuite, is there an easy way
> to run clang on it? Currently I have to hack the autoconf-generated
> makefiles if I want to test clang on it.
>
> 1. I've manually run configure and generated a clamav-config.h.
> This usually just contains HAVE_* macros for headers, which should all
> be available on a POSIX system, so it shouldn't be a problem from this
> perspective for llvm's build farm.
> However there are some target specific macros:
> #define C_LINUX 1
> #define FPU_WORDS_BIGENDIAN 0
> #define WORDS_BIGENDIAN 0
> Also SIZEOF_INT, SIZEOF_LONG,... but they are only used if the system
> doesn't have a proper <stdint.h>
> Also not sure of this:
> /* ctime_r takes 2 arguments */
> #define HAVE_CTIME_R_2 1
>
> What OS and CPU do the machines on llvm's buildfarm have? We could try a
> config.h that works on Linux (or MacOSX), and try to apply
> that to all, though there might be (non-obvious) failures.
>
> Any solutions to having these macros defined in the LLVM testsuite
> build? (especially for the bigendian macro)
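One possible answer for the bigendian macro, offered only as a sketch: derive it in a header instead of from configure. The block below assumes a compiler that predefines __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ (recent GCC and Clang do; other compilers may not), and follows the 0/1 convention shown in the generated clamav-config.h above. The function name host_is_big_endian is hypothetical and is not part of ClamAV or the testsuite.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time route. ASSUMPTION: the compiler predefines these macros.
   Where it does not, WORDS_BIGENDIAN is simply left untouched. */
#if !defined(WORDS_BIGENDIAN) && defined(__BYTE_ORDER__) && \
    defined(__ORDER_BIG_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#    define WORDS_BIGENDIAN 1
#  else
#    define WORDS_BIGENDIAN 0
#  endif
#endif

/* Runtime route (hypothetical helper): returns 1 if the lowest-addressed
   byte of a 32-bit integer holds its most significant byte, else 0. */
static int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* read the byte at the lowest address */
    return first == 0x01;
}
```

The runtime probe reports the byte order of the machine the binary runs on, so it cannot replace a preprocessor macro in code that selects between #if branches; it is mainly useful as a sanity check on a hand-written config.h.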
>
> 2. AFAICT the llvm-testsuite build doesn't support a program that is
> built from multiple subdirectories.
> libclamav has its source split into multiple subdirectories; gathering
> those into one also requires changing #include directives that have
> relative paths.
> I also get files with the same name but from different subdirs, so I
> have to rename them to subdir_filename, and do that in #include too.
>
> I have done this manually, and it works (native, llc, cbe work).
> I could hack together some perl script to do this automatically, or is
> there a better solution?
>
> 3. Comparing output: I've written a small script that compares the
> --debug output. The output needs some adjustment first, because it also
> contains memory addresses that obviously don't match up between runs.
> There isn't anything else to compare besides --debug output (besides
> ClamAV saying no virus found), and that can be a fairly good test.
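The script itself is not shown in this mail, so the following is only a sketch of the kind of adjustment described: collapse every 0x-prefixed hex token to a fixed placeholder before comparing two lines. The function names and the sample line format in the usage note are hypothetical, and it assumes addresses are printed with a 0x prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rewrites `line` in place so that every "0x<hex digits>" token becomes
   the fixed text "0x?". The result is never longer than the input. */
static void mask_hex_addresses(char *line)
{
    size_t r = 0;   /* read position  */
    size_t w = 0;   /* write position */

    assert(line != NULL);
    while (line[r] != '\0') {
        if (line[r] == '0' && line[r + 1] == 'x' &&
            isxdigit((unsigned char)line[r + 2])) {
            size_t end = r + 2;
            while (isxdigit((unsigned char)line[end]))
                end++;              /* find the end of the digits first */
            line[w++] = '0';
            line[w++] = 'x';
            line[w++] = '?';
            r = end;
        } else {
            line[w++] = line[r++];
        }
    }
    line[w] = '\0';
}

/* Masks both lines, then reports whether they are identical. */
static int same_after_masking(char *a, char *b)
{
    mask_hex_addresses(a);
    mask_hex_addresses(b);
    return strcmp(a, b) == 0;
}
```

With two made-up lines such as "buffer at 0x7fff12ab, size 10" and "buffer at 0x8, size 10", same_after_masking reports a match, while "size 10" against "size 11" does not. Note that this also hides genuine hex values (sizes, offsets) that happen to be printed with a 0x prefix, which makes the comparison slightly weaker.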
>
> 4. What is the input data?
> Clamav is fast :)
> It needs a lot of input data if you want to get reasonable timings out
> of it (tens, hundreds of MB).
> Scanning multiple small files will be I/O bound, and it'd be mostly
> useless as a benchmark (though still useful for testing
> compiler/optimization correctness).
>
> So I was thinking of using some large files already available in the
> testsuite (oggenc has one), and then maybe point it to scan the last
> *stable* build of LLVM. Or find some files that are scanned slowly but
> that don't involve lots of disk I/O (an archive, with ratio/size limits
> disabled, with highly compressible data).
> You won't be able to benchmark clamav in a "real world" scenario though,
> since that'd involve making it scan malware, and I'm sure you don't
> want that on your build farm.
>
> You could give it random data to scan, but you'll need it to be
> reproducible, so scanning /dev/random, or /bin of current LLVM tree is
> not a good choice ;)
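One way to get random-looking but reproducible input, sketched here under assumptions: generate the bytes from a fixed seed with a small linear congruential generator, so every machine produces identical data. The function name fill_reproducible is hypothetical; the multiplier and increment are the commonly published 32-bit LCG constants, chosen for illustration and not for statistical quality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills buf with n pseudo-random bytes that depend only on seed, so the
   same seed gives the same bytes on every run and every machine. */
static void fill_reproducible(unsigned char *buf, size_t n, uint32_t seed)
{
    uint32_t state = seed;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* One LCG step; the assignment keeps the result modulo 2^32. */
        state = state * 1664525u + 1013904223u;
        /* Use the top byte, which is the least regular part of the state. */
        buf[i] = (unsigned char)(state >> 24);
    }
}
```

A test driver could call this in a loop and write the buffer out with fwrite to build an input file of any size. Such data is close to incompressible, so it exercises the raw scanning path and not the archive or unpacker code.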
>
> There's also the problem of eliminating the initial disk I/O time from
> the benchmark, e.g. by rerunning it 3 times automatically or something
> like that?
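The rerun idea could look roughly like this: run the workload once untimed so the input files are cached, then time several runs and keep the fastest. This is a sketch and not how the testsuite harness measures anything; best_time_after_warmup, sample_work and sample_calls are hypothetical names, and sample_work is a stand-in for the real scan.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Calls work() once untimed (warm-up), then `runs` more times, and
   returns the smallest processor time in seconds among the timed runs. */
static double best_time_after_warmup(void (*work)(void), int runs)
{
    double best = -1.0;

    assert(work != NULL && runs > 0);
    work();                          /* warm-up run, result discarded */
    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        work();
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical stand-in workload: only counts how often it was called. */
static int sample_calls;
static void sample_work(void)
{
    sample_calls++;
}
```

Calling best_time_after_warmup(sample_work, 3) invokes the workload four times in total. clock() measures processor time, not wall-clock time, so time spent waiting on the disk is largely excluded anyway; a wall-clock timer would be needed to see the I/O effect itself.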
>
> 5. Library dependencies
> It needs zlib, all the rest is optional (bzip2, gmp, ....). I think I
> can reasonably assume zlib is available on all systems where the
> testsuite is run.
>
> 6. Sample output when using 126 MB of data as input:
>
> $ make TEST=nightly report
> ....
> Program  | GCCAS  Bytecode LLC compile LLC-BETA compile JIT codegen | GCC   CBE   LLC   LLC-BETA JIT | GCC/CBE GCC/LLC GCC/LLC-BETA LLC/LLC-BETA
> clamscan | 7.0729 2074308  *           *                *           | 17.48 17.55 18.81 *        *   | 1.00    0.93    n/a          n/a
>
> 7. Clamav is multithreaded
> If you're interested in testing if llvm-generated code works when
> multithreaded (I don't see why it wouldn't, but we're talking about a
> testsuite), you'd need to start the daemon (as an unprivileged user is
> just fine), and then connect to it.
> Is it possible to tell the testsuite build system to do this?
>
> 8. Code coverage
> Testing all of clamav code with llvm is ... problematic. Unless you
> create files with every packer/archiver known to clamav it is likely
> there will be files that are compiled in but never used during the
> testsuite run. You can still test that these files compile, but that's it.
>
> 9. Configure tests
> Configure has 3 tests that check for gcc bugs known to break ClamAV (2
> of which you already have, since those are in gcc's testsuite too). Add
> them as separate "programs" to run in the llvm testsuite?
>
> Thoughts?
>
> Best regards,
> Edwin