[llvm-dev] The AnghaBench collection of compilable programs

Florian Hahn via llvm-dev llvm-dev at lists.llvm.org
Sat Feb 22 12:16:25 PST 2020


Hi Fernando,

That sounds like a very useful resource to improve testing and also to get easier access to good stress tests (e.g., quite a few very large functions have proven to surface compile-time problems in some backend passes).

From a quick look at the website, I couldn’t find the license under which the code is published. That may be a problem for some users.

Have you thought about integrating the benchmarks as external tests into LLVM’s test-suite? That would make them very easy to play around with.

Cheers,
Florian

> On 22 Feb 2020, at 14:56, Fernando Magno Quintao Pereira via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Dear LLVMers,
> 
>    we, at UFMG, have been building a large collection of compilable
> benchmarks. Today, we have one million C files, mined from open-source
> repositories, that compile into LLVM bitcode (and from there to
> object files). To ensure compilation, we perform type inference on the
> C programs. Type inference lets us replace missing dependencies.
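> 
> For illustration, here is a minimal sketch of what that reconstruction
> looks like (the struct layout and names below are made up for this
> example): a mined function uses a type that its file never declares,
> and the inference step synthesizes a declaration with just the fields
> the code touches, so the file compiles on its own and can be lowered
> to bitcode with clang -c -emit-llvm.
> 
>   /* Synthesized by type inference: only the fields that the mined
>      code actually uses need to be declared. */
>   struct node {
>     struct node *next;
>   };
> 
>   /* Mined function: in its original repository, struct node came
>      from a header that is not available to us. */
>   int count(struct node *n) {
>     int c = 0;
>     while (n) {
>       c++;
>       n = n->next;
>     }
>     return c;
>   }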
> 
> The benchmarks are available at: http://cuda.dcc.ufmg.br/angha/
> 
> We have a technical report describing the construction of this
> collection: http://lac.dcc.ufmg.br/pubs/TechReports/LaC_TechReport012020.pdf
> 
> Many things can be done with so many LLVM bitcode files. A few examples
> follow below:
> 
> * We can autotune compilers. We have trained YaCoS, a tool used to
> find good optimization sequences. The objective function is code size.
> We find the best optimization sequence for each program in the
> database. To compile an unknown program, we find the closest program
> in the database (see the sketch after this list) and apply its
> optimization sequence. Results are good: we can improve on clang -Oz
> by almost 10% in MiBench, for instance.
> 
> * We can perform many kinds of exploration on real-world code. For
> instance, we have found that 95.4% of all the interference graphs of
> these programs, even in machine code (no phi-functions and lots of
> pre-colored registers), are chordal.
> 
> * We can check how well different tools are doing on real-world code.
> For instance, we can use these benchmarks to check how many programs
> can be analyzed by Ultimate Buchi Automizer
> (https://ultimate.informatik.uni-freiburg.de/downloads/BuchiAutomizer/).
> This is a tool that tries to prove termination or infinite execution
> for some programs.
> 
> * We can check how many programs can be compiled for FPGAs by
> different high-level synthesis tools. We have tried LegUp and Vivado,
> for instance.
> 
> * Our webpage contains a search box, so that you can get the closest
> programs to a given input program. Currently, we measure program
> distance as the Euclidean distance on Namolaru feature vectors.
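> 
> To make the "closest program" step concrete, here is a minimal sketch
> of that nearest-neighbor lookup, assuming each program has already
> been summarized as a fixed-length feature vector; the function names
> and the flat array layout are just for this example:
> 
>   #include <math.h>
>   #include <stddef.h>
> 
>   /* Euclidean distance between two feature vectors of length dim. */
>   static double distance(const double *a, const double *b, size_t dim) {
>     double sum = 0.0;
>     for (size_t i = 0; i < dim; i++) {
>       double d = a[i] - b[i];
>       sum += d * d;
>     }
>     return sqrt(sum);
>   }
> 
>   /* Index of the database program whose feature vector is closest to
>      the query; its best optimization sequence is then reused. */
>   size_t closest_program(const double *query, const double *db,
>                          size_t n, size_t dim) {
>     size_t best = 0;
>     double best_dist = distance(query, db, dim);
>     for (size_t i = 1; i < n; i++) {
>       double d = distance(query, db + i * dim, dim);
>       if (d < best_dist) {
>         best_dist = d;
>         best = i;
>       }
>     }
>     return best;
>   }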
> 
> We do not currently provide inputs for those programs. It's possible
> to execute the so-called "leaf functions", i.e., functions that do not
> call other routines. We have thousands of them. However, we do not
> guarantee the absence of undefined behavior during execution.
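> 
> As an illustration of what running a leaf function involves (the
> function and the inputs below are invented, not taken from the
> collection): since a leaf function calls no other routine, a driver
> can invoke it directly with synthetic arguments and nothing else
> linked in.
> 
>   #include <stdio.h>
> 
>   /* A leaf function: it calls no other routine, so it can be run
>      with synthetic inputs and no further dependencies. */
>   static int clamp(int x, int lo, int hi) {
>     if (x < lo) return lo;
>     if (x > hi) return hi;
>     return x;
>   }
> 
>   int main(void) {
>     /* Synthetic inputs chosen by the harness, not by the original
>        program, so the run is not necessarily meaningful. */
>     printf("%d\n", clamp(42, 0, 10));
>     return 0;
>   }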
> 
> Regards,
> 
> Fernando