[llvm-dev] The AnghaBench collection of compilable programs

Chris Lattner via llvm-dev llvm-dev at lists.llvm.org
Thu Feb 27 21:21:20 PST 2020


Hi Fernando,

My understanding is that LLVM’s test-suite is under a weird mix of different licenses. So long as you preserve the original licenses (and only include code under reasonable licenses), I think it should be possible.

-Chris

> On Feb 22, 2020, at 12:30 PM, Fernando Magno Quintao Pereira via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi Florian,
> 
>    we thought about using the UIUC license, like LLVM. Do you guys know if that
> could be a problem, given that we are mining the functions from
> GitHub?
> 
>> Have you thought about integrating the benchmarks as external tests into LLVM’s test-suite? That would make it very easy to play around with.
> 
> We did not think about it actually. But we would be happy to do it, if
> the community accepts it.
> 
> Regards,
> 
> Fernando
> 
> On Sat, Feb 22, 2020 at 5:16 PM Florian Hahn <florian_hahn at apple.com> wrote:
>> 
>> Hi Fernando,
>> 
>> That sounds like a very useful resource to improve testing and also get easier access to good stress tests (e.g. quite a few very large functions have proven to surface compile-time problems in some backend passes).
>> 
>> From a quick look on the website I couldn’t find under which license the code is published. That may be a problem for some users.
>> 
>> Have you thought about integrating the benchmarks as external tests into LLVM’s test-suite? That would make it very easy to play around with.
>> 
>> Cheers,
>> Florian
>> 
>>> On 22 Feb 2020, at 14:56, Fernando Magno Quintao Pereira via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> Dear LLVMers,
>>> 
>>>   we, at UFMG, have been building a large collection of compilable
>>> benchmarks. Today, we have one million C files, mined from open-source
>>> repositories, that compile into LLVM bitcode (and from there to
>>> object files). To ensure compilation, we perform type inference on the
>>> C programs. Type inference lets us synthesize declarations that replace
>>> missing dependencies.
>>> 
>>> The benchmarks are available at: http://cuda.dcc.ufmg.br/angha/
>>> 
>>> We have a technical report describing the construction of this
>>> collection: http://lac.dcc.ufmg.br/pubs/TechReports/LaC_TechReport012020.pdf
>>> 
>>> Many things can be done with so many LLVM bitcode files. A few examples
>>> follow:
>>> 
>>> * We can autotune compilers. We have trained YaCoS, a tool used to
>>> find good optimization sequences, with code size as the objective
>>> function. We find the best optimization sequence for each program in
>>> the database. To compile an unknown program, we take the closest
>>> program in the database and apply its optimization sequence. Results
>>> are good: we can improve on clang -Oz by almost 10% on MiBench, for
>>> instance.
>>> 
>>> * We can perform many types of explorations on real-world code. For
>>> instance, we have found that 95.4% of all the interference graphs of
>>> these programs, even in machine code (no phi-functions and lots of
>>> pre-colored registers), are chordal.
>>> 
>>> * We can check how well different tools are doing on real-world code.
>>> For instance, we can use these benchmarks to check how many programs
>>> can be analyzed by Ultimate Buchi Automizer
>>> (https://ultimate.informatik.uni-freiburg.de/downloads/BuchiAutomizer/).
>>> This is a tool that tries to prove termination or infinite execution
>>> for some programs.
>>> 
>>> * We can check how many programs can be compiled by different
>>> high-level synthesis tools into FPGAs. We have tried LegUp and Vivado,
>>> for instance.
>>> 
>>> * Our webpage contains a search box, so that you can find the closest
>>> programs to a given input program. Currently, we measure program
>>> distance as the Euclidean distance between Namolaru feature vectors.
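Both the YaCoS experiment in the first bullet and the search box in the last one come down to a nearest-neighbor lookup: represent each program as a feature vector, find the database entry at the smallest Euclidean distance, and (for autotuning) reuse that entry's optimization sequence. Below is a minimal Python sketch, assuming the feature vectors have already been extracted; the database layout, program names, and pass lists are hypothetical and are not YaCoS's actual interface.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical database layout: program name -> (feature vector, pass sequence
# that worked best for that program). Not taken from the YaCoS/AnghaBench tools.
Database = Dict[str, Tuple[Sequence[float], List[str]]]


def closest_program(query: Sequence[float], db: Database) -> str:
    """Return the name of the program whose feature vector is nearest to
    `query` under Euclidean distance (math.dist requires equal lengths)."""
    if not db:
        raise ValueError("empty database")
    return min(db, key=lambda name: math.dist(query, db[name][0]))


def predict_sequence(query: Sequence[float], db: Database) -> List[str]:
    """Reuse the optimization sequence of the nearest neighbor."""
    return db[closest_program(query, db)][1]


# Toy 3-dimensional vectors and made-up pass sequences, for illustration only.
db: Database = {
    "prog_a": ([12.0, 3.0, 0.0], ["mem2reg", "instcombine"]),
    "prog_b": ([40.0, 10.0, 2.0], ["mem2reg", "simplifycfg", "gvn"]),
}

print(closest_program([38.0, 9.0, 1.0], db))   # prog_b
print(predict_sequence([38.0, 9.0, 1.0], db))  # ['mem2reg', 'simplifycfg', 'gvn']
```

Raw feature counts can differ by orders of magnitude, so a practical version would probably normalize each feature before measuring distance; the sketch leaves that out.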
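The chordality figure in the second bullet refers to a classic graph test: a graph is chordal exactly when the reverse of a maximum cardinality search (MCS) order is a perfect elimination ordering, i.e. each vertex's later neighbors form a clique. A minimal, quadratic-time Python sketch of that test on toy adjacency-set graphs follows. It assumes the interference graph is already built (vertices are live ranges, edges join simultaneously live values) and is not the tooling used to obtain the 95.4% figure.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def graph_from_edges(vertices: Iterable[Hashable],
                     edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency-set graph (e.g. an interference graph)."""
    g: Graph = {v: set() for v in vertices}
    for a, b in edges:
        g[a].add(b)
        g[b].add(a)
    return g


def mcs_order(g: Graph) -> List[Hashable]:
    """Maximum cardinality search: repeatedly visit the unvisited vertex with
    the most already-visited neighbors (ties broken by insertion order)."""
    weight = {v: 0 for v in g}
    visited: Set[Hashable] = set()
    order: List[Hashable] = []
    for _ in range(len(g)):
        v = max((u for u in g if u not in visited), key=lambda u: weight[u])
        visited.add(v)
        order.append(v)
        for w in g[v]:
            if w not in visited:
                weight[w] += 1
    return order


def is_chordal(g: Graph) -> bool:
    """Check whether the reverse MCS order is a perfect elimination ordering:
    for every vertex, its neighbors that come later must be pairwise adjacent."""
    peo = mcs_order(g)[::-1]
    pos = {v: i for i, v in enumerate(peo)}
    for v in peo:
        later = [u for u in g[v] if pos[u] > pos[v]]
        for a, b in combinations(later, 2):
            if b not in g[a]:
                return False
    return True


# Toy interference graphs over four live ranges a..d.
square = graph_from_edges("abcd", [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
square_with_chord = graph_from_edges(
    "abcd", [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")])

print(is_chordal(square))             # False: a 4-cycle without a chord
print(is_chordal(square_with_chord))  # True
```

Chordality matters for register allocation because chordal graphs can be colored optimally in polynomial time (greedily, following the MCS order). The pairwise clique check here is quadratic per vertex for simplicity; Tarjan and Yannakakis give a linear-time version.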
>>> 
>>> We do not currently provide inputs for those programs. It is possible
>>> to execute the so-called "leaf functions", i.e., functions that do not
>>> call other routines. We have thousands of them. However, we do not
>>> guarantee the absence of undefined behavior during their execution.
>>> 
>>> Regards,
>>> 
>>> Fernando
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


