[llvm-dev] The AnghaBench collection of compilable programs

Fri Feb 28 18:07:14 PST 2020

Sounds great!  If there are any that can be merged into test-suite, that would make it easier for CI systems to use it.

-Chris

> On Feb 28, 2020, at 3:21 AM, Fernando Magno Quintao Pereira <pronesto at gmail.com> wrote:
> 
> Thank you for the feedback, Chris and Florian. We will start updating
> the benchmarks with the licenses of the repositories they came from.
> Once we update the individual benchmarks, we will try to make them
> available as an external test in LLVM.
> 
> Regards,
> 
> Fernando
> 
> On Fri, Feb 28, 2020 at 2:21 AM Chris Lattner <clattner at nondot.org> wrote:
>> 
>> Hi Fernando,
>> 
>> My understanding is that LLVM’s test-suite is under a weird mix of different licenses.  So long as you preserve the original licenses (and only include ones with reasonable licenses), it should be possible I think.
>> 
>> -Chris
>> 
>>> On Feb 22, 2020, at 12:30 PM, Fernando Magno Quintao Pereira via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> Hi Florian,
>>> 
>>>   we thought about using UIUC, like in LLVM. Do you guys know if that
>>> could be a problem, given that we are mining the functions from
>>> GitHub?
>>> 
>>>> Have you thought about integrating the benchmarks as external tests into LLVM’s test-suite? That would make it very easy to play around with.
>>> 
>>> We did not think about it actually. But we would be happy to do it, if
>>> the community accepts it.
>>> 
>>> Regards,
>>> 
>>> Fernando
>>> 
>>> On Sat, Feb 22, 2020 at 5:16 PM Florian Hahn <florian_hahn at apple.com> wrote:
>>>> 
>>>> Hi Fernando,
>>>> 
>>>> That sounds like a very useful resource to improve testing and also to get easier access to good stress tests (e.g., quite a few very large functions have proven to surface compile-time problems in some backend passes).
>>>> 
>>>> From a quick look on the website I couldn’t find under which license the code is published. That may be a problem for some users.
>>>> 
>>>> Have you thought about integrating the benchmarks as external tests into LLVM’s test-suite? That would make it very easy to play around with.
>>>> 
>>>> Cheers,
>>>> Florian
>>>> 
>>>>> On 22 Feb 2020, at 14:56, Fernando Magno Quintao Pereira via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>> 
>>>>> Dear LLVMers,
>>>>> 
>>>>>  we, at UFMG, have been building a large collection of compilable
>>>>> benchmarks. Today, we have one million C files, mined from open-source
>>>>> repositories, that compile into LLVM bitcode (and from there to
>>>>> object files). To ensure compilation, we perform type inference on the
>>>>> C programs. Type inference lets us replace missing dependencies.
>>>>> 
>>>>> The benchmarks are available at: http://cuda.dcc.ufmg.br/angha/
>>>>> 
>>>>> We have a technical report describing the construction of this
>>>>> collection: http://lac.dcc.ufmg.br/pubs/TechReports/LaC_TechReport012020.pdf
>>>>> 
>>>>> Many things can be done with so many LLVM bitcode files. A few examples
>>>>> follow below:
>>>>> 
>>>>> * We can autotune compilers. We have trained YaCoS, a tool used to
>>>>> find good optimization sequences. The objective function is code size.
>>>>> We find the best optimization sequence for each program in the
>>>>> database. To compile an unknown program, we get the program in the
>>>>> database that is the closest, and apply the same optimization
>>>>> sequence. Results are good: we can improve on clang -Oz by almost 10%
>>>>> in MiBench, for instance.
>>>>> 
>>>>> * We can perform many types of explorations on real-world code. For
>>>>> instance, we have found that 95.4% of all the interference graphs of
>>>>> these programs, even in machine code (no phi-functions and lots of
>>>>> pre-colored registers), are chordal.
>>>>> 
>>>>> * We can check how well different tools are doing on real-world code.
>>>>> For instance, we can use these benchmarks to check how many programs
>>>>> can be analyzed by Ultimate Buchi Automizer
>>>>> (https://ultimate.informatik.uni-freiburg.de/downloads/BuchiAutomizer/).
>>>>> This is a tool that tries to prove termination or infinite execution
>>>>> for some programs.
>>>>> 
>>>>> * We can check how many programs can be compiled by different
>>>>> high-level synthesis tools into FPGAs. We have tried LegUp and Vivado,
>>>>> for instance.
>>>>> 
>>>>> * Our webpage contains a search box, so that you can get the closest
>>>>> programs to a given input program. Currently, we measure program
>>>>> distance as the Euclidean distance on Namolaru feature vectors.
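
The closest-program lookup mentioned in the autotuning and search-box items above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the YaCoS or AnghaBench implementation: the database layout, the program names, the three-element feature vectors, and the pass sequences attached to each entry are all hypothetical placeholders (real Namolaru feature vectors have many more dimensions).

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_program(query, database):
    """Return the name of the database entry nearest to the query vector."""
    return min(
        database,
        key=lambda name: euclidean_distance(query, database[name]["features"]),
    )


# Hypothetical database: names, features, and pass sequences are made up.
database = {
    "prog_a": {"features": [1.0, 0.0, 3.0], "passes": ["mem2reg", "simplifycfg"]},
    "prog_b": {"features": [4.0, 4.0, 0.0], "passes": ["instcombine", "gvn"]},
}

# Features of an "unknown" program (also made up for the example).
query = [1.0, 1.0, 2.0]

nearest = closest_program(query, database)
# Reuse the optimization sequence stored for the nearest known program.
passes = database[nearest]["passes"]
print(nearest, passes)
```

A linear scan like this is fine for a toy database; with a million entries one would presumably want an index structure, but the message does not say what the site uses.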
>>>>> 
>>>>> We do not currently provide inputs for those programs. It's possible
>>>>> to execute the so-called "leaf functions", i.e., functions that do not
>>>>> call other routines. We have thousands of them. However, we do not
>>>>> guarantee the absence of undefined behavior during the execution.
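
As a small illustration of the "leaf function" notion used above, the sketch below picks out functions with no callees from a call graph. The call graph and function names are hypothetical, and the choice to treat a self-recursive function as a non-leaf is an assumption of this example, not something the message specifies.

```python
def leaf_functions(call_graph):
    """Return the sorted names of functions that call no routine at all."""
    return sorted(name for name, callees in call_graph.items() if not callees)


# Hypothetical call graph: function name -> set of functions it calls.
call_graph = {
    "main": {"parse", "sum_array"},
    "parse": {"sum_array"},
    "sum_array": set(),
    "abs_diff": set(),
}

leaves = leaf_functions(call_graph)
print(leaves)
```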
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Fernando
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>