[llvm-dev] Adding a new External Suite to test-suite

Mon Apr 6 16:30:06 PDT 2020

Hi Fernando,

On 4/6/20 1:53 PM, Fernando Magno Quintao Pereira via llvm-dev wrote:
> Hi Johannes,
>
>> I'd also like to know what the intention here is. What is tested and how?
> We have a few uses for these benchmarks in the technical report:
> http://lac.dcc.ufmg.br/pubs/TechReports/LaC_TechReport012020.pdf, but
> since then, we came up with other applications. All these programs
> produce object files without external dependencies. We have been using
> them to train a predictive compiler that reduces code size (the
> technical report has more about that). In addition, you can use them
> to compare compilation time, for instance, as Michael had asked. We
> have also used these benchmarks in two studies:
>
> 1) http://cuda.dcc.ufmg.br/angha/chordAnalysis
> 2) http://cuda.dcc.ufmg.br/angha/staticProperties
>
> A few other applications that I know about (outside our research
> group) include:
>
> * Comparing the size of code produced by three HLS tools: Intel HLS,
> Vivado and LegUp.
> * Testing the Ultimate Buchi Automizer, to see which kinds of C
> constructs it handles
> * Comparing compilation time of gcc vs clang
>
> A few other studies that I would like to carry out:
>
> * Checking the runtime of different C parsers that we have.
> * Trying to infer, empirically, the complexity of compiler analyses
> and optimizations.

All the use cases sound reasonable, but why do we need these kinds of
"weird files" to do this?

I mean, why would you train or measure something on single-definition
translation units and not on the original ones, potentially one function
at a time?

To me this looks like a really good way to skew the input data set.
For example, you never see a call that can be inlined or for which
inter-procedural reasoning is performed. As a consequence, each function
is much smaller than it would be in a real run, which affects any
results obtained from such benchmarks. Again, why can't we take
the original programs instead?

>> Looking at a few of these it seems there is not much you can do as it is little code with a lot of unknown function calls and global symbols.
> Most of the programs are small (avg 63 bytecodes, std 97); however,
> among these 1M C functions, we have a few large ones, with more than
> 40K bytecodes.

How many duplicates are there among the small functions? I ask because
there are close to 1M functions of such a small size (and with similar
prologues and epilogues).
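
To make the question concrete, here is a minimal sketch of one way to
count exact textual duplicates. It is not how the benchmark authors
check this. The file names, the `normalize` and `duplicate_groups`
helpers, and the normalization itself (strip C comments, collapse
whitespace) are all assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict

# Matches /* block */ and // line comments. This is purely textual, so a
# string literal containing "//" would be mangled; fine for a rough count.
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def normalize(source: str) -> str:
    """Drop C comments and collapse every run of whitespace to one space."""
    without_comments = _COMMENT.sub(" ", source)
    return " ".join(without_comments.split())


def duplicate_groups(sources):
    """Group file names whose normalized source text is identical.

    `sources` maps a (hypothetical) file name to its C source text.
    Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for name, text in sources.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Tiny made-up inputs; a.c and b.c differ only in layout and a comment.
    example = {
        "a.c": "int f(int x) { return x + 1; }",
        "b.c": "int f(int x)\n{\n  return x + 1; /* inc */\n}",
        "c.c": "int g(int x) { return x * 2; }",
    }
    print(duplicate_groups(example))
```

This only finds functions that are identical up to comments and
whitespace. Functions that differ only in identifier names would need a
comparison on a more canonical form, so the count it gives is a lower
bound.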

Cheers,

   Johannes

> Regards,
>
> Fernando