[LLVMdev] speed and code size issues

Sat Jul 18 01:07:26 PDT 2009

John Regehr wrote:
> We have some results that are somewhat entertaining and that relate to the 
> size/speed discussion.
> 
> The basic idea is exhaustive generation of C functions where "exhaustive" 
> is qualified by some structural restrictions (depth of AST, node type, 
> etc.).
> 
> For one particular set of restrictions we ended up with about 7 million C 
> functions.  We then compiled each of these functions with 7 compilers: 
> llvm-gcc, clang, Intel cc, Sun cc, various versions of gcc.
> 
> We then looked for functions where a particular pair of compilers 
> exhibited widely differing abilities to optimize.  For example, consider 
> this function:
> 
>    int ZZ_0000728f(int x,int y){return o1s(m8s(x,-2),(x?1:y));}
> 
> gcc-3.4 can see that it always returns 0, and emits code doing that.  On 
> the other hand, llvm-gcc emits 228 bytes of object code (at -Os) to 
> compute the same zeroes.
> 
> The funny-named functions are little safe-math utilities that avoid 
> undefined behavior for all inputs.  "o1s" is "mod 16-bit signed" and "m8s" 
> is "multiply 8-bit signed".
> 
> Why is this interesting?  Because it provides a way to systematically find 
> areas of weakness in an optimizer, relative to a collection of other 
> optimizers.
> 
> If people would find it useful, I can put the full set of results on the 
> web when time permits.  I call the resulting codes "maximally 
> embarrassing" since each function represents some significant failure to 
> optimize.
> 
> The global maximally embarrassing function is one where various versions of 
> gcc (including llvm-gcc) emit code returning constant 0 and clang emits 
> 762 bytes of x86.  The C code is this:
> 
>    int ZZ_00005bbd(int x,int y){return m1s((x?0:x),a8s(y,y));}
> 
> The other embarrassing thing about these functions is that most compilers 
> miscompile some of the 7 million functions.  llvm-gcc and clang are the 
> only ones we tested that actually get them all right.
> 
> To compile these functions this code needs to be prepended:
> 
>    #include <limits.h>
>    #include <stdint.h>
>    #include "safe_abbrev.h"
>    #include "safe_math.h"
> 
> The safe math headers are here:
> 
>    http://www.cs.utah.edu/~regehr/safe_math/
> 
> Anyway I just throw this out there.  People on this list have told me 
> before that missed-optimization bugs are not considered very interesting. 
> The ideal result (from my point of view as a compiler consumer) would be 
> for a few people from one or more of these compilers' development 
> communities to take seriously the job of eliminating these embarrassments.

I'm moderately interested. The nice thing about these sorts of bugs is 
that they interact very well with our other optimizations. However, as 
they aren't real-world cases I can't consider them high priority. I'd 
just like to have the list to look over and fix a few every once in a while.

I don't want you to go through too much trouble to put it on the web, 
but it sounds like you've already done the hard part of not only 
producing all the functions but scoring the compilers' results! I'm 
really impressed by this and particularly like your systematic approach.
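
For anyone who wants to poke at the two quoted functions without fetching 
the headers, here is a minimal sketch. The four helpers are hypothetical 
stand-ins written from the one-line descriptions in the quoted mail ("mod 
16-bit signed", "multiply 8-bit signed"); the expansions of m1s and a8s as 
16-bit multiply and 8-bit add are a guess by analogy, and the fallback of 
returning the first operand when the result is undefined or out of range 
is an assumption as well. None of this is the real safe_math.h.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the safe_math.h helpers; NOT the real
   implementations.  Assumption: when the exact result is undefined or
   does not fit the type, the first operand is returned unchanged. */

/* "multiply 8-bit signed" */
static int8_t m8s(int8_t a, int8_t b)
{
    int r = (int)a * (int)b;   /* magnitude at most 128*128, fits in int */
    return (r < INT8_MIN || r > INT8_MAX) ? a : (int8_t)r;
}

/* "add 8-bit signed" (name expansion assumed by analogy) */
static int8_t a8s(int8_t a, int8_t b)
{
    int r = (int)a + (int)b;
    return (r < INT8_MIN || r > INT8_MAX) ? a : (int8_t)r;
}

/* "multiply 16-bit signed" (name expansion assumed by analogy) */
static int16_t m1s(int16_t a, int16_t b)
{
    int32_t r = (int32_t)a * (int32_t)b;   /* magnitude at most 2^30 */
    return (r < INT16_MIN || r > INT16_MAX) ? a : (int16_t)r;
}

/* "mod 16-bit signed" */
static int16_t o1s(int16_t a, int16_t b)
{
    /* Guard the two cases where % would be undefined or overflow. */
    if (b == 0 || (a == INT16_MIN && b == -1))
        return a;
    return (int16_t)(a % b);
}

/* The two functions from the quoted mail, verbatim. */
int ZZ_0000728f(int x,int y){return o1s(m8s(x,-2),(x?1:y));}
int ZZ_00005bbd(int x,int y){return m1s((x?0:x),a8s(y,y));}

/* Count the (x, y) pairs in [lo, hi] x [lo, hi] for which either
   function returns something other than 0. */
int count_nonzero(int lo, int hi)
{
    int bad = 0;
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (ZZ_0000728f(x, y) != 0 || ZZ_00005bbd(x, y) != 0)
                bad++;
    return bad;
}
```

With these stand-ins, count_nonzero(-128, 127) should come back 0. In the 
first function the divisor is 1 whenever x is nonzero and the dividend is 
0 otherwise; in the second, (x?0:x) is 0 on both branches. I kept the 
range inside int8_t so the int-to-int8_t argument conversions don't depend 
on implementation-defined behavior.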

Nick

> John Regehr