[LLVMdev] speed and code size issues

John Regehr regehr at cs.utah.edu
Fri Jul 17 22:21:40 PDT 2009


We have some results that are somewhat entertaining and that relate to the 
size/speed discussion.

The basic idea is exhaustive generation of C functions where "exhaustive" 
is qualified by some structural restrictions (depth of AST, node type, 
etc.).
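
A minimal sketch of what such a depth-bounded enumeration could look like
is given below.  It is not the generator used for these results: the leaf
set, the three operators, the omission of the ternary node, and the
index-based ZZ_ naming are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative vocabulary only; the real generator's node set is not given. */
static const char *const LEAVES[] = {"x", "y", "0", "1", "-2"};
static const char *const OPS[] = {"a8s", "m8s", "o1s"};
enum { N_LEAVES = 5, N_OPS = 3 };

/* Number of distinct expressions whose AST depth is at most `depth`. */
unsigned long count_exprs(int depth) {
    if (depth == 0) return N_LEAVES;
    unsigned long sub = count_exprs(depth - 1);
    return N_LEAVES + N_OPS * sub * sub;
}

/* Append expression number `index` (0-based) to the string in `buf`. */
void append_expr(int depth, unsigned long index, char *buf, size_t cap) {
    assert(index < count_exprs(depth));
    size_t len = strlen(buf);
    if (index < N_LEAVES) {               /* the first indices are the leaves */
        snprintf(buf + len, cap - len, "%s", LEAVES[index]);
        return;
    }
    index -= N_LEAVES;                    /* remaining indices: op(left,right) */
    unsigned long sub = count_exprs(depth - 1);
    snprintf(buf + len, cap - len, "%s(", OPS[index / (sub * sub)]);
    append_expr(depth - 1, (index % (sub * sub)) / sub, buf, cap);
    len = strlen(buf);
    snprintf(buf + len, cap - len, ",");
    append_expr(depth - 1, index % sub, buf, cap);
    len = strlen(buf);
    snprintf(buf + len, cap - len, ")");
}

/* Format a complete C function; the hex name is simply the index (assumed). */
void format_function(int depth, unsigned long index, char *buf, size_t cap) {
    snprintf(buf, cap, "int ZZ_%08lx(int x,int y){return ", index);
    append_expr(depth, index, buf, cap);
    size_t len = strlen(buf);
    snprintf(buf + len, cap - len, ";}");
}
```

Looping an index from 0 to count_exprs(depth) - 1 and calling
format_function on each visits every expression within the bound exactly
once, which is the sense of "exhaustive" intended in this sketch.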

For one particular set of restrictions we ended up with about 7 million C 
functions.  We then compiled each of these functions with 7 compilers: 
llvm-gcc, clang, Intel cc, Sun cc, and various versions of gcc.

We then looked for functions where a particular pair of compilers 
exhibited widely differing abilities to optimize.  For example, consider 
this function:

   int ZZ_0000728f(int x,int y){return o1s(m8s(x,-2),(x?1:y));}

gcc-3.4 can see that it always returns 0, and emits code doing that.  On 
the other hand, llvm-gcc emits 228 bytes of object code (at -Os) to 
compute the same zeroes.

The funny-named functions are little safe-math utilities that avoid 
undefined behavior for all inputs.  "o1s" is "mod 16-bit signed" and "m8s" 
is "multiply 8-bit signed".
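
To see why the function above can be folded to a constant, here is a
sketch with hypothetical stand-ins for the two utilities.  The real
definitions live in safe_math.h and may differ; in particular, returning
the first operand whenever the plain operation would overflow or divide
by zero is an assumption, as are the _sketch names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "m8s" (multiply 8-bit signed). */
int8_t m8s_sketch(int8_t a, int8_t b) {
    int wide = (int)a * (int)b;          /* product of two int8_t fits in int */
    if (wide < INT8_MIN || wide > INT8_MAX)
        return a;                        /* assumed fallback: first operand */
    return (int8_t)wide;
}

/* Hypothetical stand-in for "o1s" (mod 16-bit signed). */
int16_t o1s_sketch(int16_t a, int16_t b) {
    /* The second test mirrors the guard wider types need; where int is
       wider than 16 bits, promotion already makes that case defined. */
    if (b == 0 || (a == INT16_MIN && b == -1))
        return a;                        /* assumed fallback: first operand */
    return (int16_t)(a % b);
}

/* Same shape as ZZ_0000728f, built on the stand-ins above. */
int zz_0000728f_sketch(int x, int y) {
    return o1s_sketch(m8s_sketch(x, -2), (x ? 1 : y));
}

/* Check the "always 0" claim over inputs that fit in 8 bits. */
void check_always_zero(void) {
    for (int x = -128; x <= 127; x++)
        for (int y = -128; y <= 127; y++)
            assert(zz_0000728f_sketch(x, y) == 0);
}
```

Under these assumed definitions the reasoning is short: if x is nonzero
the result is something mod 1, which is 0; if x is zero the left operand
is 0, and both 0 mod y and the fallback for y == 0 give 0.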

Why is this interesting?  Because it provides a way to systematically find 
areas of weakness in an optimizer, relative to a collection of other 
optimizers.
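
The comparison step could be sketched as follows.  This is only an
illustration: the struct layout, the three-compiler array, and the use of
a byte difference against the best other compiler as the score are all
assumptions, not a description of the scripts actually used.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum { N_COMPILERS = 3 };    /* illustrative; the experiment used 7 */

struct sizes {
    const char *function_name;
    unsigned bytes[N_COMPILERS];   /* object-code size per compiler */
};

/* How many bytes larger `target` is than the smallest output among the
   other compilers; negative when `target` produced the smallest code. */
long size_gap(const struct sizes *s, int target) {
    unsigned best_other = UINT_MAX;
    for (int c = 0; c < N_COMPILERS; c++)
        if (c != target && s->bytes[c] < best_other)
            best_other = s->bytes[c];
    return (long)s->bytes[target] - (long)best_other;
}

/* Index of the function on which `target` looks worst relative to the rest. */
size_t most_embarrassing(const struct sizes *all, size_t n, int target) {
    assert(n > 0);
    size_t worst = 0;
    for (size_t i = 1; i < n; i++)
        if (size_gap(&all[i], target) > size_gap(&all[worst], target))
            worst = i;
    return worst;
}
```

A ratio instead of a difference would rank the functions differently; the
difference is used here only because it is the simplest score to state.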

If people would find it useful, I can put the full set of results on the 
web when time permits.  I call the resulting functions "maximally 
embarrassing" since each one represents some significant failure to 
optimize.

The most embarrassing function overall is one where various versions of 
gcc (including llvm-gcc) emit code returning constant 0 and clang emits 
762 bytes of x86.  The C code is this:

   int ZZ_00005bbd(int x,int y){return m1s((x?0:x),a8s(y,y));}

The other embarrassing thing about these functions is that most compilers 
miscompile some of the 7 million functions.  llvm-gcc and clang are the 
only ones we tested that actually get them all right.

To compile these functions, this code needs to be prepended:

   #include <limits.h>
   #include <stdint.h>
   #include "safe_abbrev.h"
   #include "safe_math.h"

The safe math headers are here:

   http://www.cs.utah.edu/~regehr/safe_math/

Anyway, I am just throwing this out there.  People on this list have told 
me before that missed-optimization bugs are not considered very 
interesting. 
The ideal result (from my point of view as a compiler consumer) would be 
for a few people from one or more of these compilers' development 
communities to take seriously the job of eliminating these embarrassments.

John Regehr


