[LLVMdev] speed and code size issues
John Regehr
regehr at cs.utah.edu
Fri Jul 17 22:21:40 PDT 2009
We have some results that are somewhat entertaining and that relate to the
size/speed discussion.
The basic idea is exhaustive generation of C functions where "exhaustive"
is qualified by some structural restrictions (depth of AST, node type,
etc.).
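As a rough sketch of what "exhaustive under structural restrictions" can look like, the C below numbers every expression tree up to a given depth and decodes an index back into a function. The leaf set, the operator list, the depth bound, and the idea that the hex suffix in a name is an enumeration index are all assumptions made for illustration. This is not the generator that produced the 7 million functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical grammar: these leaves and operators are illustrative only. */
static const char *const LEAVES[] = { "x", "y", "0", "1", "-2" };
static const char *const BINOPS[] = { "a8s", "m8s", "o1s" };
enum { N_LEAVES = 5, N_BINOPS = 3 };

/* Tiny bounded string builder. */
typedef struct { char *buf; size_t cap; size_t len; } sbuf;

static void sb_put(sbuf *s, const char *text) {
    size_t n = strlen(text);
    assert(s->len + n < s->cap);            /* caller must size the buffer */
    memcpy(s->buf + s->len, text, n + 1);   /* copies the terminator too */
    s->len += n;
}

/* Number of distinct expression trees of depth <= depth. */
static unsigned long long count_exprs(int depth) {
    if (depth == 0) return N_LEAVES;
    unsigned long long sub = count_exprs(depth - 1);
    return N_LEAVES + (unsigned long long)N_BINOPS * sub * sub;
}

/* Decode idx (0 <= idx < count_exprs(depth)) into its expression text. */
static void build_expr(int depth, unsigned long long idx, sbuf *out) {
    if (idx < N_LEAVES) { sb_put(out, LEAVES[idx]); return; }
    assert(depth > 0);
    idx -= N_LEAVES;
    unsigned long long sub = count_exprs(depth - 1);
    unsigned long long op  = idx / (sub * sub);   /* which operator */
    unsigned long long rem = idx % (sub * sub);   /* which operand pair */
    assert(op < N_BINOPS);
    sb_put(out, BINOPS[op]);
    sb_put(out, "(");
    build_expr(depth - 1, rem / sub, out);        /* left operand */
    sb_put(out, ",");
    build_expr(depth - 1, rem % sub, out);        /* right operand */
    sb_put(out, ")");
}

/* Write the idx-th function into buf; returns the number of chars written. */
static size_t emit_function(int depth, unsigned long long idx,
                            char *buf, size_t cap) {
    sbuf s = { buf, cap, 0 };
    char head[64];
    snprintf(head, sizeof head, "int ZZ_%08llx(int x,int y){return ", idx);
    sb_put(&s, head);
    build_expr(depth, idx, &s);
    sb_put(&s, ";}");
    return s.len;
}
```

Looping idx from 0 to count_exprs(depth) - 1 and calling emit_function for each value visits every function exactly once. With this toy grammar the count grows from 80 at depth 1 to 19205 at depth 2, which is why the structural restrictions matter.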
For one particular set of restrictions we ended up with about 7 million C
functions. We then compiled each of these functions with 7 compilers:
llvm-gcc, clang, Intel cc, Sun cc, and various versions of gcc.
We then looked for functions where a particular pair of compilers
exhibited widely differing abilities to optimize. For example, consider
this function:
int ZZ_0000728f(int x,int y){return o1s(m8s(x,-2),(x?1:y));}
gcc-3.4 can see that it always returns 0, and emits code doing that. On
the other hand, llvm-gcc emits 228 bytes of object code (at -Os) to
compute the same zeroes.
The funny-named functions are little safe-math utilities that avoid
undefined behavior for all inputs. "o1s" is "mod 16-bit signed" and "m8s"
is "multiply 8-bit signed".
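To see why ZZ_0000728f folds to a constant, here is a small self-contained check. The m8s and o1s below are stand-ins written for this sketch, not the definitions from the safe math headers. In particular, returning the first operand when the operation would be undefined or out of range is an assumed fallback policy. The argument is simple: if x is nonzero the divisor is 1, so the remainder is 0; if x is zero the dividend is 0, so the result is 0 either way.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for "m8s" (multiply 8-bit signed). Assumed policy: return the
   first operand unchanged when the product does not fit in int8_t. */
static int8_t m8s(int8_t a, int8_t b) {
    int p = (int)a * (int)b;   /* computed in int, so this cannot overflow */
    return (p < INT8_MIN || p > INT8_MAX) ? a : (int8_t)p;
}

/* Stand-in for "o1s" (mod 16-bit signed). Assumed policy: return the
   first operand for a zero divisor or for INT16_MIN % -1. */
static int16_t o1s(int16_t a, int16_t b) {
    if (b == 0 || (a == INT16_MIN && b == -1)) return a;
    return (int16_t)(a % b);
}

/* The function from the post. Passing an int narrows it to int8_t or
   int16_t; out-of-range narrowing is implementation-defined, not undefined. */
static int ZZ_0000728f(int x, int y) { return o1s(m8s(x, -2), (x ? 1 : y)); }

/* Returns 1 if the function is 0 for every (x, y) in [lo, hi] x [lo, hi]. */
static int always_zero(int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (ZZ_0000728f(x, y) != 0) return 0;
    return 1;
}

/* Brute-force check over a range that covers values on both sides of the
   int8_t limits. */
static void self_check(void) {
    assert(always_zero(-300, 300));
}
```

A brute-force check like this only covers the range it is given. The case split on x is what covers all inputs.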
Why is this interesting? Because it provides a way to systematically find
areas of weakness in an optimizer, relative to a collection of other
optimizers.
If people would find it useful, I can put the full set of results on the
web when time permits. I call the resulting codes "maximally
embarrassing" since each function represents some significant failure to
optimize.
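One way to pick the most embarrassing function for a pair of compilers is sketched below. The record layout and the metric (the largest absolute gap in object-code bytes) are assumptions made for this example. A size ratio or some other measure would work just as well.

```c
#include <assert.h>
#include <stddef.h>

/* One row per generated function. Field names are hypothetical. */
typedef struct {
    const char *name;   /* e.g. "ZZ_0000728f" */
    size_t size_a;      /* bytes emitted by the compiler under scrutiny */
    size_t size_b;      /* bytes emitted by the reference compiler */
} size_row;

/* Index of the row where compiler A is worst relative to compiler B,
   measured as size_a - size_b. Returns n if A never emits more than B. */
static size_t most_embarrassing(const size_row *rows, size_t n) {
    assert(rows != NULL || n == 0);
    size_t best = n, best_gap = 0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].size_a > rows[i].size_b) {
            size_t gap = rows[i].size_a - rows[i].size_b;
            if (gap > best_gap) { best_gap = gap; best = i; }
        }
    }
    return best;
}
```

Running this once per ordered pair of compilers gives one worst-case function for each pair. Taking the largest gap across all pairs gives a global maximum.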
The global maximally embarrassing function is one where various versions of
gcc (including llvm-gcc) emit code returning constant 0 and clang emits
762 bytes of x86. The C code is this:
int ZZ_00005bbd(int x,int y){return m1s((x?0:x),a8s(y,y));}
The other embarrassing thing about these functions is that most compilers
miscompile some of the 7 million functions. llvm-gcc and clang are the
only ones we tested that actually get them all right.
To compile these functions this code needs to be prepended:
#include <limits.h>
#include <stdint.h>
#include "safe_abbrev.h"
#include "safe_math.h"
The safe math headers are here:
http://www.cs.utah.edu/~regehr/safe_math/
Anyway I just throw this out there. People on this list have told me
before that missed-optimization bugs are not considered very interesting.
The ideal result (from my point of view as a compiler consumer) would be
for a few people from one or more of these compilers' development
communities to take seriously the job of eliminating these embarrassments.
John Regehr