[LLVMdev] speed and code size issues

Sat Jul 18 01:00:04 PDT 2009

On Fri, Jul 17, 2009 at 10:21 PM, John Regehr<regehr at cs.utah.edu> wrote:
> We have some results that are somewhat entertaining and that relate to the
> size/speed discussion.
>
> The basic idea is exhaustive generation of C functions where "exhaustive"
> is qualified by some structural restrictions (depth of AST, node type,
> etc.).
>
> For one particular set of restrictions we ended up with about 7 million C
> functions.  We then compiled each of these functions with 7 compilers:
> llvm-gcc, clang, Intel cc, Sun cc, various versions of gcc.
>
> We then looked for functions where a particular pair of compilers
> exhibited widely differing abilities to optimize.  For example, consider
> this function:
>
>   int ZZ_0000728f(int x,int y){return o1s(m8s(x,-2),(x?1:y));}
>
> gcc-3.4 can see that it always returns 0, and emits code doing that.  On
> the other hand, llvm-gcc emits 228 bytes of object code (at -Os) to
> compute the same zeroes.
>
> The funny-named functions are little safe-math utilities that avoid
> undefined behavior for all inputs.  "o1s" is "mod 16-bit signed" and "m8s"
> is "multiply 8-bit signed".
>
> Why is this interesting?  Because it provides a way to systematically find
> areas of weakness in an optimizer, relative to a collection of other
> optimizers.
>
> If people would find it useful, I can put the full set of results on the
> web when time permits.  I call the resulting codes "maximally
> embarrassing" since each function represents some significant failure to
> optimize.
>
> The global maximally embarrasing function is one where various versions of
> gcc (including llvm-gcc) emit code returning constant 0 and clang emits
> 762 bytes of x86.  The C code is this:
>
>   int ZZ_00005bbd(int x,int y){return m1s((x?0:x),a8s(y,y));}
>
> The other embarrassing thing about these functions is that most compilers
> miscompile some of the 7 million functions.  llvm-gcc and clang are the
> only ones we tested that actually get them all right.
>
> To compile these functions this code needs to be prepended:
>
>   #include <limits.h>
>   #include <stdint.h>
>   #include "safe_abbrev.h"
>   #include "safe_math.h"
>
> The safe math headers are here:
>
>   http://www.cs.utah.edu/~regehr/safe_math/
>
> Anyway I just throw this out there.  People on this list have told me
> before that missed-optimization bugs are not considered very interesting.
> The ideal result (from my point of view as a compiler consumer) would be
> for a few people from one or more of these compilers' development
> communities to take seriously the job of eliminating these embarrassments.

Interesting and cool work.

I wonder if there isn't a way to somehow collate the results? As Eli
points out, slogging through a bunch of such reports may not be very
valuable. However, if there was a way to group results so that
multiple failures could be "guessed" to be the same bug, then I'd
imagine the top N results off that list would be interesting &
definitely worth fixing.

Since your generator already has a significant amount of information
about the structure of the tests -- and that information probably
correlates with the root missed optimization -- there is a probably a
reasonable way to implement the "guess" using machine learning. Got an
AI grad student with free time handy?

 - Daniel

> John Regehr
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>