[LLVMdev] Enabling the vectorizer for -Os

Nadav Rotem nrotem at apple.com
Thu Jun 6 09:08:10 PDT 2013


Hi Chandler, 


> FWIW, I don't yet agree.
> 
> Your tables show many programs growing in code size by over 20%. While there are associated performance improvements, it isn't clear that this is a good tradeoff. Historically, optimizations which grow code size as a direct means of gaining performance have *not* been an acceptable tradeoff at -Os.


I am glad that you mentioned it. Only three benchmarks grew by more than 1%, and only one grew by more than 20%: the TSVC workloads. TSVC is an HPC benchmark, and it is not relevant to the -Os/-O2 discussion. If you exclude TSVC, you will notice that the code growth due to vectorization is 0.01%.

> From Owen's email, a characterization I agree with:
> "My understanding is that -Os is intended to be optimized-without-sacrificing-code-size."
> 
> The way I would phrase the difference between -Os and -Oz is similar: with -Os we don't *grow* the code size significantly even if it gives significant performance gains, whereas with -Oz we *shrink* the code size even if it means significant performance loss.
> 
> Neither of these concepts for -Os would seem to argue for running the vectorizer given the numbers you posted.


0.01% code growth for everything except TSVC sounds pretty good to me.  I would be willing to accept 0.01% code growth to gain 2% on gzip and 9% on RC4. 
 

> >  Regarding -O2 vs -O3, maybe we should set a higher cost threshold for -O2 to increase the likelihood of improving performance?  We have very few regressions at -O3 as is, and with better cost models I believe that we can bring them close to zero, so I am not sure it would help that much.   Renato, I prefer not to estimate the encoding size of instructions. We know that vector instructions take more space to encode. Would knowing the exact number help us make a better decision? I don't think so. On modern processors, when running vectorizable loops, the code size of the vector instructions is almost never the bottleneck.
> 
> That has specifically not been my experience when dealing with significantly larger and more complex application benchmarks.
I benchmark the compiler constantly, and I am aware of only a small number of regressions at -O3 with the vectorizer. If your experience is different, please share your numbers.



> The tradeoffs you show in your numbers for -Os are actually exactly what I would expect for -O2: a willingness to grow code size (and compilation time) in order to get performance improvements. A quick eye-balling of the two tables seemed to show most of the size growth had associated performance growth. This, to me, is a good early indicator that the mode the vectorizer is running in for your -Os numbers is what we should look at enabling for -O2.
> 
> That said, I would like to see benchmarks from a more diverse set of applications than the nightly test suite. ;] I don't have a lot of faith in it being representative. I'm willing to contribute some that I care about (given enough time to collect the data), but I'd really like for other folks with larger codebases and applications to measure code size and performance artifacts as well.


I am looking forward to seeing your contributions to the nightly test suite.  I would also like to see other people benchmark their applications. 


Thanks,
Nadav
