[LLVMdev] Enabling the vectorizer for -Os
Nadav Rotem
nrotem at apple.com
Thu Jun 6 09:08:10 PDT 2013
Hi Chandler,
> FWIW, I don't yet agree.
>
> Your tables show many programs growing in code size by over 20%. While there are associated performance improvements, it isn't clear that this is a good tradeoff. Historically, optimizations that gain performance as a direct result of growing code size have *not* been an acceptable tradeoff at -Os.
>
I am glad that you mentioned it. There are only three benchmarks that gained over 1%, and only one that gained over 20%: the TSVC workloads. TSVC is an HPC benchmark, and it is irrelevant to the -Os/-O2 discussion. If you ignore TSVC, you will notice that the code growth due to vectorization is 0.01%.
>
> From Owen's email, a characterization I agree with:
> "My understanding is that -Os is intended to be optimized-without-sacrificing-code-size."
>
> The way I would phrase the difference between -Os and -Oz is similar: with -Os we don't *grow* the code size significantly even if it gives significant performance gains, whereas with -Oz we *shrink* the code size even if it means significant performance loss.
>
> Neither of these concepts for -Os would seem to argue for running the vectorizer given the numbers you posted.
>
0.01% code growth for everything except TSVC sounds pretty good to me. I would be willing to accept 0.01% code growth to gain 2% on gzip and 9% on RC4.
>
> > Regarding -O2 vs -O3, maybe we should set a higher cost threshold for -O2 to increase the likelihood of improving performance? We have very few regressions at -O3 as is, and with better cost models I believe that we can bring them close to zero, so I am not sure how much that would help. Renato, I prefer not to estimate the encoding size of instructions. We know that vector instructions take more space to encode. Will knowing the exact number help us make a better decision? I don't think so. On modern processors running vectorizable loops, the code size of the vector instructions is almost never the bottleneck.
>
> That has specifically not been my experience when dealing with significantly larger and more complex application benchmarks.
>
I am constantly benchmarking the compiler, and I am aware of only a small number of regressions at -O3 due to the vectorizer. If your experience is different, please share your numbers.
>
> The tradeoffs you show in your numbers for -Os are actually exactly what I would expect for -O2: a willingness to grow code size (and compilation time) in order to get performance improvements. A quick eye-balling of the two tables seemed to show that most of the size growth had associated performance growth. This, to me, is a good early indicator that the mode the vectorizer is running in for your -Os numbers is what we should look at enabling for -O2.
>
> That said, I would like to see benchmarks from a more diverse set of applications than the nightly test suite. ;] I don't have a lot of faith in it being representative. I'm willing to contribute some that I care about (given enough time to collect the data), but I'd really like for other folks with larger codebases and applications to measure code size and performance artifacts as well.
>
I am looking forward to seeing your contributions to the nightly test suite. I would also like to see other people benchmark their applications.
Thanks,
Nadav