[llvm-dev] llvm (the middle-end) is getting slower, December edition

Sun Dec 18 15:55:42 PST 2016

On Sat, Dec 17, 2016 at 7:48 PM, Davide Italiano <davide at freebsd.org> wrote:

>
>
> On Dec 17, 2016 7:41 PM, "Sean Silva" <chisophugis at gmail.com> wrote:
>
>
>
> On Sat, Dec 17, 2016 at 6:32 PM, Davide Italiano via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Sat, Dec 17, 2016 at 1:35 PM, Davide Italiano <davide at freebsd.org>
>> wrote:
>> [...]
>>
>> > I don't have an infrastructure to measure the runtime performance
>> > benefits/regression of clang, but I have for `game7`.
>> > I wasn't able to notice any fundamental speedup (at least, not
>> > something that justifies a 2x build-time).
>> >
>>
>> Just to provide numbers (using Sean's Mathematica template, thanks),
>> here's a plot of the CDF of the frames in a particular level/scene.
>> The curves pretty much match, and the picture in the middle shows a
>> relative difference of 0.4% between Jun and Dec (which could be very
>> possibly be in the noise).
>>
>
> With 5-10 runs per binary you should be able to reliably measure down to
> 0.5% difference on game7 (50 usec difference per frame). With just one run
> per binary like you have here I would not trust a 0.4% difference.
>
>
> Yeah, I agree. This was mostly a sanity check to understand if there was a
> significant improvement at runtime. I ran the same test 7 times in the last
> our but the difference is always in the noise, FWIW.
>

The reason to add more runs is that you can measure performance differences
smaller than the run to run variation for the same binary. Most of the
time, if you just plot 10 runs of each binary all on the same plot you can
easily eyeball it to see that one is consistently faster. If you want to
formalize this you could do something like:

1. pair up each run of one binary with another run from the other binary
2. see which (say) median is higher to get a boolean value of which is
faster in that pair. (you can be fancier and compute an integral over e.g.
the 20-80%'ile frame times; or you can investigate the tail by looking at
the 99'th percentile)
3. treat that as a coin toss (i.e., as a bernoulli random variable with
parameter p; a "fair coin" would be p=0.5)
4. Use bayesian methods to propagate the "coin toss" observations backward
to infer a distribution on the possible values of the parameter p of the
bernoulli random variable.

Step 4 is actually fairly simple; an example of how to do it is here:
http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_PyMC3.ipynb#Example:-Mandatory-coin-flip-example
Notice (in the picture on that page) how the prior assumption that p might
be anywhere in [0,1] with uniform probability is refined as we incorporate
the result of each coin flip (that's really all the "bayesian" means;
Bayes' theorem tells you how to incorporate the new evidence to update your
estimated distribution (often called the "posterior" distribution)). As we
incorporate more coin tosses, we refine that initial distribution. For a
fair coin, after enough tosses, the posterior distribution becomes very
concentrated around p=0.5

For the "coin flip", it's just a matter of plugging into a closed form
formula to get the distribution, but for more sophisticated models there is
no closed form formula.
This is where MCMC libraries (like it shows how to use in that
"Probabilistic Programming and Bayesian Methods for Hackers" book) come
into play.
E.g. note that in step 2 you are losing a lot of information actually (you
reduce two frame time CDF's (thousands of data points each) to a single bit
of information). Using an MCMC library you can have fine-grained control
over the amount of detail in the model and it can propagate your
observations back to the parameters of your model in a continuous and
information-preserving way (to the extent that you design your model to
preserve information; there are tradeoffs between model accuracy and
computation time obviously).

(step 1 also loses information, btw. You also lose information when you
start looking at the frame time distribution because you lose information
about the ordering of the frames)

-- Sean Silva

>
> -- Sean Silva
>
>
>> On the same scene, the difference between -O3 and -flto is 12%, FWIW.
>> So it seems that at least for this particular case all the compile
>> time didn't buy any runtime improvement.
>>
>> --
>> Davide
>>
>> "There are no solved problems; there are only problems that are more
>> or less solved" -- Henri Poincare
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161218/d7940132/attachment.html>