[PATCH] D24593: Standford/Bubble sort code restructure

Thu Oct 6 08:22:11 PDT 2016

zansari added a comment.

I'll add my 2 cents here, since I did spend some time looking into the underlying x86 architectural reasons for this regression.

1. I personally don't agree with these changes made to the benchmark. As is, the benchmark is just fine and all the changes do is to move things around to generate slightly different IL that the compiler could itself do or undo in the future. I see no reason for the changes outside of temporary masking of a performance swing.

2. I'll leave it to Evengy to describe the overall/general benefits of the original loop unrolling patch.

3. Having said all that, the x86 regressions observed in this benchmark from the patch are 100% random bad luck due to unpredictable behavior of the branch predictor (on the arch the regression was observed). I've verified this through pipe-traces and have made reproducers that demonstrate the regression even w/o Evgeny's loop unrolling transformation. For this particular benchmark, the loop unrolling changes are not doing anything "bad", and there's nothing the compiler can do to directly affect (i.e. first order) the source of the regression.

In my opinion, I think the right course of action, at this point is:

1. Evgeny to explain why the loop unrolling changes are generally a good thing to do (on all architectures).

2. If everyone is OK with #1, then the x86 regression seen in bubble-sort should be simply called "bad luck" due to  x86 arch reasons that the compiler can't see/control. We'll just expect this to randomly bounce back up/down in the future, and move on.

If anyone has any strong objections with #2, I'd be happy to discuss it in more detail.

Thanks,
Zia.

https://reviews.llvm.org/D24593