[PATCH] D19558: Codegen: [X86] Set preferred loop alignment to 32 bytes.

Wed Apr 27 11:40:17 PDT 2016

joker.eph added a comment.

I'd like to see more data showing the real benefit of this, isolated from other factors (read to the end to see what I mean), because I believe this 10% is a side effect and not a real consequence of the realignment.

First, KS is absolutely not a reliable test. I spent a long week on this test alone for the same issue, trying to align loops in this benchmark. In my case I was doing some performance tuning and noticed a 10% regression after one of my changes. It turned out that my A/B test was expanding the __FILE__ macro to strings of different sizes, and that led to a performance swing on this test.

Here are my notes from last October:

  I measured the time when replacing the __FILE__ macro with a string that grows one byte at a time (string length range in bytes: time):

  0->4: ~750ms
  5->19: ~850ms
  20->35: 730ms
  36->51: 850ms
  52->67: 730ms
  68->83: 850ms
  84->99: 750ms
  100->115: 870ms
  116->131: 750ms
  …

  The pattern continues (I checked up to 1024): 16 bytes fast, then 16 bytes slow.
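
To make the periodicity in these notes explicit: from 20 bytes onward the timings alternate in 16-byte windows, so the whole pattern repeats every 32 bytes. Below is a minimal sketch of that observation, assuming the pattern is exactly periodic from a string length of 20 bytes upward (the two rows below 20 bytes do not fit a clean 16-byte window, so they are excluded). The names `FIRST_FAST`, `WINDOW`, `PERIOD`, and `predicted_bucket` are hypothetical; the function only restates the measurements and says nothing about the cause.

```python
# Illustrative model of the measurements in the notes above, not a benchmark.
# All names here are hypothetical and chosen only for this sketch.
WINDOW = 16          # width of one fast or one slow window, in bytes
PERIOD = 2 * WINDOW  # one fast window followed by one slow window
FIRST_FAST = 20      # first string length of the first full fast window


def predicted_bucket(length: int) -> str:
    """Classify a string length as 'fast' or 'slow' per the observed pattern.

    Assumption: the pattern is exactly periodic for lengths >= FIRST_FAST.
    """
    if length < FIRST_FAST:
        raise ValueError("the notes are irregular below 20 bytes")
    return "fast" if (length - FIRST_FAST) % PERIOD < WINDOW else "slow"


if __name__ == "__main__":
    # Print the window boundaries listed in the notes.
    for n in (20, 35, 36, 51, 52, 67, 68, 83, 84, 99, 100, 115, 116, 131):
        print(n, predicted_bucket(n))
```

Every range boundary listed in the notes from 20 bytes upward falls in the bucket this model predicts, which is consistent with the timing depending on where the code lands within a 32-byte window rather than on the string length itself.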

My first thought was to increase the loop alignment, but it didn't give good results consistently. I don't remember the details, but simply aligning the header of the hot loop independently of the rest of the code didn't help as much (it can help as a side effect on the alignment of the rest of the code).

At some point, Bob somehow heard that I was working on this test (KS) and pointed me to: https://llvm.org/bugs/show_bug.cgi?id=5615 ; I invite you to read Zia's answers there.
You should also check the slides he attached to the bug, which explain the issue very clearly.
Here is the relevant LLVM-dev post about this: http://lists.llvm.org/pipermail/llvm-dev/2015-June/086640.html

Repository:
  rL LLVM

http://reviews.llvm.org/D19558