[PATCH] D8943: Calculate vectorization factor using the narrowest type instead of widest type

Tue Oct 13 14:48:08 PDT 2015

congh added a comment.

In http://reviews.llvm.org/D8943#260101, @spatel wrote:

> F909822: vecfactor - perf.csv <http://reviews.llvm.org/F909822>
>
> I applied this patch on top of r248957 and ran the benchmarking subset of test-suite on an AMD Jaguar 1.5 GHz + Ubuntu 14.04 test system. The baseline is -O3 -march=btver2 while the comparison run added -mllvm -vectorizer-maximize-bandwidth (data attached).
>
> I see very little performance difference on any test: almost everything is +/- 2% which is within the noise for most tests.
>
> Cong, I would be interested to know if you saw any large diffs on these tests on your test system or if the bigger wins/losses all occurred on the non-benchmarking tests in test-suite?

Thank you for the performance test! I think there may be two reasons why we do not see a big performance difference in the LLVM test suite:

1. No hotspot includes a loop that mixes types of different sizes (this is the case this patch optimizes).
2. There are some problems with the cost model in LLVM. Even if we can choose a larger VF, the cost model reports a higher cost for the larger VF. I will deal with this issue later.

I don't have a test in my codebase that benefits from this patch, but it is quite easy to synthesize one:

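const int N = 1024 * 32;
int a[N];   // widest type in the loop
char b[N];  // narrowest type in the loop

int main() {
  // The outer loop repeats the inner loop N times.
  for (int j = 0; j < N; ++j) {
    // The inner loop mixes int and char accesses; it is the vectorization candidate.
    for (int i = 0; i < N; ++i) {
      a[i]++;
      b[i]++;
    }
  }
  return 0;
}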

For the code shown above, the running time drops from ~0.35s without this patch to ~0.228s with it.
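
To make the difference between the two VF choices concrete for the loop above, here is a minimal sketch of the arithmetic. The helper names vfForWidestType and vfForNarrowestType are hypothetical and are not LLVM APIs, and the register width is an assumed input rather than something queried from a target. The real vectorizer also consults the cost model, which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helpers for illustration only; these are not LLVM functions.
// All widths are in bits. elemBits lists the element widths used in the loop.

// VF when the widest element type must fill exactly one vector register.
unsigned vfForWidestType(unsigned regBits, const std::vector<unsigned> &elemBits) {
  assert(!elemBits.empty() && "need at least one element type");
  unsigned widest = *std::max_element(elemBits.begin(), elemBits.end());
  return regBits / widest;
}

// VF when the narrowest element type fills one vector register.
// Wider types then span several registers per vector iteration.
unsigned vfForNarrowestType(unsigned regBits, const std::vector<unsigned> &elemBits) {
  assert(!elemBits.empty() && "need at least one element type");
  unsigned narrowest = *std::min_element(elemBits.begin(), elemBits.end());
  return regBits / narrowest;
}
```

Assuming 128-bit vector registers, a 32-bit int and an 8-bit char, vfForWidestType(128, {32, 8}) gives 4 and vfForNarrowestType(128, {32, 8}) gives 16. With VF = 16 the char increments fill one register per vector iteration, while the int increments are spread over four registers.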

http://reviews.llvm.org/D8943