[PATCH] D102748: [LoopUnroll] Don't unroll before vectorisation

Wed May 19 04:22:54 PDT 2021

fhahn added a comment.

In D102748#2768113 <https://reviews.llvm.org/D102748#2768113>, @SjoerdMeijer wrote:

> All with the same result. So in a way this is an advertisement for skipping the full unroller early. But like I said, I understand the point, and it was not my intention to skip full unrolling, I just wanted it after the loop vectoriser.
> Also, if this was a terrible idea, I would have expected it to be flagged up by SPEC, as it contains some different codes; but fair enough, I have run only SPEC and the embedded benchmarks.

Fair enough, this is one of the simple cases where the backend picks up the slack from the middle-end (as @nikic mentioned), but I think we should focus on the IR we hand off to the backend, because the backend won't be able to optimize slightly more complex variations.

With a few small tweaks to the example, the backend is no longer able to pick up the slack (at least on AArch64):

  #include <string.h>

  void use(char *);

  void foo(int x) {
    char Ptr[16];
    memset(Ptr, 0, 16);
    if (x == 20)
      Ptr[5] = 10;
    for (unsigned i = 0; i < 16; i++ )
      Ptr[i] = i+1;
    use(&Ptr[0]);
  }
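
To make the DSE point concrete, here is a hand-written sketch (not compiler output) of what early full unrolling exposes. Once the loop becomes 16 straight-line stores to known offsets, every byte of Ptr is overwritten before it is read, so both the memset and the conditional store to Ptr[5] are dead. The name foo_unrolled and the recording stub for use() are assumptions made only so the sketch is self-contained.

```c
#include <string.h>

/* Hypothetical stand-in for the external use(): records the buffer it saw. */
char seen[16];
void use(char *p) { memcpy(seen, p, 16); }

/* Original form from the example above. */
void foo(int x) {
  char Ptr[16];
  memset(Ptr, 0, 16);
  if (x == 20)
    Ptr[5] = 10;
  for (unsigned i = 0; i < 16; i++)
    Ptr[i] = i + 1;
  use(&Ptr[0]);
}

/* Hand-unrolled equivalent: the 16 stores cover all of Ptr, so the memset
   and the `if (x == 20)` store have been dropped as dead. */
void foo_unrolled(int x) {
  char Ptr[16];
  (void)x; /* x no longer influences anything observable */
  Ptr[0] = 1;   Ptr[1] = 2;   Ptr[2] = 3;   Ptr[3] = 4;
  Ptr[4] = 5;   Ptr[5] = 6;   Ptr[6] = 7;   Ptr[7] = 8;
  Ptr[8] = 9;   Ptr[9] = 10;  Ptr[10] = 11; Ptr[11] = 12;
  Ptr[12] = 13; Ptr[13] = 14; Ptr[14] = 15; Ptr[15] = 16;
  use(&Ptr[0]);
}
```

If the loop is instead kept rolled (or vectorised) at the point where DSE runs, the pass has to reason about a loop-carried store rather than 16 individual ones, which is where I'd expect the dead stores to survive.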

A different example that should also generate worse assembly:

  #include <string.h>

  void f(char*);
  void bar();

  void test(char *Ptr, int x) {
    char Foo[16];
    memset(Foo, 0, 16);
    for (unsigned i = 0; i < 16; i++ ) {
      Foo[i] = i+1;
      bar();
    }

    for (unsigned i = 0; i < 16; i++ ) {
      Ptr[i] = Foo[i] + 2;
    }
  }
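
As a rough illustration of what is at stake in this second example (again a hand-written sketch under my own assumptions, not compiler output): Foo never escapes, so bar() cannot observe it. With both loops fully unrolled, each load of Foo[i] could be forwarded from the matching store, after which Foo and its memset are entirely dead. The name test_simplified and the counting stub for bar() are hypothetical; the loops are kept rolled in the sketch only for brevity.

```c
#include <string.h>

/* Hypothetical stand-in for the external bar(): counts its calls. */
int bar_calls = 0;
void bar(void) { bar_calls++; }

/* Original form from the example above. */
void test(char *Ptr, int x) {
  char Foo[16];
  (void)x;
  memset(Foo, 0, 16);
  for (unsigned i = 0; i < 16; i++) {
    Foo[i] = i + 1;
    bar();
  }
  for (unsigned i = 0; i < 16; i++)
    Ptr[i] = Foo[i] + 2;
}

/* Hand-simplified equivalent: the 16 calls to bar() remain, but Foo is gone.
   Each Ptr[i] receives (i + 1) + 2 directly. */
void test_simplified(char *Ptr, int x) {
  (void)x;
  for (unsigned i = 0; i < 16; i++)
    bar();
  for (unsigned i = 0; i < 16; i++)
    Ptr[i] = (char)(i + 3);
}
```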

Those are just a few variations focused on DSE. I'd expect that similar issues exist for other passes, like GVN, InstCombine & co.

https://reviews.llvm.org/D102748