[PATCH] D155299: [AArch64][SVE2] Combine add+lsr to rshrnb for stores

Mon Jul 17 00:43:51 PDT 2023

david-arm added a comment.

This is a nice optimisation @MattDevereau, thanks! I found another case we could support: loops like this one, where the store doesn't immediately follow the shift:

  void foo(unsigned short *dest, unsigned short *src, long n) {
    for (long i = 0; i < n; i++)
      dest[i] += ((src[i] + 32) >> 6);
  }

In this case the IR sequence is add, lshr, trunc. The truncated value feeds the add into `dest[i]` rather than going straight to the store, so the truncate doesn't get absorbed into a truncating store. Maybe it's worth seeing whether you can reuse the code in `tryCombineStoredNarrowShift` for this case too?

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D155299