[PATCH] D94457: [AArch64] Add some missing fusion subtarget features

Tue Aug 16 02:34:37 PDT 2022

dmgreen added a comment.

I have thought for a while that we are missing a number of tuning features for AArch64 cores, but my (very brief) look into enabling more fusion on more cores ended up showing worse performance. The changes were not very large IIRC, but I worry that the more aggressive fusion was forcing instructions into worse positions. It may just have been unlucky with noise. We already get a lot of fusion naturally from the way the scheduler positions cmp/br and cmp/csel.

The optimization guides usually list the instructions that can be merged. If you have some evidence that any of them are producing better performance then that sounds like a useful patch. The performance numbers I got were gathered too quickly to draw any strong conclusions from.

As for the questions you actually asked: cbz and cbnz take a register under AArch64, so they don't have a separate cmp to fuse with. For adding to -mcpu=generic, the general principle is that it needs to generally improve performance without hurting any other cores, especially around the big.LITTLE CPUs found in Android phones (or it needs to enable other optimizations, like the linker relaxation from FeatureFuseAdrpAdd). GCC believes it is a benefit or benign, but they may have a slightly different algorithm for deciding when to fuse; I'm not sure. So long as we have some results that show it's improving things or flat for a selection of cores (big and little), then it should be OK.
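
To illustrate the cbz/cbnz point, here is a minimal C sketch. The function and type names are hypothetical, and the assembly mentioned in the comments is only what an AArch64 compiler would typically emit at -O2; it is not checked output and can differ between compilers and versions.

```c
/* Hypothetical callback type, used only so each branch has a visible side effect. */
typedef void (*hit_fn)(void *ctx);

/* Small helper for trying the functions out: counts how often it is called. */
static void count_hit(void *ctx) {
    ++*(int *)ctx;
}

/* Zero test: typically a single `cbz w0, <skip>`. The register test is part of
   the branch itself, so there is no separate cmp that could be fused. */
void call_if_nonzero(int x, hit_fn hit, void *ctx) {
    if (x != 0)
        hit(ctx);
}

/* Compare against a non-zero value: typically `cmp w0, #42` followed by
   `b.ne <skip>`, i.e. the kind of adjacent pair a cmp/branch fusion feature
   is concerned with. */
void call_if_equals_42(int x, hit_fn hit, void *ctx) {
    if (x == 42)
        hit(ctx);
}
```

Comparing the two with something like `clang --target=aarch64-linux-gnu -O2 -S` should show the difference, assuming the compiler does not pick another lowering for such small functions.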

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94457/new/

https://reviews.llvm.org/D94457