[PATCH] D152828: [MachineSink][AArch64] Sink instruction copies when they can replace copy into hard register or folded into addressing mode

Mon Sep 4 00:46:17 PDT 2023

dmgreen added a comment.

I've been wondering how to stage this, whilst trying to move towards https://reviews.llvm.org/D155470#4527270. I think this implements something closer to "Ext23Fast" from the 4 options there, but checks for LSLFast.

It might be best to go with this, and then adjust the Target features and clean up the uses in another patch. Otherwise we are trying to do too many things at once, and there is already quite a bit of code here. I have some questions inline about the folding of adds, but otherwise from what I can tell this looks good.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:2926
+      return false;
+    // Shift 1 (scale 2) in address is one extra cycle and one extra unit on
+    // some CPUs.
----------------
I believe it is shifts of 1 or 4 that would be more expensive for OoO cores; the other shifts are cheap. AddrLSLFast means any addressing mode with an LSL of shift <= 3 is cheap. ALULSLFast means adds/subs with LSL <= 4 are fast.

I think the logic should be similar to that in DAGCombine (ignoring register pressure for a moment). If we are optimizing for size or there are no other uses, the fold should be beneficial. Otherwise we treat it as cheap if we have AddrLSLFast and the shift is <= 3. An ADDXrs could take 2 cycles anyway, so perhaps we could be more aggressive there?
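
As a rough sketch of the decision described above (not the actual patch or the DAGCombine code): the function name and its parameters are hypothetical stand-ins for the subtarget feature check, the optsize query and the use count, and register pressure is ignored as noted.

```cpp
// Hypothetical helper, not part of LLVM. Each parameter stands in for a
// query the real code would make on the subtarget, function, or register.
//   HasAddrLSLFast - models the AddrLSLFast subtarget feature
//   OptForSize     - the function is being optimized for size
//   HasOtherUses   - the shifted value is used outside this address
//   ShiftAmount    - LSL amount that would be folded into the address
bool shouldFoldShiftIntoAddr(bool HasAddrLSLFast, bool OptForSize,
                             bool HasOtherUses, unsigned ShiftAmount) {
  // With no other uses the separate shift goes away entirely, and under
  // optsize the fold is taken regardless of latency.
  if (OptForSize || !HasOtherUses)
    return true;
  // Otherwise the shift stays live for its other users, so only fold
  // when the shifted addressing mode is known to be cheap.
  return HasAddrLSLFast && ShiftAmount <= 3;
}
```

For example, a shift of 4 with other uses and no optsize would not be folded here even with AddrLSLFast, whereas the same shift with a single use would be.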

Does this take into account the number of uses, and should it? Should it fold more under Optsize?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152828/new/

https://reviews.llvm.org/D152828