[PATCH] D152828: [MachineSink][AArch64] Sink instruction copies when they can replace copy into hard register or folded into addressing mode

Thu Sep 14 01:06:20 PDT 2023

dmgreen added a comment.

I have been wondering what to do with this, whether to nitpick the costmodel in ways that might not be useful, but it is probably better to get it in and work out any adjustments to the target features as needed. I think this is closer to what we are aiming for in terms of heuristics.

Is it worth setting EnableSinkAndFold to false, we can commit this then have another patch (doesn't need review) to enable it? Just in case there are problems it can help ease the commit-revert cycles.

Otherwise I was looking through the sinking code and had an extra question.

================
Comment at: llvm/lib/CodeGen/MachineSink.cpp:467-468
+        // don't consider that live range extended.
+        const TargetRegisterClass *RCS =
+            MRI->getRegClass(UseInst.getOperand(1).getReg());
+        if (RCA && RCA->hasSuperClassEq(RCS))
----------------
Is this assuming that the UseInst is a copy or that canFoldIntoAddrMode instructions always have an Operand(1) which is a reg? Is UseInst.getOperand(1) always MO in for AArch64?

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:2863
+    if (Shift == 1 && !Subtarget.hasLSLFast())
+      return false;
+    return canFoldAddRegIntoAddrMode(1 << Shift);
----------------
chill wrote:
> dmgreen wrote:
> > I think it's worth clarifying how these should work with LSLFast and addressing operands. I had an explanation of how I thought LSLFast should work in https://reviews.llvm.org/D155470#4527270. Let me know if you think doesn't sound right. There is a patch to split LSLFast into multiple parts as a first step in https://reviews.llvm.org/D157982.
> I guess we can be more precise by distinguishing between cores where shifts 0, 1, 2, and 3 are fast
> and cores where just shifts 0, 2, and 3 are fast. Not sure if it's worth adding an extra feature. Lacking such a feature, for now the code considers shift 1 to be slow by default, but it could just as well consider shift 1 to be fast by default.
Arm OoO cores can treat any extend as cheap, so long as the shift is 2 or 3. For in order cores it appears to be any shift. My understanding of LSLFast was the the UXTW/SXTW would still not be considered cheap. What you have is probably good though, and matches my understanding of what many cores implement.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152828/new/

https://reviews.llvm.org/D152828