[llvm] [MachineLICM] Workaround - apply RegMasks conservatively (PR #95926)

Wed Jun 19 00:31:11 PDT 2024

================
@@ -0,0 +1,49 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64-unknown-linux-gnu -run-pass=greedy,machinelicm -verify-machineinstrs -debug -o - %s | FileCheck %s
+
+# FIXME: Running RA is needed otherwise it runs pre-RA LICM.
+---
+name: test
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: test
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $x0, $w1, $x2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   B %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT:   liveins: $x0, $w1, $x2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   renamable $q11 = MOVIv4i32 2, 8
+  ; CHECK-NEXT:   BL &memset, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $w1, implicit $x2, implicit-def $sp, implicit-def $x0
+  ; CHECK-NEXT:   renamable $q10 = MVNIv4i32 4, 0
----------------
Pierre-vh wrote:

We could perhaps do a targeted fix for AArch64 too, and create an artificial "high" register for Q registers to model this.
@davemgreen What do you think about that?

To recap, we have 3 options:
- Revert the MachineLICM change
  - Since the change is too beneficial for AMDGPU to drop, I would instead implement both approaches side by side and use TRI to switch between them. All targets except AArch64 would use the new approach.
- Fix regunits calculations
  - A perfect fix in theory. In practice I feel this could have a big impact on regalloc as a whole, so I'm pessimistic that it would be a small change that lands quickly.
- Add fake high registers for AArch64 Q registers
  - I think we already do that for some registers on AMDGPU. On one hand, it can be seen as the right thing to do to model the high bits properly; OTOH, it's a hack around shortcomings of LLVM's current register modeling.

Option 1 vs. Option 3 is basically a decision on whether we want the workaround to live in the pass (Option 1) or in the backend (Option 3). I think Option 2 needs to be done anyway, but I expect it to be a longer task.

https://github.com/llvm/llvm-project/pull/95926