[PATCH] D154205: [MachineLICM] Handle subloops

Mon Jul 10 08:36:56 PDT 2023

jaykang10 updated this revision to Diff 538657.
jaykang10 added subscribers: sunfish, arsenm.
jaykang10 added a comment.
Herald added subscribers: wangpc, pmatos, asb, kerbowa, aheejin, jgravelle-google, sbc100, jvesely, dschuff.

Updated test files.

For the AMDGPU target, this patch hoists some MIR instructions, and the `SIOptimizeExecMaskingPreRA` pass then fails to remove them.
`CodeGen/AMDGPU/agpr-copy-no-free-registers.ll` has the following loop nest at the MIR level.

  Loop at depth 1 containing: %bb.1<header>,%bb.3,%bb.5,%bb.6,%bb.7,%bb.8,%bb.11,%bb.12,%bb.4,%bb.2,%bb.9<latch><exiting>
      Loop at depth 2 containing: %bb.5<header>,%bb.6,%bb.7,%bb.8,%bb.11<latch><exiting>

With this patch, the following MIR instructions are hoisted out of the inner loop, from bb.5 to bb.3.

  %155:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %13:sreg_64_xexec, implicit $exec
  %258:sreg_64_xexec = V_CMP_NE_U32_e64 %155:vgpr_32, %90:sreg_32, implicit $ex

After the hoist, `SIOptimizeExecMaskingPreRA` no longer optimizes these instructions as it did in the original code, so the output contains more instructions with this patch. I have not checked the pass in detail, but I guess it could be made to handle this case. The other AMDGPU regressions have the same issue.

  The relevant comment in SIOptimizeExecMaskingPreRA:
  // Optimize sequence
  //    %sel = V_CNDMASK_B32_e64 0, 1, %cc
  //    %cmp = V_CMP_NE_U32 1, %sel
  //    $vcc = S_AND_B64 $exec, %cmp
  //    S_CBRANCH_VCC[N]Z
  // =>
  //    $vcc = S_ANDN2_B64 $exec, %cc
  //    S_CBRANCH_VCC[N]Z
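As a rough illustration of why the fold quoted above is valid, here is a minimal per-lane model of the two sequences, assuming a 64-lane wave where bit i of each mask belongs to lane i. The function names (`vccBefore`, `vccAfter`, `sequencesAgree`) are hypothetical and are not part of LLVM; the model also ignores how the hardware treats inactive lanes in `V_CMP`, which does not matter here because the result is ANDed with `$exec` anyway.

```cpp
#include <cstdint>

// Simplified model: each uint64_t is a 64-lane mask, bit i = lane i.
// All names here are illustrative only, not LLVM APIs.

// Unoptimized sequence from the quoted comment.
uint64_t vccBefore(uint64_t exec, uint64_t cc) {
  uint64_t cmp = 0;
  for (int lane = 0; lane < 64; ++lane) {
    // %sel = V_CNDMASK_B32_e64 0, 1, %cc  (1 where cc is set, else 0)
    uint32_t sel = ((cc >> lane) & 1) ? 1u : 0u;
    // %cmp = V_CMP_NE_U32 1, %sel  (set where sel != 1)
    if (sel != 1u)
      cmp |= uint64_t{1} << lane;
  }
  // $vcc = S_AND_B64 $exec, %cmp
  return exec & cmp;
}

// Folded sequence: $vcc = S_ANDN2_B64 $exec, %cc
uint64_t vccAfter(uint64_t exec, uint64_t cc) {
  return exec & ~cc;
}

// True when both sequences produce the same $vcc for the given masks.
bool sequencesAgree(uint64_t exec, uint64_t cc) {
  return vccBefore(exec, cc) == vccAfter(exec, cc);
}
```

In this model the compare mask is simply the bitwise complement of `%cc`, which is why the select and compare can be dropped in favour of a single `S_ANDN2_B64`.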

@arsenm If this change causes any problems for the AMDGPU target, please let me know.

For the WebAssembly target, in `CodeGen/WebAssembly/reg-stackify.ll`, the following MIR instruction is hoisted to the inner loop's preheader, which looks fine.

  %3:fr64 = ADDSDrr %1:fr64(tied-def 0), %28:fr64, implicit $mxcsr

@sunfish If this change causes any problems for the WebAssembly target, please let me know.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154205/new/

https://reviews.llvm.org/D154205

Files:
  llvm/lib/CodeGen/MachineLICM.cpp
  llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll
  llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
  llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll
  llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
  llvm/test/CodeGen/Thumb2/mve-gather-scatter-optimisation.ll
  llvm/test/CodeGen/WebAssembly/reg-stackify.ll
