[all-commits] [llvm/llvm-project] e30271: RegAllocGreedy: Try local instruction splitting wi...

Mon Sep 12 06:24:05 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e30271169fa9c3e2b8b55e3d7cb73706a06eadd8
      https://github.com/llvm/llvm-project/commit/e30271169fa9c3e2b8b55e3d7cb73706a06eadd8
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2022-09-12 (Mon, 12 Sep 2022)

  Changed paths:
    M llvm/lib/CodeGen/RegAllocGreedy.cpp
    A llvm/test/CodeGen/AMDGPU/greedy-instruction-split-subrange.mir
    A llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
    M llvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir
    A llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
    M llvm/test/CodeGen/Thumb2/mve-vst3.ll
    M llvm/test/CodeGen/Thumb2/mve-vst4.ll

  Log Message:
  -----------
  RegAllocGreedy: Try local instruction splitting with subranges

Previously, local instruction splitting was only tried in order to
relax register class constraints, but it can also help if there are
subranges involved.

This solves a compilation failure for AMDGPU when there is high
pressure created by large register tuples. If one virtual register is
using most of the available budget, we need to be able to evict
subranges.

This solves the immediate failure, but this solution leaves a lot to
be desired. In the relevant testcases, we have 32-element tuples but
most of the uses are operations on 1-element subranges of them. What
we're now getting is a spill and restore of the full 1024 bits and an
extract of the used 32 bits. It would be far better if we introduced a
copy to a new virtual register with a smaller register class and used
narrower spills.
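
To make the size mismatch concrete, here is a rough back-of-the-envelope
sketch of the arithmetic above. It is not LLVM code: the helper names
(fullTupleBits, narrowBits, wastedBits) are hypothetical and exist only
for illustration, and the model assumes every tuple element has the same
width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these are LLVM APIs.

// Bits moved when the whole register tuple is spilled or reloaded.
constexpr std::uint32_t fullTupleBits(std::uint32_t numElements,
                                      std::uint32_t elementBits) {
  return numElements * elementBits;
}

// Bits that would be moved if only the used elements were copied to a
// narrower virtual register and that register were spilled instead.
constexpr std::uint32_t narrowBits(std::uint32_t usedElements,
                                   std::uint32_t elementBits) {
  return usedElements * elementBits;
}

// Bits moved per spill or reload that the instruction never reads.
inline std::uint32_t wastedBits(std::uint32_t numElements,
                                std::uint32_t usedElements,
                                std::uint32_t elementBits) {
  assert(usedElements <= numElements && "use cannot be wider than the tuple");
  return fullTupleBits(numElements, elementBits) -
         narrowBits(usedElements, elementBits);
}

// The numbers from the log message: a 32-element tuple of 32-bit
// registers, with a use that touches a single element.
static_assert(fullTupleBits(32, 32) == 1024, "full tuple is 1024 bits");
static_assert(narrowBits(1, 32) == 32, "one element is 32 bits");
```

Under these assumptions, each spill or reload of the full tuple moves
1024 bits where 32 would do, so wastedBits(32, 1, 32) is 992.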

Furthermore, we could probably do a better job if the allocator were
to introduce new subranges where none previously existed in the
highest pressure scenarios. The block and region splits should also
try to split specific subranges out.

The mve-vst3.ll test changes look like noise to me, but the instruction
count increased by one. mve-vst4.ll looks like a solid improvement
with several 16-byte spills eliminated. splitkit-copy-live-lanes.mir
also shows a solid reduction in total spill count.

This could use more tests but it's pretty tiring to come up with cases
that fail on this.