[PATCH] D87157: [GlobalISel] Add a localizer for copies from physregs and use it in AArch64

Jessica Paquette via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 4 12:51:59 PDT 2020


paquette created this revision.
paquette added reviewers: aemerson, arsenm, volkan, aditya_nandakumar, qcolombet, rtereshin, dsanders.
Herald added subscribers: danielkiss, hiraditya, kristof.beyls, rovka, mgorny.
Herald added a project: LLVM.
paquette requested review of this revision.
Herald added a subscriber: wdng.

In the LLVM test suite, very small functions like this are fairly common:

  declare i8* @bar(i32*, i32, i32**)
  define i8* @foo(i32 %x, i32** %y) {
    %z = tail call i8* @bar(i32* null, i32 %x, i32** %y)
    ret i8* %z
  }

In GlobalISel, during CallLowering, we tend to produce copies in the following order (for AArch64):

  %0:_(s32) = COPY $w0
  %1:_(p0) = COPY $x1
  %3:_(p0) = G_CONSTANT i64 0
  $x0 = COPY %3(p0)
  $w1 = COPY %0(s32)
  $x2 = COPY %1(p0)
  TCRETURNdi @bar, 0, csr_darwin_aarch64_aapcs, implicit $sp, implicit $x0, implicit $w1, implicit $x2

SelectionDAG, on the other hand, produces this:

  %1:gpr64 = COPY $x1
  %0:gpr32 = COPY $w0
  %2:gpr64all = COPY $xzr
  $x0 = COPY %2
  $w1 = COPY %0
  $x2 = COPY %1
  TCRETURNdi @bar, 0, csr_darwin_aarch64_aapcs, implicit $sp, implicit $x0, implicit $w1, implicit $x2

All that's different here is the order of the copies from physical registers. However, GlobalISel ultimately produces twice as many copies: https://godbolt.org/z/GKGsh5

This is because at some point during greedy regalloc, we (greedily) make the wrong choice of virtual register mapping, and miss the chance to emit identity copies.

It seems like the register allocator is making a reasonable choice with what it's been given. So, personally, I think it makes sense to canonicalize these copies into a form which is amenable to the greedy register allocator.

This patch adds a specialized copy localizer pass. This pass is separate from the main localizer because I think the circumstances where it should be used are more specialized. Only targets (and opt-levels) which use the greedy register allocator can benefit from this.

This pass will reorder copies in a contiguous block of copies from physregs such that the first copy has the *latest* user in the MachineBasicBlock, and the last copy has the *earliest* user in the MachineBasicBlock.

So for this MIR:

  %0:_(s32) = COPY $w0
  %1:_(p0) = COPY $x1
  %3:_(p0) = G_CONSTANT i64 0
  $x0 = COPY %3(p0)
  $w1 = COPY %0(s32)
  $x2 = COPY %1(p0)
  TCRETURNdi @bar, 0, csr_darwin_aarch64_aapcs, implicit $sp, implicit $x0, implicit $w1, implicit $x2

We'll reorder the copies like so:

  %1:_(p0) = COPY $x1
  %0:_(s32) = COPY $w0
  %3:_(p0) = G_CONSTANT i64 0
  $x0 = COPY %3(p0)
  $w1 = COPY %0(s32)
  $x2 = COPY %1(p0)
  TCRETURNdi @bar, 0, csr_darwin_aarch64_aapcs, implicit $sp, implicit $x0, implicit $w1, implicit $x2

This:

1. Minimizes the length of the live range for `%0:_(s32) = COPY $w0`.
2. Ensures that its live range is fully contained within the live range of `%1:_(p0) = COPY $x1` rather than partially overlapping with it.
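To make the reordering concrete, here's a rough C++ sketch of that step. This is not the code from the patch: reorderPhysRegCopies is a made-up name, and it assumes each copy's users all live in the same MachineBasicBlock.

  // Illustrative sketch only, not the implementation from D87157.
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"
  #include <algorithm>
  using namespace llvm;

  /// Reorder a contiguous run of copies-from-physregs (given in program
  /// order) so that the copy whose user appears latest in \p MBB comes
  /// first, and the copy whose user appears earliest comes last.
  static void reorderPhysRegCopies(MachineBasicBlock &MBB,
                                   SmallVectorImpl<MachineInstr *> &Copies,
                                   MachineRegisterInfo &MRI) {
    // Number every instruction in the block so user positions can be
    // compared.
    DenseMap<const MachineInstr *, unsigned> Order;
    unsigned Idx = 0;
    for (const MachineInstr &MI : MBB)
      Order[&MI] = Idx++;

    // Position of the latest user of the vreg a copy defines.
    auto LatestUse = [&](const MachineInstr *Copy) {
      unsigned Latest = 0;
      for (const MachineInstr &UseMI :
           MRI.use_nodbg_instructions(Copy->getOperand(0).getReg()))
        Latest = std::max(Latest, Order.lookup(&UseMI));
      return Latest;
    };

    // Remember where the run ends before anything moves.
    MachineBasicBlock::iterator InsertPt =
        std::next(Copies.back()->getIterator());

    // Latest user first, earliest user last.
    llvm::stable_sort(Copies, [&](MachineInstr *A, MachineInstr *B) {
      return LatestUse(A) > LatestUse(B);
    });

    // Re-splice the copies, in the new order, to just before the
    // instruction that originally followed the run.
    for (MachineInstr *Copy : Copies)
      MBB.splice(InsertPt, &MBB, Copy->getIterator());
  }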

This pass only:

1. Shuffles the position of copies from physregs within one contiguous range of copies from physregs.
2. Runs on MachineFunctions with a single MachineBasicBlock. (More MBBs -> more likely that there's enough stuff going on that this won't make a difference.)
3. Runs on very small MachineFunctions. (This problem only seems to show up in very small functions where there isn't much going on other than copies.)
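And a sketch of that gating, building on the reorderPhysRegCopies sketch above (again illustrative; runCopyLocalizer and the instruction-count threshold are made up, not taken from the patch):

  // Illustrative gating sketch, not the pass from D87157.
  static bool runCopyLocalizer(MachineFunction &MF) {
    // 1. Only single-block functions; with more blocks there's usually
    //    enough other code that the copy order stops mattering.
    if (MF.size() != 1)
      return false;
    MachineBasicBlock &MBB = MF.front();

    // 2. Only very small functions; the problem only shows up when the
    //    block is mostly copies. The threshold here is a made-up value.
    if (MBB.size() > 20)
      return false;

    // 3. Collect each contiguous run of copies from physregs to vregs
    //    and reorder it.
    MachineRegisterInfo &MRI = MF.getRegInfo();
    SmallVector<MachineInstr *, 8> Run;
    bool Changed = false;
    auto FlushRun = [&]() {
      if (Run.size() > 1) {
        reorderPhysRegCopies(MBB, Run, MRI);
        Changed = true;
      }
      Run.clear();
    };
    for (MachineInstr &MI : MBB) {
      if (MI.isCopy() && MI.getOperand(1).getReg().isPhysical() &&
          MI.getOperand(0).getReg().isVirtual())
        Run.push_back(&MI);
      else
        FlushRun(); // A non-copy instruction ends the current run.
    }
    FlushRun();
    return Changed;
  }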

In the example above, the pass swaps the positions of %0 and %1 because %1's user ($x2 = COPY %1) appears later in the block than %0's user ($w1 = COPY %0).

I think there are a lot of places where this could live (CallLowering, the IRTranslator, the existing Localizer, ...). I'm not particularly attached to any of these options, because I think the algorithm would come out roughly the same in each. Personally, I think this makes sense as a standalone pass: the circumstances in which it should run and the scope of what it does are distinct enough that it feels like its own thing.

(FWIW, if it's possible and makes sense to teach the register allocator to handle this situation, I'd prefer that solution. However, my gut feeling says that it's probably better to just canonicalize the copies using a pass.)

For now, only add this pass to AArch64.


https://reviews.llvm.org/D87157

Files:
  llvm/include/llvm/CodeGen/GlobalISel/CopyLocalizer.h
  llvm/include/llvm/InitializePasses.h
  llvm/lib/CodeGen/GlobalISel/CMakeLists.txt
  llvm/lib/CodeGen/GlobalISel/CopyLocalizer.cpp
  llvm/lib/CodeGen/GlobalISel/GlobalISel.cpp
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/test/CodeGen/AArch64/GlobalISel/copy-localizer.mir
  llvm/test/CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll
  llvm/test/CodeGen/AArch64/GlobalISel/integration-shuffle-vector.ll
