[PATCH] D30751: [MachineCopyForwarding] Add new pass to do register COPY forwarding at end of register allocation.

Mon Apr 10 13:38:07 PDT 2017

gberry updated this revision to Diff 94733.
gberry added a comment.

I've taken a new approach with this change: extending the existing
MachineCopyPropagation pass instead of making a new pass.  This makes
the patch quite a bit simpler at the expense of making
MachineCopyPropagation a little more complicated (by having two
modes).

There are two AMDGPU lit test cases that I'm not sure about (marked
with XXXGCB); I would appreciate it if someone more familiar with that
target could check that they are reasonable.

To answer Quentin's original questions/comments:

- I have at least one example of an OoO core that does benefit from this change (and specifically benefits even if no COPYs are removed, only forwarded).

- I investigated further why there are COPYs that can be forwarded/removed just after register allocation at all. The case that came up every time I looked deeper was COPYs inserted during RegAlloc Greedy (presumably as part of live range splitting?) that looked something like this (from aarch64 MultiSource/Benchmarks/MiBench/consumer-jpeg/jdphuff.c:decode_mcu_AC_refine):
    After Greedy Register Allocator:

    9008B   BB#62: derived from LLVM BB %if.end169
                Predecessors according to CFG: BB#45 BB#93
    9056B       %vreg236:sub_32<def,read-undef> = SUBWrr %vreg236:sub_32, %vreg46; GPR64common:%vreg236 GPR32common:%vreg46
    9104B       %vreg215<def> = ASRVXr %vreg43, %vreg236; GPR64:%vreg215,%vreg43 GPR64common:%vreg236
    9128B       %vreg426<def> = COPY %vreg425; GPR32common:%vreg426,%vreg425
    9136B       %vreg217<def> = SUBWri %vreg426, 1, 0; GPR32common:%vreg217,%vreg426
    9152B       %vreg218<def> = ANDWrr %vreg217, %vreg215:sub_32; GPR32:%vreg218 GPR32common:%vreg217 GPR64:%vreg215
    9168B       %vreg426<def> = ADDWrr %vreg218, %vreg426; GPR32common:%vreg426 GPR32:%vreg218
    9200B       CBZW %vreg426, <BB#63>; GPR32common:%vreg426
                Successors according to CFG: BB#63(0x30000000 / 0x80000000 = 37.50%) BB#94(0x50000000 / 0x80000000 = 62.50%)

    Here the added COPY had a short live range, and the registers it
    was assigned did not turn the COPY into a NOP
    (i.e. %vreg426 was assigned a different register than %vreg425).
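
To make the forwarding idea concrete, here is a minimal, self-contained
sketch of block-local COPY forwarding on a toy instruction model. It is
not the code in the patch and does not use the LLVM MachineInstr API:
the Inst struct, the forwardCopies function, and the string register
names are all hypothetical. It also ignores everything a real
implementation has to check (register class constraints, tied operands,
sub-register aliasing, reserved registers).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy instruction (hypothetical, not an LLVM type): an opcode, at most
// one defined register, and a list of used registers.
struct Inst {
  std::string opcode;
  std::string def;               // empty if the instruction defines nothing
  std::vector<std::string> uses;
};

// Forward COPY sources into later uses within a single basic block.
// Returns the number of use operands that were rewritten.
int forwardCopies(std::vector<Inst> &block) {
  // Maps the destination of a still-valid COPY to its source.
  std::unordered_map<std::string, std::string> copyOf;
  int forwarded = 0;

  for (Inst &mi : block) {
    // Uses are read before the def is written, so rewrite them first.
    for (std::string &use : mi.uses) {
      auto it = copyOf.find(use);
      if (it != copyOf.end()) {
        use = it->second;
        ++forwarded;
      }
    }

    if (mi.def.empty())
      continue;

    // A def invalidates every tracked COPY that reads or writes the register.
    for (auto it = copyOf.begin(); it != copyOf.end();) {
      if (it->first == mi.def || it->second == mi.def)
        it = copyOf.erase(it);
      else
        ++it;
    }

    // Start tracking a new COPY, unless it is already a no-op.
    if (mi.opcode == "COPY") {
      assert(mi.uses.size() == 1 && "toy COPY has exactly one source");
      if (mi.uses[0] != mi.def)
        copyOf[mi.def] = mi.uses[0];
    }
  }
  return forwarded;
}
```

Applied to the block above under a made-up register assignment (say
%vreg425 in w9 and %vreg426 in w8), the sketch would rewrite the w8
operands of the SUBWri and the ADDWrr to w9. The CBZW keeps reading w8
because the ADDWrr redefines it. The COPY itself stays in place in this
sketch; deleting it once it has no remaining readers would be a separate
step.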

https://reviews.llvm.org/D30751

Files:
  include/llvm/CodeGen/Passes.h
  include/llvm/InitializePasses.h
  lib/CodeGen/CodeGen.cpp
  lib/CodeGen/MachineCopyPropagation.cpp
  lib/CodeGen/TargetPassConfig.cpp
  test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
  test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
  test/CodeGen/AArch64/f16-instructions.ll
  test/CodeGen/AArch64/flags-multiuse.ll
  test/CodeGen/AArch64/merge-store-dependency.ll
  test/CodeGen/AArch64/neg-imm.ll
  test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size.ll
  test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll
  test/CodeGen/AMDGPU/multilevel-break.ll
  test/CodeGen/AMDGPU/private-access-no-objects.ll
  test/CodeGen/AMDGPU/ret.ll
  test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll
  test/CodeGen/ARM/atomic-op.ll
  test/CodeGen/ARM/swifterror.ll
  test/CodeGen/PowerPC/fma-mutate.ll
  test/CodeGen/PowerPC/inlineasm-i64-reg.ll
  test/CodeGen/PowerPC/tail-dup-layout.ll
  test/CodeGen/SPARC/32abi.ll
  test/CodeGen/SPARC/atomics.ll
  test/CodeGen/SPARC/inlineasm.ll
  test/CodeGen/Thumb/thumb-shrink-wrapping.ll
  test/CodeGen/X86/2006-03-01-InstrSchedBug.ll
  test/CodeGen/X86/arg-copy-elide.ll
  test/CodeGen/X86/avx512-bugfix-25270.ll
  test/CodeGen/X86/avx512-calling-conv.ll
  test/CodeGen/X86/buildvec-insertvec.ll
  test/CodeGen/X86/combine-fcopysign.ll
  test/CodeGen/X86/complex-fastmath.ll
  test/CodeGen/X86/divide-by-constant.ll
  test/CodeGen/X86/fmaxnum.ll
  test/CodeGen/X86/fminnum.ll
  test/CodeGen/X86/fp128-i128.ll
  test/CodeGen/X86/haddsub-2.ll
  test/CodeGen/X86/haddsub-undef.ll
  test/CodeGen/X86/inline-asm-fpstack.ll
  test/CodeGen/X86/ipra-local-linkage.ll
  test/CodeGen/X86/localescape.ll
  test/CodeGen/X86/mul-i1024.ll
  test/CodeGen/X86/mul-i512.ll
  test/CodeGen/X86/mul128.ll
  test/CodeGen/X86/pmul.ll
  test/CodeGen/X86/powi.ll
  test/CodeGen/X86/pr11334.ll
  test/CodeGen/X86/pr29112.ll
  test/CodeGen/X86/select.ll
  test/CodeGen/X86/shrink-wrap-chkstk.ll
  test/CodeGen/X86/sqrt-fastmath.ll
  test/CodeGen/X86/sse-scalar-fp-arith.ll
  test/CodeGen/X86/sse1.ll
  test/CodeGen/X86/sse3-avx-addsub-2.ll
  test/CodeGen/X86/statepoint-live-in.ll
  test/CodeGen/X86/statepoint-stack-usage.ll
  test/CodeGen/X86/vec_fp_to_int.ll
  test/CodeGen/X86/vec_int_to_fp.ll
  test/CodeGen/X86/vec_minmax_sint.ll
  test/CodeGen/X86/vec_shift4.ll
  test/CodeGen/X86/vector-blend.ll
  test/CodeGen/X86/vector-idiv-sdiv-128.ll
  test/CodeGen/X86/vector-idiv-udiv-128.ll
  test/CodeGen/X86/vector-rotate-128.ll
  test/CodeGen/X86/vector-sext.ll
  test/CodeGen/X86/vector-shift-ashr-128.ll
  test/CodeGen/X86/vector-shift-lshr-128.ll
  test/CodeGen/X86/vector-shift-shl-128.ll
  test/CodeGen/X86/vector-shuffle-combining.ll
  test/CodeGen/X86/vector-trunc-math.ll
  test/CodeGen/X86/vector-zext.ll
  test/CodeGen/X86/vselect-minmax.ll
  test/CodeGen/X86/widen_conv-3.ll
  test/CodeGen/X86/widen_conv-4.ll
  test/CodeGen/X86/x86-shrink-wrap-unwind.ll
  test/CodeGen/X86/x86-shrink-wrapping.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D30751.94733.patch
Type: text/x-patch
Size: 130845 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170410/457e34b0/attachment.bin>