[PATCH] D123394: [CodeGen] Late cleanup of redundant address/immediate definitions.

Mon May 16 12:17:26 PDT 2022

jonpa updated this revision to Diff 429800.
jonpa added a comment.

NFC updates.

- Compile time improved: The regression I saw was remedied with a better iteration strategy over the MBBs. I tried to find further improvements, like tracking kill flags in a map instead of doing the search backwards over instructions, but that was slower. I also tried to use DenseMap for the mapping of Register -> MachineInstr*. Even the best init() alloc with a value of 8 was not better than std::map. IndexedMap did not seem like a good choice given that the typical number of mapped registers are not that many. It looks now like the Wall time for this pass on SPEC/SystemZ is on average 0.6%, much like Machine Copy Propagation Pass #2. There are no big compile time blowups that I am aware of.

I can't find much more to improve, at least not at the moment. This pass is "better than nothing" but it is not impossible that there could be more elegant/powerful solutions like emitting frame address anchors in an intelligent way, cleaning up rematerialized immedate-loads somehow, and maybe other things. Personally, I think this looks pretty good to have as long as there is no better way...

However, if we were to actually go ahead and use this, I think there should be some kind of broader agreement on this. There is no strong SystemZ benefit from this - it was just something that seemed missing (looking at those depressing multiple identical load adress instructions). The most benefit is probably on in-order targets. I think there should at least be two or three targets that "vote for" this. It's good to know that the test changes on e.g. RISC-V look correct, and now the next step would be for those targets to evaluate if it worth adding to the pass pipeline.

@nemanjai My previous idea to try post-RA pseudos for immediates on SystemZ is actually not very useful as it is only 64-bit immediates that require two instructions (very rare). But it still looks like this could be worth trying on e.g. PowerPC per your example above. Maybe you could give that a try (like in FrameLowering emitting a pseduo instruction instead of two actual instructions which is expanded in ExpandPostRAPseudos)? If you think that looks good, it would be great to know that...

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123394/new/

https://reviews.llvm.org/D123394

Files:
  llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
  llvm/include/llvm/CodeGen/MachinePassRegistry.def
  llvm/include/llvm/CodeGen/Passes.h
  llvm/include/llvm/InitializePasses.h
  llvm/lib/CodeGen/CMakeLists.txt
  llvm/lib/CodeGen/CodeGen.cpp
  llvm/lib/CodeGen/RedundantImmLoadsCleanup.cpp
  llvm/lib/CodeGen/TargetPassConfig.cpp
  llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
  llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
  llvm/test/CodeGen/AArch64/O3-pipeline.ll
  llvm/test/CodeGen/AArch64/stack-guard-remat-bitcast.ll
  llvm/test/CodeGen/AArch64/sve-calling-convention-mixed.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll
  llvm/test/CodeGen/AMDGPU/cc-update.ll
  llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
  llvm/test/CodeGen/AMDGPU/flat-scratch.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
  llvm/test/CodeGen/AMDGPU/multilevel-break.ll
  llvm/test/CodeGen/AMDGPU/si-annotate-cf.ll
  llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll
  llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll
  llvm/test/CodeGen/ARM/O3-pipeline.ll
  llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
  llvm/test/CodeGen/ARM/fpclamptosat.ll
  llvm/test/CodeGen/ARM/ifcvt-branch-weight-bug.ll
  llvm/test/CodeGen/ARM/jump-table-islands.ll
  llvm/test/CodeGen/ARM/reg_sequence.ll
  llvm/test/CodeGen/BPF/objdump_cond_op_2.ll
  llvm/test/CodeGen/Mips/llvm-ir/lshr.ll
  llvm/test/CodeGen/Mips/llvm-ir/shl.ll
  llvm/test/CodeGen/PowerPC/O3-pipeline.ll
  llvm/test/CodeGen/PowerPC/cgp-select.ll
  llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
  llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll
  llvm/test/CodeGen/RISCV/O3-pipeline.ll
  llvm/test/CodeGen/RISCV/out-of-reach-emergency-slot.mir
  llvm/test/CodeGen/RISCV/rvv/bitreverse-sdnode.ll
  llvm/test/CodeGen/RISCV/rvv/bswap-sdnode.ll
  llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
  llvm/test/CodeGen/RISCV/rvv/ctpop-sdnode.ll
  llvm/test/CodeGen/RISCV/rvv/cttz-sdnode.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwadd.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwmul.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwsub.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwadd.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwaddu.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmul.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulu.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsub.ll
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsubu.ll
  llvm/test/CodeGen/RISCV/rvv/rv32-spill-vector.ll
  llvm/test/CodeGen/RISCV/rvv/rv32-spill-zvlsseg.ll
  llvm/test/CodeGen/RISCV/rvv/rv64-spill-vector.ll
  llvm/test/CodeGen/RISCV/rvv/rv64-spill-zvlsseg.ll
  llvm/test/CodeGen/RISCV/rvv/stepvector.ll
  llvm/test/CodeGen/RISCV/rvv/zvlsseg-spill.mir
  llvm/test/CodeGen/RISCV/stack-realignment.ll
  llvm/test/CodeGen/SystemZ/frame-28.mir
  llvm/test/CodeGen/Thumb2/mve-fpclamptosat_vec.ll
  llvm/test/CodeGen/Thumb2/mve-vst4.ll
  llvm/test/CodeGen/X86/2008-04-09-BranchFolding.ll
  llvm/test/CodeGen/X86/2008-04-16-ReMatBug.ll
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-spill-merge.ll
  llvm/test/CodeGen/X86/fast-isel-stackcheck.ll
  llvm/test/CodeGen/X86/fshl.ll
  llvm/test/CodeGen/X86/masked_load.ll
  llvm/test/CodeGen/X86/oddshuffles.ll
  llvm/test/CodeGen/X86/opt-pipeline.ll
  llvm/test/CodeGen/X86/popcnt.ll
  llvm/test/CodeGen/X86/sdiv_fix_sat.ll
  llvm/test/CodeGen/X86/shift-i128.ll
  llvm/test/CodeGen/X86/vec_shift5.ll
  llvm/test/CodeGen/XCore/scavenging.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D123394.429800.patch
Type: text/x-patch
Size: 109089 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220516/f4957c0c/attachment.bin>