[PATCH] D110339: SelectionDAGBuilder: Improve canonicalization by not swapping branch targets
Matthias Braun via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 23 08:53:40 PDT 2021
MatzeB created this revision.
MatzeB added reviewers: bogner, craig.topper, rnk, RKSimon, Jiangning, mcrosier, lebedev.ri, efriedma.
Herald added subscribers: foad, frasercrmck, wenlei, kerbowa, luismarques, apazos, sameer.abuasal, steven.zhang, pengfei, s.egerton, Jim, jocewei, PkmX, arphaman, the_o, brucehoult, MartinMosbeck, rogfer01, atanasyan, edward-jones, zzheng, jrtc27, niosHD, sabuasal, sunfish, simoncook, johnrusso, rbar, asb, fedor.sergeev, kbarton, hiraditya, kristof.beyls, jgravelle-google, arichardson, sbc100, nhaehnle, jvesely, nemanjai, sdardis, dylanmckay, jyknight, dschuff, arsenm, qcolombet, jholewinski.
MatzeB requested review of this revision.
Herald added subscribers: llvm-commits, MaskRay, aheejin.
Herald added a project: LLVM.
Introduces a TargetLowerig::optimizeFallthroughsEarly() switch to
stop SelectionDAG flipping jump targets to create fallthroughs at
early if desired.
Motivation
----------
In most targets comparison instructions produce results for multiple
relations, so a program like `if (x == C) { ... } else if (x > C) { ... }`
can use a single compare instructions for 2 conditional branches. This pattern
is widespread in one of our systems (a decref operation with support for
immortal objects).
Machine-CSE would not consistently trigger, because:
SelectionDAG would often optimizes for fallthroughs,
flip true/false and negate the condition:
x > C
--> !(x > C) with true/false branches flipped
--> X <= C
--> X < (C + 1) canonicalization
So we end up with a different constant inhibiting CSE!
However with MachineBlockPlacement optimzing control flow
and fallthroughs anyway there is no real benefit from
SelectionDAG optimizing early when it negatively affects
canonicalization.
Target Specifics
----------------
I did not apply the change to most targets:
- ARM: IfConverter pass currently relies on fallthroughs existing
- AArch64: Has a custom AArch64ConditionOptimizer pass to CSE comparison. Strangely the unit tests do not show consistent behavior with and without these changes. Glancing at the code I wonder if the algorithm only checks control flow among the "true" direction of a conditional branch and misses opportunities in the "false" direction.
- PowerPC+Mips: It seems in some tests `not` instructions are inserted when optimizing for fallthrough instead of flipping true/false.
- BPF+WebAssembly: The TargetInstrInfo::analyzeBranch implementation appears to be incomplete so MachineBlockPlacement cannot optimize for fallthroughs.
- Other targets: Test regressions that I did not understand.
The changes are enabled for: X86, RISCV, Lanai, XCore and by default
for new targets.
Test Changes
------------
There is a lot of test churn. Most is benign because of different
canonicalization (which in fact makes it more likely to see the same
constant in LLVM-IR and assembly now).
There are a number of tests containing unoptimized conditional branches
jumping on true/false, getting better or worse. But those are just
artifacts of bugpoint reduction or manual construction and
won't happen in practice after InstCombine:
est/CodeGen/X86/callbr-asm-blockplacement.ll
test/CodeGen/X86/pr29170.ll
test/CodeGen/X86/pr46585.ll
test/CodeGen/X86/setcc-freeze.ll
test/CodeGen/X86/2008-04-17-CoalescerBug.ll
test/CodeGen/X86/2007-01-13-StackPtrIndex.ll
test/CodeGen/X86/2007-03-01-SpillerCrash.ll
test/CodeGen/X86/2007-12-18-LoadCSEBug.ll
test/CodeGen/X86/legalize-shift-64.ll
test/CodeGen/X86/avx-cmp.ll
test/CodeGen/X86/setcc-freeze.ll
test/CodeGen/X86/shrink-compare-pgso.ll
These tests show improved codegen because duplicated CMPs get
eliminated:
test/CodeGen/X86/speculative-load-hardening-indirect.ll
test/CodeGen/X86/switch-density.ll
test/CodeGen/X86/cse-cmp.ll (new test)
These appear to improve because of different normalization:
test/CodeGen/X86/cmp-bool.ll
test/CodeGen/X86/fp128-select.ll
test/CodeGen/X86/xmulo.ll
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D110339
Files:
llvm/include/llvm/CodeGen/TargetLowering.h
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.h
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
llvm/lib/Target/ARM/ARMISelLowering.h
llvm/lib/Target/AVR/AVRISelLowering.h
llvm/lib/Target/BPF/BPFISelLowering.cpp
llvm/lib/Target/BPF/BPFISelLowering.h
llvm/lib/Target/Hexagon/HexagonISelLowering.h
llvm/lib/Target/Mips/MipsISelLowering.cpp
llvm/lib/Target/Mips/MipsISelLowering.h
llvm/lib/Target/NVPTX/NVPTXISelLowering.h
llvm/lib/Target/PowerPC/PPCISelLowering.h
llvm/lib/Target/Sparc/SparcISelLowering.cpp
llvm/lib/Target/Sparc/SparcISelLowering.h
llvm/lib/Target/SystemZ/SystemZISelLowering.h
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
llvm/test/CodeGen/Lanai/sub-cmp-peephole.ll
llvm/test/CodeGen/RISCV/codemodel-lowering.ll
llvm/test/CodeGen/RISCV/double-br-fcmp.ll
llvm/test/CodeGen/RISCV/double-previous-failure.ll
llvm/test/CodeGen/RISCV/float-br-fcmp.ll
llvm/test/CodeGen/RISCV/half-br-fcmp.ll
llvm/test/CodeGen/RISCV/jumptable.ll
llvm/test/CodeGen/RISCV/shrinkwrap.ll
llvm/test/CodeGen/X86/2006-08-21-ExtraMovInst.ll
llvm/test/CodeGen/X86/2006-11-17-IllegalMove.ll
llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll
llvm/test/CodeGen/X86/2007-03-01-SpillerCrash.ll
llvm/test/CodeGen/X86/2007-12-18-LoadCSEBug.ll
llvm/test/CodeGen/X86/2008-04-16-ReMatBug.ll
llvm/test/CodeGen/X86/2008-04-17-CoalescerBug.ll
llvm/test/CodeGen/X86/2008-05-21-CoalescerBug.ll
llvm/test/CodeGen/X86/2012-08-17-legalizer-crash.ll
llvm/test/CodeGen/X86/AMX/amx-ldtilecfg-insert.ll
llvm/test/CodeGen/X86/AMX/amx-lower-tile-copy.ll
llvm/test/CodeGen/X86/AMX/amx-spill-merge.ll
llvm/test/CodeGen/X86/atomic-unordered.ll
llvm/test/CodeGen/X86/atomic64.ll
llvm/test/CodeGen/X86/atomic6432.ll
llvm/test/CodeGen/X86/avoid-sfb.ll
llvm/test/CodeGen/X86/avx-cmp.ll
llvm/test/CodeGen/X86/avx-load-store.ll
llvm/test/CodeGen/X86/avx512-broadcast-unfold.ll
llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll
llvm/test/CodeGen/X86/cmp-bool.ll
llvm/test/CodeGen/X86/cmpxchg-clobber-flags.ll
llvm/test/CodeGen/X86/copy-eflags.ll
llvm/test/CodeGen/X86/cse-cmp.ll
llvm/test/CodeGen/X86/fp128-cast.ll
llvm/test/CodeGen/X86/fp128-select.ll
llvm/test/CodeGen/X86/jump_sign.ll
llvm/test/CodeGen/X86/lea.ll
llvm/test/CodeGen/X86/legalize-shift-64.ll
llvm/test/CodeGen/X86/loop-blocks.ll
llvm/test/CodeGen/X86/lvi-hardening-gadget-graph.ll
llvm/test/CodeGen/X86/machine-cse.ll
llvm/test/CodeGen/X86/masked_compressstore.ll
llvm/test/CodeGen/X86/masked_expandload.ll
llvm/test/CodeGen/X86/masked_load.ll
llvm/test/CodeGen/X86/masked_store.ll
llvm/test/CodeGen/X86/masked_store_trunc.ll
llvm/test/CodeGen/X86/masked_store_trunc_ssat.ll
llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
llvm/test/CodeGen/X86/musttail-varargs.ll
llvm/test/CodeGen/X86/nobt.ll
llvm/test/CodeGen/X86/optimize-max-0.ll
llvm/test/CodeGen/X86/or-branch.ll
llvm/test/CodeGen/X86/peephole-na-phys-copy-folding.ll
llvm/test/CodeGen/X86/pr29170.ll
llvm/test/CodeGen/X86/pr36602.ll
llvm/test/CodeGen/X86/pr38217.ll
llvm/test/CodeGen/X86/pr38795.ll
llvm/test/CodeGen/X86/pr46585.ll
llvm/test/CodeGen/X86/pr46827.ll
llvm/test/CodeGen/X86/pr50254.ll
llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll
llvm/test/CodeGen/X86/reverse_branches.ll
llvm/test/CodeGen/X86/select.ll
llvm/test/CodeGen/X86/setcc-freeze.ll
llvm/test/CodeGen/X86/shrink-compare-pgso.ll
llvm/test/CodeGen/X86/shrink-compare.ll
llvm/test/CodeGen/X86/sibcall.ll
llvm/test/CodeGen/X86/slow-incdec.ll
llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll
llvm/test/CodeGen/X86/speculative-load-hardening-indirect.ll
llvm/test/CodeGen/X86/statepoint-ra.ll
llvm/test/CodeGen/X86/statepoint-vreg-details.ll
llvm/test/CodeGen/X86/statepoint-vreg-invoke.ll
llvm/test/CodeGen/X86/switch-bt.ll
llvm/test/CodeGen/X86/switch-density.ll
llvm/test/CodeGen/X86/switch-lower-peel-top-case.ll
llvm/test/CodeGen/X86/switch.ll
llvm/test/CodeGen/X86/tail-dup-asm-goto.ll
llvm/test/CodeGen/X86/tail-dup-merge-loop-headers.ll
llvm/test/CodeGen/X86/tail-opts.ll
llvm/test/CodeGen/X86/vector-shift-by-select-loop.ll
llvm/test/CodeGen/X86/widen_cast-1.ll
llvm/test/CodeGen/X86/widen_cast-2.ll
llvm/test/CodeGen/X86/x32-va_start.ll
llvm/test/CodeGen/X86/x86-repmov-copy-eflags.ll
llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
llvm/test/CodeGen/X86/xmulo.ll
llvm/test/CodeGen/X86/xor-combine-debugloc.ll
llvm/test/CodeGen/X86/xor-icmp.ll
llvm/test/CodeGen/XCore/threads.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D110339.374571.patch
Type: text/x-patch
Size: 153833 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210923/cca9dbe2/attachment-0001.bin>
More information about the llvm-commits
mailing list