[llvm] [RISCV][GISEL] Introduce the RISCVPostLegalizerLowering pass (PR #108991)

Tue Sep 17 17:57:35 PDT 2024

tobias-stadler wrote:

Currently, AArch64PostLegalizerLowering is needed and not deprecated, but I have started to reevaluate its necessity. Overall, I think it would be best not to use Combiners for lowering. I wanted to start a discussion on this once I have more time, but I can dump some preliminary thoughts here.

As I see it, AArch64PostLegalizerLowering currently contains 2 classes of combines:
1. Lowering: e.g. transforming shuffle vectors to other generic opcodes
2. Canonicalization to offload pattern matching from the InstructionSelector: e.g. swapping operands so that they can be folded into addressing modes without always having to check whether either LHS or RHS matches.

In my opinion, class 1 combines can and should be handled by reselecting generic instructions in the InstructionSelector. They are one-shot transformations, so they don't need to, and therefore shouldn't, happen inside a combiner. Instead, these transformations can be done by e.g. creating generic instructions and recursively calling select() on them (which is only legal in all cases since #97670). I am thinking about adding a local worklist to InstructionSelect, so that instead of recursively calling select(), we can just mark instructions for reselection and InstructionSelect takes care of the rest. I have draft patches for this, but I am not sure this is the way to go, because it is only marginally more convenient than recursively calling select().
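
To make the worklist idea concrete, here is a toy sketch in plain C++. It does not use InstructionSelect or any real LLVM API: ToyInst, ToySelector, markForReselection() and the opcode strings are all made up for illustration, and the draft patches may look quite different.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not an LLVM type).
struct ToyInst {
  std::string Opcode;
};

// Toy selector: "G_SHUFFLE" is lowered to another generic opcode ("G_ZIP"),
// which is queued for reselection; every other "G_*" opcode is selected
// directly into a made-up "TARGET_*" opcode.
struct ToySelector {
  std::deque<ToyInst> Worklist;    // instructions waiting to be selected
  std::vector<std::string> Output; // selected target opcodes, in order

  // Mark an instruction for (re)selection instead of recursing into select().
  void markForReselection(ToyInst I) { Worklist.push_back(std::move(I)); }

  void select(const ToyInst &I) {
    // This toy model assumes every opcode starts with "G_".
    assert(I.Opcode.size() > 2 && I.Opcode.compare(0, 2, "G_") == 0);
    if (I.Opcode == "G_SHUFFLE") {
      // One-shot lowering: emit a new generic instruction and let the
      // driver loop pick it up later.
      markForReselection(ToyInst{"G_ZIP"});
      return;
    }
    Output.push_back("TARGET_" + I.Opcode.substr(2));
  }

  // Driver loop: drain the worklist until nothing is left to select.
  void run() {
    while (!Worklist.empty()) {
      ToyInst I = Worklist.front();
      Worklist.pop_front();
      select(I);
    }
  }
};
```

In this model the lowering code never calls select() itself; it only enqueues the new instruction, and run() takes care of the rest.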

The problem is that this doesn't work for class 2 combines, because they need to be fully applied to the entire MIR before ISel runs; otherwise, folding opportunities can be missed. They therefore need to happen inside a combiner. One solution would be to just move these into PostLegalizerCombiner and live with some O0 code-size regressions (I haven't measured the impact yet). Alternatively, once we have gMIR TableGen pattern support and have ported enough ad-hoc matching code to TableGen, we can let TableGen deal with commutative instructions automagically and remove those combines entirely.
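
As a rough illustration of why the class 2 canonicalization has to run over everything before matching, here is a toy model in plain C++. None of the names (ToyOperandKind, ToyAdd, canonicalizeAll, countFolds) exist in LLVM; the sketch only assumes a commutative instruction whose matcher inspects the RHS alone.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical operand kinds: only FoldableShift can be folded into the
// (made-up) addressing mode.
enum class ToyOperandKind { Plain, FoldableShift };

// Toy commutative instruction with two operands.
struct ToyAdd {
  ToyOperandKind LHS;
  ToyOperandKind RHS;
};

// Canonicalization: move a foldable operand to the RHS so that the matcher
// only ever has to look at one side. Returns the number of swaps performed.
inline int canonicalizeAll(std::vector<ToyAdd> &Insts) {
  int NumSwapped = 0;
  for (ToyAdd &I : Insts) {
    if (I.LHS == ToyOperandKind::FoldableShift &&
        I.RHS != ToyOperandKind::FoldableShift) {
      std::swap(I.LHS, I.RHS);
      ++NumSwapped;
    }
  }
  return NumSwapped;
}

// Selector-side matcher that deliberately checks the RHS only.
inline bool canFoldRHS(const ToyAdd &I) {
  return I.RHS == ToyOperandKind::FoldableShift;
}

// Count how many instructions the RHS-only matcher would fold.
inline int countFolds(const std::vector<ToyAdd> &Insts) {
  int NumFolds = 0;
  for (const ToyAdd &I : Insts)
    if (canFoldRHS(I))
      ++NumFolds;
  return NumFolds;
}
```

If countFolds() runs on input that has not been canonicalized, any instruction with the foldable operand on the LHS is silently missed, which is the missed-folding problem described above.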

With my recent improvements to the core Combiner algorithm, AArch64PostLegalizerLowering went from ~4.5% to ~2.4% of back-end time at O0 and from ~1.1% to ~0.6% at O2, so this pass isn't as eye-wateringly expensive as it used to be, but it is still pretty expensive. (Other O2 times for reference: AArch64PreLegalizerCombiner 4.1%, AArch64PostLegalizerCombiner 1.9%, InstructionSelect 3.4%.)

https://github.com/llvm/llvm-project/pull/108991