[PATCH] D141135: [RFC][GlibalISel] Replace the current GlobalISel matcher with a bottom-up matcher

Fri Jan 6 06:51:59 PST 2023

Kai created this revision.
Herald added subscribers: kosarev, foad, kerbowa, atanasyan, mgrang, jrtc27, hiraditya, kristof.beyls, jvesely, sdardis, arsenm.
Herald added a project: All.
Kai requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

The GICombiner can match an instruction pattern, but in the current implementation this feature is basically
not used. One reason is that the matching capability is somewhat limited. Let's look at some examples.

I can easily match a `G_ICMP`/`G_ZEXT`/`GSUB` sequence:

  (match (G_ICMP $icmp, $cc, $srcb, $srcc),
         (G_ZEXT $zext, $icmp),
         (G_SUB $dst, $srca, $zext))

Just change `G_SUB` to `G_ADD`:

  (match (G_ICMP $icmp, $cc, $srcb, $srcc),
         (G_ZEXT $zext, $icmp),
         (G_ADD $dst, $srca, $zext))

and it does not work as expected, because the current implementation does not handle commutable instructions like the C++ `mi_match()`. In order to get the desired outcome, you have to add another combine rule, just with the `$srca` and `$zext` operand swapped. Obviously, just using C++ code seems easier.

Even more annoying, turning to a simple tree pattern like

  (match (G_AND $srca, $t1, $t2),
         (G_AND $srcb, $t3, $t4),
         (G_OR $dst, $srca, $srcb)) 

just crashes TableGen.

The proposal is to replace the current matcher implementation with a bottom-up matcher.

The basic idea of a bottom-up matcher is to associate each instruction the set of matching patterns, called the match set. For an instruction without use operands (a leaf) the match set is easily determined. For any other instruction, the match set is retrieved via a table lookup, using the use operands as table indices. As a result, all matching patterns can be found in linear time.

Commutable instructions are handled by adding the variant patterns resulting from swapping the subpatterns. For the resulting code, the implication is that only the variable binding varies. The rule code is not duplicated.

This implementation is based in the algorithm from Chase. See https://dl.acm.org/doi/10.1145/41625.41640.
It is a drop-in replacement for the existing matcher, with no changes in the result.

Some numbers from Linux on Z (orig vs change applied):
Compiling LLVM is slightly longer with this change applied: 25m35.471s vs 28m51.797s
The resulting llc binary is slightly smaller: 145.899.472 bytes vs 145.849.704 bytes
I got similar results in Linux on Power.
I have not yet measured the impact of the resulting matcher on the compile time.

One interesting point to note:
The current combiner implementation loops over the basic blocks of a machine functions until a fixpoint is reached. The bottom up matcher matches all patterns, so in theory a single loop of the instructions should be enough. However,  there are 2 reasons why this is not implemented:

- The last loop over the instructions removes trivially dead instructions, which would not happen if only a single loop over the instructions is done. This has no functional impact. The next combiner, or latest the instruction selector removes trivially dead instructions. However, lot of test case would need to be changed because they do not expect those dead instructions. A simple solution would be to make a second loop over the instructions, just removing the trivially dead instructions, which would be faster than doing a full iteration.
- More important, not all patterns would be matched! The reason is that there are C++ matchers which perform a top-down match. One example is the rule `fold_global_offset` from the `AArch64PreLegalizerCombiner`. This rule matches a `G_GLOBAL_VALUE` only, but the C++ matcher looks at the uses of the defined value, which basically means that the matcher reverses the order in which matches are made. As a result, the bottom-up matcher is not aware that it has to apply another rule. I think this case can be fixed by changing the matching pattern. The question is if this is desirable, because it would add restrictions to what a C++ matcher is allowed to do.

There are some open todos/ideas to discuss:

- Measure runtime of the matcher
- The information required for tests is print in YAML format. However, the output is not yet the same on all platforms. The current solution is to re-assign the encoding in a deterministic way. This code is responsible for the larger part of the increased compile time, and should be removed.
- The old matcher generator implementation is still there, but not used.
- Adding the new parameter to `tryCombineAll()` should possibly be a separate PR. I think that introducing a new class `CombinerState`, which is just a thin wrapper around a `DenseMap<MachineInstr *, unsigned>`, would also make sense.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D141135

Files:
  llvm/include/llvm/CodeGen/GlobalISel/CombinerInfo.h
  llvm/include/llvm/CodeGen/GlobalISel/GISelWorkList.h
  llvm/lib/CodeGen/GlobalISel/Combiner.cpp
  llvm/lib/Target/AArch64/GISel/AArch64O0PreLegalizerCombiner.cpp
  llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
  llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
  llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
  llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
  llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
  llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
  llvm/lib/Target/Mips/MipsPostLegalizerCombiner.cpp
  llvm/lib/Target/Mips/MipsPreLegalizerCombiner.cpp
  llvm/test/TableGen/GICombinerEmitter/match-tree-automaton-1.td
  llvm/test/TableGen/GICombinerEmitter/match-tree.td
  llvm/test/TableGen/GICombinerEmitter/parse-match-pattern.td
  llvm/utils/TableGen/GICombinerEmitter.cpp
  llvm/utils/TableGen/GlobalISel/CMakeLists.txt
  llvm/utils/TableGen/GlobalISel/GIMatchTreeAutomaton.cpp
  llvm/utils/TableGen/GlobalISel/GIMatchTreeAutomaton.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D141135.486845.patch
Type: text/x-patch
Size: 113848 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230106/d2b84336/attachment-0001.bin>