[PATCH] D141135: [RFC][GlobalISel] Replace the current GlobalISel matcher with a bottom-up matcher

Thu Jan 12 13:25:51 PST 2023

dsanders added a comment.

I don't have time for a proper review at the moment as I'm still dealing with a lot of downstream problems but I have a few comments

> The basic idea of a bottom-up matcher is to associate each instruction the set of matching patterns, called the match set. For an instruction without use operands (a leaf) the match set is easily determined. For any other instruction, the match set is retrieved via a table lookup, using the use operands as table indices. As a result, all matching patterns can be found in linear time.

Aside from the 'bottom-up' part, that's pretty consistent with where I wanted to end up. I was pretty worried about the 'bottom-up' part as there's some top-down and middle-out combines that were making better decisions with their visiblity of all the uses after applying the rule but you mentioned that those are still available later on. I definitely don't want to lose that capability

One thing the original matcher was aiming to do was to to merge rules that were very similar but diverged near the root of the match. The intent was that given multiple rules with the same structure and predicates but different opcodes, it could still match the commonality and only diverge at the point it needs to. I guess one concrete example of that is ((x << y) >> y) -> x & M or sext(x, y) depending on whether the shift right is logical or arithmetic. The matchers used by DAGISel and GISel's instruction selector essentially treat these as two fully independent matchers as they test for the G_LSHR/G_ASHR opcodes first, but those rules could share common matching code right up to the last check. Even if we don't make use of it at the moment, it would be good if we can make this matcher change without making a future change along those lines impossible.

At one point I had a draft algorithm for handling multiple mutually exclusive choices based on their relative improvement on the code but the way things have gone over the last couple years I'm never going to find time to implement that. There are some things worth thinking about from it though. Most of those mutually exclusive choices arise from our frequent use of hasOneUse(). Upstream uses it a bit but downstream we use it a lot. This predicate is a proxy for what we really want to be checking which is "will this value become unused after the combine?". Essentially, it's a test that the combine is going to save on work rather than break even or increase it. For example, fadd + fmul -> fma doesn't save anything if the fmul result is needed for something else. It would be good if it were possible for "will be unused" tests to account for more than one applied combine. For example, maybe multiply-add and multiply-sub combine match the same mul so it has two uses both of which will go away after the combines. Ideally we'd apply both rather than blocking them because each rule sees it doesn't eliminate all the uses by itself.
Side note: An unresolved problem in this area is that DBG_VALUE counts as a use but hasOneNonDbgUse() can get a bit expensive.

> Running LLVM test suite with -fglobal-isel on an AArch64 EC2 instance (2 CPUs, 4GB):
> Current implementation: 22m43.224s
> Bottom-up implementation: 22m47.165s

I see Amara asked if for clarification on this and it's just compile time. Was runtime similar too?

> More important, not all patterns would be matched! The reason is that there are C++ matchers which perform a top-down match. One example is the rule fold_global_offset from the AArch64PreLegalizerCombiner. This rule matches a G_GLOBAL_VALUE only, but the C++ matcher looks at the uses of the defined value, which basically means that the matcher reverses the order in which matches are made. As a result, the bottom-up matcher is not aware that it has to apply another rule. I think this case can be fixed by changing the matching pattern. The question is if this is desirable, because it would add restrictions to what a C++ matcher is allowed to do.

I'm certain there's more cases like that. FWIW, we try not to do that in our target but there's occasions where doing it allows better decisions.

> The information required for tests is print in YAML format. However, the output is not yet the same on all platforms. The current solution is to re-assign the encoding in a deterministic way. This code is responsible for the larger part of the increased compile time, and should be removed.

It's not important but I'm a little sad to lose the graphviz output. I found being able to render the matcher in dot pretty useful for the early debugging

> The old matcher generator implementation is still there, but not used.

If the code is no longer reachable then we should remove it. That said, I would expect this change to have a fairly big effect on our tests and perf so we might be glad of a temporary escape route when it reaches our downstream repo to avoid blocking further intake of upstream commits until we've sorted out the consequences on our target. As a result, I'd say the old one should be kept for the moment but be deleted a couple weeks after this.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141135/new/

https://reviews.llvm.org/D141135