[PATCH] D29711: [globalisel] Sort RuleMatchers by priority.

Thu Feb 23 09:52:19 PST 2017

ab accepted this revision.
ab added a comment.

I suppose it's always easier to iterate in-tree, and, setting aside the tricky bits, this is useful; LGTM

================
Comment at: utils/TableGen/GlobalISelEmitter.cpp:224-230
+    // Wider types should have priority over narrower types within the same
+    // family but the order of the families doesn't matter. So s64 is more
+    // important than s32 and f64 is more important than f32 but s64 and f64 are
+    // unordered.
+    // FIXME: Don't do a lexical comparison of the type name. Check the size
+    //        instead. This only works for the cases we emit so far and is only
+    //        done this way because we can't delve into LLT's yet.
----------------
dsanders wrote:
> kristof.beyls wrote:
> > dsanders wrote:
> > > kristof.beyls wrote:
> > > > ab wrote:
> > > > > rovka wrote:
> > > > > > dsanders wrote:
> > > > > > > ab wrote:
> > > > > > > > dsanders wrote:
> > > > > > > > > ab wrote:
> > > > > > > > > > Huh, how can this be a problem?
> > > > > > > > > It works fine for s32 and s64 since "s32" < "s64" but the overall ordering is wrong when the number of digits vary:
> > > > > > > > >   "s128" < "s16" < "s32" < "s64" < "s8"
> > > > > > > > > Once I've dealt with the LLT layering issue we'll be able to do a numeric comparison for the size such that the order is:
> > > > > > > > >   s8 < s16 < s32 < s64 < s128
> > > > > > > > Sorry; I'm asking why does the ordering matter in the first place?
> > > > > > > Ah ok. Thinking about it again, I'm not sure it does. My original thinking was partly based on a comment from D26878 that talked about the order being correct by luck but we don't have the widenable concept at the moment. The other part of the thinking was based on the order I've seen the matcher tables be emitted but I think that order stems more from additional Predicates<> than from the type.
> > > > > > Ha, I had the same comment in mind!
> > > > > Ah yeah you're right, if we want to reinstate the isWidenable logic, we'd need this ordering.  But one of the reasons why I'm more and more convinced it's a bad idea is precisely the need for an ordering:  I have a gut feeling the patterns should never require that for correctness, letting us order them as we please for performance.  In which case, the types should never be a distinguishing factor (as we're only looking at performance indicators for breaking "ties" between "equivalent" patterns).  Does that sound right?  If so, I guess the type should just be ignored entirely.
> > > > I'm trying to follow the discussion here, but find it a bit hard to understand in depth.
> > > > I'm wondering if priority of patterns/rules is written up somewhere already?
> > > > (This comment seems to suggest it's still very much an in-progress design discussion, so isn't written up anywhere?)
> > > > I'm assuming the priority of which pattern to try and match first will remain the same as under DAGISel? Or do we consider making some tweaks?
> > > > If it remains the same, what further ordering rules are being discussed here, and why does more ordering need to be defined compared to DAGISel?
> > > > 
> > > > Please don't let my comment slow you down in getting to a solution, but I think it'd be useful for this area to be explained a bit better in the end somewhere, e.g. in the comment at the top of this file.
> > > > Doesn't necessarily need to be as part of this patch.
> > > > Ah yeah you're right, if we want to reinstate the isWidenable logic, we'd need this
> > > > ordering. But one of the reasons why I'm more and more convinced it's a bad idea is
> > > > precisely the need for an ordering: I have a gut feeling the patterns should never
> > > > require that for correctness, letting us order them as we please for performance. In
> > > > which case, the types should never be a distinguishing factor (as we're only looking at
> > > > performance indicators for breaking "ties" between "equivalent" patterns). Does that
> > > > sound right? If so, I guess the type should just be ignored entirely.
> > > 
> > > I think a certain amount of ordering is unavoidable because we have a mixture of special cases that overlap with general cases. Consider these two patterns:
> > >   (set GPR32:$rd, (add GPR32:$rs1, GPR32:$rs2))
> > >   (set GPR32:$rd, (add (mul GPR32:$rs1, GPR32:$rs2), GPR32:$rs3))
> > > If we check them in the order I've written them then we can never match a multiply-accumulate but the other way around will always match the multiply accumulate when it's possible to do so. An alternative is to pick the best match at run time but that prevents early exits from the matcher.
> > > 
> > > Another example is:
> > >   (set GPR32:$rd, (add GPR32:$rs1, uimm16:$imm))
> > >   (set GPR32:$rd, (add GPR32:$rs1, uimm8:$imm))
> > > On a variable-length instruction set, there may be code-size and/or performance gains from preferring the uimm8 version.
> > > 
> > > > I'm trying to follow the discussion here, but find it a bit hard to understand in depth.
> > > > I'm wondering if priority of patterns/rules is written up somewhere already?
> > > > (This comment seems to suggest it's still very much an in-progress design discussion, so
> > > > isn't written up anywhere?)
> > > 
> > > That's right, we haven't decided what kind of ordering GlobalISel really needs yet. At the moment I'm trying to stick to the minimum needed for the matcher to work correctly. By the time we start to optimise the generated matcher I'm hoping to have a better idea of what the ordering issues are.
> > > 
> > > For SelectionDAG it's defined in PatternSortingPredicate from utils/TableGen/DAGISelEmitter.cpp.
> > > 
> > > > I'm assuming the priority of which pattern to try and match first will remain the same
> > > > as under DAGISel? Or do we consider making some tweaks?
> > > 
> > > Maybe. Ideally, the imported rules have the same overall behaviour as they do in SelectionDAG so that we produce the same or better code out of the box but I'm not sure that's possible due to the differences in representation in GlobalISel. For example, LLT can't distinguish floating point from integer like MVT can so some of SelectionDAG's ordering rules cannot be transferred to GlobalISel.
> > Brilliant explanation, thanks! Much appreciated!
> I've dropped this override for now since the lack of Widenable means we don't really need it yet.
> I think a certain amount of ordering is unavoidable because we have a mixture of special cases that overlap with general cases. Consider these two patterns:
> 
> (set GPR32:$rd, (add GPR32:$rs1, GPR32:$rs2))
> (set GPR32:$rd, (add (mul GPR32:$rs1, GPR32:$rs2), GPR32:$rs3))
> If we check them in the order I've written them then we can never match a multiply-accumulate but the other way around will always match the multiply accumulate when it's possible to do so. An alternative is to pick the best match at run time but that prevents early exits from the matcher.

Right, that's what I mean by "performance": in this case, ignoring the second pattern should not cause any selection failure: we never *have to* obey any ordering.

================
Comment at: utils/TableGen/GlobalISelEmitter.cpp:536
+  /// Returns true if this object is more important than B.
+  bool isHigherPriorityThan(const RuleMatcher &B) const {
+    // Rules involving more match roots have higher priority.
----------------
dsanders wrote:
> aditya_nandakumar wrote:
> > aditya_nandakumar wrote:
> > > ab wrote:
> > > > This should maybe take into account the AddedComplexity somehow, but that's not really compatible with the isHigherPriorityThan scheme :/
> > > > 
> > > > I'm not sure using AddedComplexity is a good idea in the first place: it's very magic-number-y and totally SDAG implementation specific.  Still, care has been put into that, so if there's any value we can easily extract from that, let's do it.
> > > > 
> > > > Maybe instead sum up the "complexity" of each Matcher?  We'd want to make sure to prioritize significant metrics like more instructions over operand kinds and such.
> > > > 
> > > > Assuming you don't have another usage in mind for the *Kind enums, that might let you simplify that to virtual methods in each Matcher returning its cost.
> > > Our backend has scripts that takes into account ISA information and auto generates various DAGPatterns and also calculates the added complexity for each pattern and emits them (not in sorted order by complexity).
> > > If we try to reprioritize patterns and come up with newer metrics/aggregates for matching, it may not be accurate/expected (wrt to pattern authors/scripts knowing about costs).
> > > 
> > Perhaps there needs to be a "-respect-added-complexity" option so backends can rely on GlobalISel not re-ordering patterns it thinks is better?
> > This should maybe take into account the AddedComplexity somehow, but that's not
> > really compatible with the isHigherPriorityThan scheme :/
> 
> This isn't really blocked by isHigherPriorityThan(). One of the higher level objects can always weigh complexity into its result.
> 
> > I'm not sure using AddedComplexity is a good idea in the first place: it's very magic
> > number-y and totally SDAG implementation specific. Still, care has been put into that,
> > so if there's any value we can easily extract from that, let's do it.
> 
> I'd prefer not to add AddedComplexity for the same reasons. If possible, I'd prefer a more intuitive metric but I'm not sure what that metric would be yet.
> 
> I did some digging today and quite a few of the cases where AddedComplexity affects the order (determined by checking if the less-than operator gave different results with and without the AddedComplexity adjustment during std::sort) would be unaffected by the removal of AddedComplexity if sorting based on the root operator replaced it. For example:
>   (ld:f16 (am_indexed16:iPTR GPR64sp:i64:$Rn, uimm12s2:i64:$offset))<<P:Predicate_unindexedload>><<P:Predicate_load>>
>   (fadd:f32 (vector_extract:f32 FPR128:v4f32:$Rn, 1:i64), (vector_extract:f32 FPR128:v4f32:$Rn, 0:i64))
>    ## 23 (10 added) vs 19 (0 added)
> 
> Once you exclude those obvious cases, there's 58 checks of which 26 are of the form:
>   (add:i32 addsub_shifted_imm32:i32:$imm, GPR32sp:i32:$Rn)
>   (add:i64 (mul:i64 (zext:i64 GPR32:i32:$Rn), (imm:i64)<<P:Predicate_i64imm_32bit>>:$C), GPR64:i64:$Ra)
>    ## 18 (6 added) vs 18(5 added) ##
> and are separated by the tree structure. In this example, the addsub_shifted_imm32 one gets priority in the end.
> 
> The other 32 are of the form:
>   (add:i32 arith_shifted_reg32:i32:$Rm, GPR32:i32:$Rn)
>   (add:i64 neg_addsub_shifted_imm64:i64:$imm, GPR64:i64:$Rn)
>    ## 12 (0 added) vs 13(1 added) ##
>   (st GPR64:i64:$Rt, (am_indexed8:iPTR GPR64sp:i64:$Rn, uimm12s1:i64:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_truncstore>><<P:Predicate_truncstorei8>>
>   (st FPR128:v16i8:$Rt, (am_indexed7s64:iPTR GPR64sp:i64:$Rn, simm7s8:i32:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_store>><<P:Predicate_nontemporalstore>>
>    ## 23 (10 added) vs 28(15 added) ##
> and will have a direct effect on the output when we can import them all.
> 
> > Maybe instead sum up the "complexity" of each Matcher? We'd want to make sure to
> > prioritize significant metrics like more instructions over operand kinds and such.
> >
> > Assuming you don't have another usage in mind for the *Kind enums, that might let
> > you simplify that to virtual methods in each Matcher returning its cost.
> >
> > Our backend has scripts that takes into account ISA information and auto generates 
> > various DAGPatterns and also calculates the added complexity for each pattern and emits 
> > them (not in sorted order by complexity).
> 
> As far as I can tell, the AddedComplexity adjustment is a constant that's added in certain manually-chosen cases. We'll need to take this discussion off-list to be able to talk details.
> 
> > If we try to reprioritize patterns and come up with newer metrics/aggregates for
> > matching, it may not be accurate/expected (wrt to pattern authors/scripts knowing
> > about costs).
> 
> I agree, the aim is to avoid re-prioritizing the patterns if at all possible. However, SelectionDAG's metrics are based on the SDNode and MVT representations that do not exist in GlobalISel. Purely GlobalISel rules will not be able integrate with this scheme. 
> 
> I think we should aim to use something that makes sense for GlobalISel so that we have one ordering that applies to all GlobalISel rules regardless of whether they were imported from SelectionDAG or not. My hope is that the order resulting from a GlobalISel-based scheme will be functionally the same, or at least very close to the SelectionDAG one.
> Once you exclude those obvious cases, there's 58 checks of which 26 are of the form:
> 
> ```
> (add:i32 addsub_shifted_imm32:i32:$imm, GPR32sp:i32:$Rn)
> (add:i64 (mul:i64 (zext:i64 GPR32:i32:$Rn), (imm:i64)<<P:Predicate_i64imm_32bit>>:$C), GPR64:i64:$Ra)
>  ## 18 (6 added) vs 18(5 added) ##
> ```
> 
> and are separated by the tree structure. In this example, the addsub_shifted_imm32 one gets priority in the end.

This sounds like the interesting pattern that will need some form of manual override.

> 
> ```
> (add:i32 arith_shifted_reg32:i32:$Rm, GPR32:i32:$Rn)
> (add:i64 neg_addsub_shifted_imm64:i64:$imm, GPR64:i64:$Rn)
>  ## 12 (0 added) vs 13(1 added) ##
> (st GPR64:i64:$Rt, (am_indexed8:iPTR GPR64sp:i64:$Rn, uimm12s1:i64:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_truncstore>><<P:Predicate_truncstorei8>>
> (st FPR128:v16i8:$Rt, (am_indexed7s64:iPTR GPR64sp:i64:$Rn, simm7s8:i32:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_store>><<P:Predicate_nontemporalstore>>
>  ## 23 (10 added) vs 28(15 added) ##
> ```
> and will have a direct effect on the output when we can import them all.

Huh, why is there an effect?  Aren't these incompatible because of the types?  Or am I misreading this?

https://reviews.llvm.org/D29711