[PATCH] D29711: [globalisel] Sort RuleMatchers by priority.

Mon Feb 13 08:03:54 PST 2017

dsanders added a comment.

Would it be possible to deal with the smaller changes to the sort order in this patch and come back to the bigger issues in a later one? I'd like to unblock the other patches and I'll have a clearer picture of the bigger issues once there are more importable rules.

================
Comment at: utils/TableGen/GlobalISelEmitter.cpp:536
+  /// Returns true if this object is more important than B.
+  bool isHigherPriorityThan(const RuleMatcher &B) const {
+    // Rules involving more match roots have higher priority.
----------------
aditya_nandakumar wrote:
> aditya_nandakumar wrote:
> > ab wrote:
> > > This should maybe take into account the AddedComplexity somehow, but that's not really compatible with the isHigherPriorityThan scheme :/
> > > 
> > > I'm not sure using AddedComplexity is a good idea in the first place: it's very magic-number-y and totally SDAG implementation specific.  Still, care has been put into that, so if there's any value we can easily extract from that, let's do it.
> > > 
> > > Maybe instead sum up the "complexity" of each Matcher?  We'd want to make sure to prioritize significant metrics like more instructions over operand kinds and such.
> > > 
> > > Assuming you don't have another usage in mind for the *Kind enums, that might let you simplify that to virtual methods in each Matcher returning its cost.
> > Our backend has scripts that takes into account ISA information and auto generates various DAGPatterns and also calculates the added complexity for each pattern and emits them (not in sorted order by complexity).
> > If we try to reprioritize patterns and come up with newer metrics/aggregates for matching, it may not be accurate/expected (wrt to pattern authors/scripts knowing about costs).
> > 
> Perhaps there needs to be a "-respect-added-complexity" option so backends can rely on GlobalISel not re-ordering patterns it thinks is better?
> This should maybe take into account the AddedComplexity somehow, but that's not
> really compatible with the isHigherPriorityThan scheme :/

This isn't really blocked by isHigherPriorityThan(). One of the higher level objects can always weigh complexity into its result.

> I'm not sure using AddedComplexity is a good idea in the first place: it's very magic
> number-y and totally SDAG implementation specific. Still, care has been put into that,
> so if there's any value we can easily extract from that, let's do it.

I'd prefer not to add AddedComplexity for the same reasons. If possible, I'd prefer a more intuitive metric but I'm not sure what that metric would be yet.

I did some digging today and quite a few of the cases where AddedComplexity affects the order (determined by checking if the less-than operator gave different results with and without the AddedComplexity adjustment during std::sort) would be unaffected by the removal of AddedComplexity if sorting based on the root operator replaced it. For example:
  (ld:f16 (am_indexed16:iPTR GPR64sp:i64:$Rn, uimm12s2:i64:$offset))<<P:Predicate_unindexedload>><<P:Predicate_load>>
  (fadd:f32 (vector_extract:f32 FPR128:v4f32:$Rn, 1:i64), (vector_extract:f32 FPR128:v4f32:$Rn, 0:i64))
   ## 23 (10 added) vs 19 (0 added)

Once you exclude those obvious cases, there's 58 checks of which 26 are of the form:
  (add:i32 addsub_shifted_imm32:i32:$imm, GPR32sp:i32:$Rn)
  (add:i64 (mul:i64 (zext:i64 GPR32:i32:$Rn), (imm:i64)<<P:Predicate_i64imm_32bit>>:$C), GPR64:i64:$Ra)
   ## 18 (6 added) vs 18(5 added) ##
and are separated by the tree structure. In this example, the addsub_shifted_imm32 one gets priority in the end.

The other 32 are of the form:
  (add:i32 arith_shifted_reg32:i32:$Rm, GPR32:i32:$Rn)
  (add:i64 neg_addsub_shifted_imm64:i64:$imm, GPR64:i64:$Rn)
   ## 12 (0 added) vs 13(1 added) ##
  (st GPR64:i64:$Rt, (am_indexed8:iPTR GPR64sp:i64:$Rn, uimm12s1:i64:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_truncstore>><<P:Predicate_truncstorei8>>
  (st FPR128:v16i8:$Rt, (am_indexed7s64:iPTR GPR64sp:i64:$Rn, simm7s8:i32:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_store>><<P:Predicate_nontemporalstore>>
   ## 23 (10 added) vs 28(15 added) ##
and will have a direct effect on the output when we can import them all.

> Maybe instead sum up the "complexity" of each Matcher? We'd want to make sure to
> prioritize significant metrics like more instructions over operand kinds and such.
>
> Assuming you don't have another usage in mind for the *Kind enums, that might let
> you simplify that to virtual methods in each Matcher returning its cost.
>
> Our backend has scripts that takes into account ISA information and auto generates 
> various DAGPatterns and also calculates the added complexity for each pattern and emits 
> them (not in sorted order by complexity).

As far as I can tell, the AddedComplexity adjustment is a constant that's added in certain manually-chosen cases. We'll need to take this discussion off-list to be able to talk details.

> If we try to reprioritize patterns and come up with newer metrics/aggregates for
> matching, it may not be accurate/expected (wrt to pattern authors/scripts knowing
> about costs).

I agree, the aim is to avoid re-prioritizing the patterns if at all possible. However, SelectionDAG's metrics are based on the SDNode and MVT representations that do not exist in GlobalISel. Purely GlobalISel rules will not be able integrate with this scheme. 

I think we should aim to use something that makes sense for GlobalISel so that we have one ordering that applies to all GlobalISel rules regardless of whether they were imported from SelectionDAG or not. My hope is that the order resulting from a GlobalISel-based scheme will be functionally the same, or at least very close to the SelectionDAG one.

https://reviews.llvm.org/D29711