[PATCH] D39034: [WIP][GlobalISel][TableGen] Optimize MatchTable for faster instruction selection

Tue Oct 24 12:34:48 PDT 2017

dsanders added a comment.

There are a lot of changes in this patch so I haven't gone through the detail but this seems to be heading in the general direction I had in mind for when we got to optimizing the rules.

After importing and sorting the ruleset into priority order. I was thinking of having a collection of partitioners which would subdivide the ruleset into groups without changing that priority order. So for example, assuming the priority was this (it wouldn't be but ignore that for now):

  (G_FNEG (G_FADD x, y))
  (G_FADD x, y)
  (G_FSUB x, y)
  (G_FNEG x)

one partitioner would partition based on InstructionMatcher::Operands::size() like so:

  (G_FNEG (G_FADD x, y))
  ---
  (G_FADD x, y)
  (G_FSUB x, y)
  ---
  (G_FNEG x)

Even though the first and last rules only have one operand at the top level, they aren't grouped because of the priority of the G_FADD/G_FSUB in between. This way, we can ensure priority is being respected. The partitioner would then remove the relevant check from the rules and move them to the partition boundaries with appropriate GIM_Try's, much like you're doing in this patch. Each sub-ruleset would then be further partitioned forming a decision tree. Finally, when we run out of partitioners that produce >=2 partitions we generate the remaining rules in the ruleset as we do now.

Choosing which partitioner to apply at any given time is somewhat tricky. For this, I was thinking there would be two kinds of partitioner. Those that deal with structure (number of operands, opcodes?*, nested instructions, etc.) and those that deal with predicates. The structural ones would apply first, leaving the resulting sub-rulesets to test predicates using common instruction-id's and operand indices. For example:

  (G_FADD x, simm16)
  (G_FADD x, y)
  (G_FNEG (G_FADD x, y))
  (G_FNEG x)

would be divided by the structural partitioners like so

  GIM_Try, ...
  	GIM_CheckNumOperands, Insn0, 3,
  	GIM_Try, ...
  		GIM_CheckOpcode, Insn0, G_FADD,
  		# (G_FADD x, simm16)
  		# (G_FADD x, y)
  GIM_Try, ...
  	GIM_CheckNumOperands, Insn0, 2,
  	GIM_RecordInsn, Insn1, Insn0, 1,
  	GIM_Try, ...
  		GIM_CheckOpcode, Insn0, G_FNEG,
  		GIM_CheckOpcode, Insn1, G_FADD,
  		# (G_FNEG (G_FADD x, y))
  	GIM_Try, ...
  		GIM_CheckNumOperands, Insn0, 2,
  		# (G_FNEG x)

The G_FADD cases:

  (G_FADD x, simm16)
  (G_FADD x, y)

would then be further sub-divided by the predicate-based partitioners like so:

  GIM_Try, ...
  	GIM_CheckNumOperands, Insn0, 3,
  	GIM_Try, ...
  		GIM_CheckOpcode, Insn0, G_FADD,
  		GIM_Try, ...
  			GIM_CheckPredicate, Insn0, 2, simm16
  			# (G_FADD x, simm16)
  		GIM_Try, ...
  			# (G_FADD x, y)
  GIM_Try, ...
  	GIM_CheckNumOperands, Insn0, 2,
  	GIM_RecordInsn, Insn1, Insn0, 1,  // MIs[1] = getVRegDef(MIs[0].getOperand(1))
  	GIM_Try, ...
  		GIM_CheckOpcode, Insn0, G_FNEG,
  		GIM_CheckOpcode, Insn1, G_FADD,
  		# (G_FNEG (G_FADD x, y))
  	GIM_Try, ...
  		GIM_CheckNumOperands, Insn0, 2,
  		# (G_FNEG x)

Within each set of partitioners, we'd need some heuristics to select an optimal one (and prevent the tree builder from getting too expensive). This is another reason for separating structural partitioners from predicate-based partitioners. The structural set have a more-or-less fixed order they can be applied (because they directly walk the tree) while still being near-optimal, then by the time we're onto the predicate-based ones (which have more freedom) we should have a small enough set to be a bit more thorough and evaluate which one produces the better tree. Somewhere in here, I also wanted to factor in the relative frequency of instructions so that rare instructions are checked last and frequent instructions require fewer checks. I was thinking this could be done as a tie-breaker in isHigherPriorityThan().

One thing the structual partitioners will want in order to maximize sharing of the structure predicates is a predictable numbering of instructions. A simple numbering scheme wouldn't work for variadic instructions so I think it's probably best to build the tree first and number the instructions afterwards. This is the reason instruction ID's were assigned during the emit*() functions and not stored in the matcher objects themselves.

*It's a bit debatable whether the opcode is structural or not. Technically you don't need it for structure but it seems wasteful to walk the def-use chain if the opcode is already going to prevent the match. On the other hand, if it's structural then (G_ADD (G_MUL x, y), z) and (G_SUB (G_MUL x, y), z) have different structure. It may be worth having a GIM_CheckOpcodeIsInSet or similar to balance the two extremes.

================
Comment at: utils/TableGen/GlobalISelEmitter.cpp:594-597
+  void clearImplicitMap() {
+    NextInsnVarID = 0;
+    InsnVariableIDs.clear();
+  };
----------------
I don't understand why this is needed.

================
Comment at: utils/TableGen/GlobalISelEmitter.cpp:3053-3062
+  std::stable_sort(Rules.begin(), Rules.end(), [&](const RuleMatcher &A,
+                                                   const RuleMatcher &B) {
+    if (A.isHigherPriorityThan(B)) {
+      assert(!B.isHigherPriorityThan(A) && "Cannot be more important "
+                                           "and less important at "
+                                           "the same time");
+      return true;
----------------
This bit is correctness rather than optimization

Repository:
  rL LLVM

https://reviews.llvm.org/D39034