[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

Nicolai Hähnle via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 9 02:55:25 PST 2018


Hi Daniel,

Lots of good stuff in there! I especially like the design for specifying 
out-of-line predicates. I have a couple of small comments and one major 
one below.


On 09.11.18 02:42, Daniel Sanders via llvm-dev wrote:
> _Passing arbitrary data from match to apply_
> 
> As mentioned earlier, the defs block defines the interface between the 
> match and apply steps. This can be used to arrange for arbitrary data to 
> be passed from match to apply.
> 
> In the current AArch64PreLegalizeCombiner we have a rule that matches a 
> G_LOAD and all its uses simultaneously and decides on the best way to 
> rewrite it to minimize the sign/zero/any-extend operations. This rule 
> passes a struct (PreferredTuple) from the current C++ equivalent of 
> the match to the current C++ equivalent of the apply. Converting that 
> into this tablegen syntax, we'd write:
>      def extending_load_matchdata : GIDefMatchData<"PreferredTuple">;
>      def extending_load_predicate : GIMatchPredicate<
>           (ins reg:$A, extending_load_matchdata:$B), bool, [{
>        return Helper.matchCombineExtendingLoads(${A}, ${matchinfo});

I assume this was intended to be ${B} instead of ${matchinfo}?

I also think you should have 'ins' and 'outs' separately; after all, a 
predicate may have to do a combined check on two matched registers / 
operands, and produce one or more values for later re-use.

Once you have separate 'ins' and 'outs', the "bool" in there seems a bit 
redundant.
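
Something roughly like this, say (just a sketch, the exact spelling doesn't 
matter), with the result moved into an (outs ...) list and the explicit bool 
dropped:

   def extending_load_predicate : GIMatchPredicate<
        (ins reg:$A), (outs extending_load_matchdata:$B), [{
     // Same C++ as in the RFC's example; only the signature changes.
     return Helper.matchCombineExtendingLoads(${A}, ${B});
   }]>;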


>      }]>;
>      def extending_loads : GICombineRule<
>        (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo),
>        (match [{MIR %root = G_LOAD %A }],
>               (extending_load_predicate root:$A,
>                                        
>   extending_load_matchdata:$matchinfo)),
>        (apply (exec [{ Helper.applyCombineExtendingLoads(${root}, 
> ${matchinfo}); }],
>                     reg:$root, extending_load_matchdata:$matchinfo)>;
> 
> The GIDefMatchData declares a new type of data that can be passed from 
> the match to the apply. Tablegen is responsible for arranging for 
> appropriate storage during the Combine algorithm. The GIMatchPredicate 
> declares a C++ predicate that fills out the PreferredTuple (passed by 
> reference) whenever it returns true for a successful match. We could 
> have made the predicate return std::pair<bool, PreferredTuple> instead 
> but that's less efficient (it would be an sret return on many targets) 
> and would require us to define the truthiness (there are no examples in 
> this email as I expect it to be a rare thing to need) in order to act as 
> a predicate. Normally, you'd feed this into a (create_imm ...) or a 
> (create_operand ...) in the apply section. However, in this particular 
> case the data being passed determines the entirety of the replacement so 
> we escape into arbitrary C++ instead and arrange for the variables to be 
> injected appropriately using the 'exec' operator.

Have you considered the possibility of defining custom `create` actions 
for the apply, analogous to custom predicates?

In your description, it appears there is a fixed list of builtin actions 
such as `create_imm`, `create_operand` (not really described anywhere?), 
`exec` and the tentative debug info approach.

It seems valuable to be able to define one's own operator for custom 
operations.
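
For example, something vaguely along these lines (GICreateAction and 
selectShiftAmount are entirely made-up names, mirroring GIMatchPredicate):

   // Hypothetical: a user-defined analogue of (create_imm ...) that
   // computes its immediate from the matchdata.
   def create_shift_amount : GICreateAction<
        (ins extending_load_matchdata:$info), MachineOperand, [{
     // selectShiftAmount() is a made-up helper, purely for illustration.
     return MachineOperand::CreateImm(selectShiftAmount(${info}));
   }]>;

Such a def could then be used in the apply section wherever the builtin 
(create_imm ...) would otherwise go.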


> _Macros_
> 
> I simplified the previous example a bit. Rather than only matching a 
> G_LOAD, the current rule in AArch64 can match any of G_LOAD, G_SEXTLOAD, 
> and G_ZEXTLOAD. We need some means to match one of several alternatives 
> as well as collect and re-use common subpatterns. I've yet to look into 
> how this would be practically implemented and this section is a bit 
> vague as a result but here's the current thinking on how it should look 
> and behave:
>      def ANYLOAD : GIMacro<(defs def:$R, use:$S, uint64_t:$IDX),
>                            (match (oneof [{MIR %R = G_LOAD %S}],
>                                          [{MIR %R = G_SEXTLOAD %S}],
>                                          [{MIR %R = G_ZEXTLOAD %S}]):$IDX>;
>      def extending_loads : GICombineRule<
>        (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo, 
> ANYLOAD:$ANYLOAD),
>        (match [{MIR %root = ANYLOAD %A }],
>               (extending_load_predicate root:$A,
>                                        
>   extending_load_matchdata:$matchinfo)),
>        (apply (exec [{ Helper.applyCombineExtendingLoads(${root}, 
> ${matchinfo}); }],
>                     reg:$root, extending_load_matchdata:$matchinfo)>;
> Effectively, we're declaring a fake instruction and importing it into this 
> rule, possibly renaming it using the argument name ($ANYLOAD in this 
> case) to provide a more convenient name in the MIR block or to 
> disambiguate multiple instances. Once we've parsed the MIR, we would 
> recursively replace any instance of ANYLOAD with code to match one of the 
> alternatives. The variables in the 'defs' section of the macro would be 
> available as $ANYLOAD_R, $ANYLOAD_S, and $ANYLOAD_IDX (we have to use 
> '_' instead of a '.' to fit within tablegen's syntax) with $ANYLOAD_IDX 
> indicating which alternative the (oneof ...) matched. When nested by 
> including a macro in the macro's defs section and using it, the names to 
> access the sub-macros variables would grow longer by underscore 
> separated concatenation, for example '$ANYLOAD_ANYEXT_A'.

This magic generation of variable names seems really wrong to me. Let's 
just use a natural operands interface, like so (and I'm also doing 
something else here, more on that later):

   def ANYLOAD : GIMacro<
     (ops def:$R, use:$S, imm:$IDX),
     (defs), // optional extra defs that can be used in predicates
     (oneof (match (G_LOAD $R, $S), 0:$IDX),
            (match (G_SEXTLOAD $R, $S), 1:$IDX),
            (match (G_ZEXTLOAD $R, $S), 2:$IDX))>;
   def extending_loads : GICombineRule<
     (defs def:$root, reg:$A, extending_load_matchdata:$matchinfo,
           imm:$idx),
     (match (ANYLOAD $root, $A, $idx),
            (extending_load_predicate $A, $matchinfo)),
     (exec [{ Helper.applyCombineExtendingLoads(
                  ${root}, ${matchinfo}, ${idx}); }])>;

There'd be a natural and simple way to shift predicates and variables in 
and out of macros: the following would be equivalent:

   def ANYLOAD : GIMacro<
     (ops def:$R, extending_load_matchdata:$matchinfo, imm:$IDX),
     (defs use:$S),
     (match (oneof (match (G_LOAD $R, $S), 0:$IDX),
                   (match (G_SEXTLOAD $R, $S), 1:$IDX),
                   (match (G_ZEXTLOAD $R, $S), 2:$IDX)),
            (extending_load_predicate $S, $matchinfo))>;
   def extending_loads : GICombineRule<
     (defs def:$root, extending_load_matchdata:$matchinfo, imm:$idx),
     (ANYLOAD $root, $A, $matchinfo, $idx),
     (exec [{ Helper.applyCombineExtendingLoads(
                  ${root}, ${matchinfo}, ${idx}); }])>;

... and of course it could all be inlined:

   def extending_loads : GICombineRule<
     (defs def:$root, reg:$A, extending_load_matchdata:$matchinfo,
           imm:$idx),
     (match (oneof (match (G_LOAD $root, $A), 0:$idx),
                   (match (G_SEXTLOAD $root, $A), 1:$idx),
                   (match (G_ZEXTLOAD $root, $A), 2:$idx)),
            (extending_load_predicate $A, $matchinfo)),
     (exec [{ Helper.applyCombineExtendingLoads(
                  ${root}, ${matchinfo}, ${idx}); }])>;

In other words, macros would really just be macros or sub-routines.

Let's talk about using DAGs. I think I understand where the temptation 
comes from to describe the MIR using code blocks, but I'm also pretty 
sure that it's a mistake to do so.

One reason is that it adds yet another parser, which is more maintenance 
burden without buying much.

The more important reason is that DAGs compose better with the rest of 
TableGen. Consider combine rules defined in multiclasses, for example. 
That is very common all through LLVM. In order to mirror an existing 
selection rule in the AMDGPU backend, we'd likely want to have something 
like this:

   multiclass Arithmetic_i16_GIPats<Instruction ginst,
                                    Instruction inst> {
     // This is probably wishful thinking -- would be okay if we had to
     // split this into two different rules due to the different types
     // of $dst
     def : GICombineRule<
       (defs root:$dst, operand:$src0, operand:$src1),
       (oneof (ginst i16:$dst, i16:$src0, i16:$src1),
              (match (ginst i16:$tmp, i16:$src0, i16:$src1),
                     (G_ZEXT i32:$dst, $tmp))),
       (inst $dst, $src0, $src1)>;

     // A third rule for zext to 64 bits would go here...
   }

Obviously you can achieve this in your proposal as well using string 
concatenation, but that does seem rather backwards.

Something else that falls out rather nicely from using DAGs is that it 
gives you a natural way to give a name to an _instruction_, to be able 
to pass it to a predicate for extended checks (and as a potentially more 
natural way of preserving debug locations!).
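
For instance (a sketch; nonvolatile_single_use is a made-up predicate):

   def : GICombineRule<
     (match (G_LOAD $tmp, $ptr):$load,
            (G_SEXT $dst, $tmp),
            // The predicate is handed the whole G_LOAD instruction
            // rather than just one of its operands.
            (nonvolatile_single_use $load)),
     (apply (G_SEXTLOAD $dst, $ptr))>;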

I have a more explicit example of this later.


> _'Upside-down' matches (i.e. roots at the top) and similar_
> 
> This one requires algorithm changes which I'd prefer not to discuss in 
> this RFC. Assuming the underlying algorithm gains support for this, this 
> is how the syntax would look:
> def : GICombineRule<
>    (defs root:$root, reg:$A),
>    (match [{MIR %1 = G_LOAD %root
>                 %A = G_SEXT %1 }]),
>    (apply [{MIR %A = G_SEXTLOAD %root }])>;
> 
> The only unusual thing about this rule is that the root isn't at the 
> bottom. Instead of starting at a use and matching towards defs, we're 
> starting at the def and matching towards uses. This has some potentially 
> useful properties. The combine algorithm has to choose an insertion point 
> for the replacement code and the traditional choice has been the root of 
> the match. Assuming we keep doing the same thing, 'upside-down' matching 
> like this is able to avoid the checks that the load is safe to move, is 
> non-volatile, has one use, etc. that would be necessary if we moved the 
> G_LOAD down to the G_SEXT. Also, assuming we keep the same broadly 
> bottom-up processing order as the existing Combine algorithm this kind 
> of rule also has relatively lower priority than conventional rules 
> because the root is seen later. This can be useful as 
> (algorithm-dependent) the MIR may be more settled by the time it tries 
> to match.
> 
> Along the same lines, the syntax can potentially support the root being 
> in the middle of a match. This could be used in a similar way to the 
> upside-down match above to control the insertion point and priority. For 
> example:
> def : GICombineRule<
>    (defs root:$root, reg:$A, reg:$B, reg:$C),
>    (match [{MIR %1(s32) = G_TRUNC %A(s64)
>                 %2(s32) = G_TRUNC %B(s64)
>                 %root = G_ADD %1, %2
>                 %C(s64) = G_SEXT %root }]),
>    (emit [{MIR %root = G_ADD %A, %B
>                %C = G_SEXT_INREG %root }])>;
> Unfortunately, I don't have any concrete examples where this would be 
> useful in comparison to a conventional or upside-down match at the 
> moment. I'm mostly keeping the door open as I can see potential for 
> benefits (mostly w.r.t. sinking and hoisting safety around an instruction 
> where that safety would be difficult to test for) given an appropriate rule 
> and also because I'm inclined to say that the tablegen syntax shouldn't be 
> the reason it's not possible. It should be up to the Combine algorithm and 
> the tablegen-erated pass involved in specializing the algorithm for a target.

I'm concerned that the concrete syntax proposed here mixes a bunch of 
concerns that really should be addressed separately:

1. Defining a mapping between semantically equivalent chunks of code.
2. Controlling insertion points as a simple way to keep side effects 
under control in many cases.
3. Controlling additional algorithmic aspects of the combine algorithm 
(whether you're matching up or down, say).

Could we keep those separate somehow? E.g., don't distinguish between 
'root' and 'reg' or 'operand' in the (defs ...) part of the rule; have 
TableGen figure out the roots automatically based on walking the graph 
implied by the matching rule, and default to using the sink(s) as roots.

Then add some optional let-able variable in the GICombineRule to define 
non-default rules for the combine algorithm.
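
Say (the field name is invented purely for illustration):

   def : GICombineRule<
     (match (G_LOAD $tmp, $ptr),
            (G_SEXT $dst, $tmp)),
     (apply (G_SEXTLOAD $dst, $ptr))> {
     // Hypothetical knob: root the match at the G_LOAD and match
     // towards uses instead of the default direction.
     let MatchTowardsUses = 1;
   }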

Controlling the insertion point to address loads like in your example 
above could be done quite naturally if the MIR is expressed in a DAG:

   def : GICombineRule<
     (match (G_LOAD $tmp, $ptr):$load,
            (G_SEXT $dst, $tmp)),
     (apply (insertat $load),
            (G_SEXTLOAD $dst, $ptr))>;

As you can see, this discussion also makes me wonder whether the (defs) 
are needed at all; maybe we can rely on type inference almost everywhere?


> _Multiple roots_
> 
> This one requires algorithm changes which I'd prefer not to discuss in 
> this RFC. Assuming the underlying algorithm gains support for this, this 
> is how the syntax would look:
> def : GICombineRule<
>    (defs root:$root1, root:$root2, reg:$A, reg:$B),
>    (match [{MIR %root1 = G_ADD %A, %B }],
>           [{MIR %root2 = G_SUB %A, %B }]),
>    (emit [{MIR %root1, %root2 = BUTTERFLY %A, %B }])>;
> This would match a G_ADD and G_SUB with operands in common and combine 
> them into a BUTTERFLY operation. You can think of this as two normal 
> rules, one with %root1 as the root and the other with %root2 as the root.

Again, it seems to me that it would be preferable if we could just 
express this rule as:

   def : GICombineRule<
     (match (G_ADD $dst1, $a, $b),
            (G_SUB $dst2, $a, $b),
            reg:$a, reg:$b),
     (BUTTERFLY $dst1, $dst2, $a, $b)>;

or perhaps:

   def : GICombineRule<
     (match (G_ADD $dst1, reg:$a, $b),
            (G_SUB $dst2, $a, reg:$b)),
     (BUTTERFLY $dst1, $dst2, $a, $b)>;

and similar variations.

TableGen should be able to figure out the multiple roots by walking the 
dependency graph that is implied by the knowledge about which operands 
of G_ADD and G_SUB are defs and uses, respectively.

To sum up, I really like the proposal overall, except I think it would 
be a **major** improvement to use DAGs for expressing the MIR, and sure, 
there's the odd detail here and there like about GIMacro.

Cheers,
Nicolai


> _Grouping and Ordering Rules_
> 
> Combine rules are ordered and grouped using definitions of the form:
>      def trivial_combines : GICombineGroup<[copy_prop]>;
>      def combines_for_extload: GICombineGroup<[extending_loads]>;
>      def all_combines : GICombineGroup<[trivial_combines, 
> combines_for_extload]>;
> Essentially, we create a long ordered list of all the combines. Tablegen 
> is free to re-order rules so long as the resulting ruleset always 
> behaves as if it were in this specific order. So for example, the list 
> [sext_trunc_sext, zext_trunc_zext, sext_sext, zext_zexts] is free to 
> re-order to [sext_trunc_sext, sext_sext, zext_trunc_zext, zext_zexts] 
> because the rules involved are mutually exclusive due to the root opcode.
> 
> So why not make tablegen figure out the order like ISel does? ISel 
> attempts to figure out an order using a scoring system called 
> Complexity along with a hack (AddedComplexity) to allow the user to 
> provide magic numbers to fix the cases it gets wrong. The simplest way 
> to confuse it is with patterns like (add s32, complexpattern1) and (add 
> s32, complexpattern2). These have the same Complexity score but in 
> truth, each one has a (possibly overlapping) range of complexity 
> depending on what C++ code is inside the complexpatterns and which path 
> through that C++ is dynamically taken. If it matches nothing then the 
> score should be 0 but if it matches a dozen nodes it should be 12 (or 
> possibly higher). We don't know which until we try to match that 
> specific pattern. Correctly figuring out an order in the presence of 
> complexpatterns is impossible. Similarly, it's also possible to confuse 
> it with patterns that differ but overlap and add up to the same 
> complexity due to quirks of the scoring system.
> 
> _Declaring Combine Passes_
> 
> Combiner passes are defined by subclassing GICombiner like so:
>      def CommonPreLegalizerCombiner: GICombiner<
>        "CommonGenPreLegalizerCombiner", [all_combines]> {
>        let DisableRuleOption = "common-prelegalizercombiner-disable-rule";
>        let Modifiers = [(disable_rule copy_prop)];
>      }
> 
> This causes tablegen to create a class named 
> 'CommonPreLegalizerCombiner' which you can use to perform combines. This 
> combiner contains all the combines mentioned in the previous section 
> because it includes the 'all_combines' group. However, it disables the 
> copy_prop rule to prevent it from attempting to match. I'll discuss that 
> a bit more below. It also generates a command-line option for asserts 
> builds only, which can be used to further disable rules at run-time. This 
> will be useful for tracking down bugs or for testing a rule in 
> isolation. I'm hoping that one day tools like bugpoint will be able to 
> search through the individual rules within a combine pass when searching 
> for a minimal reproducer.
> 
> The Modifiers field is intended to allow targets to modify an existing 
> ruleset (particularly a target independent one) with additional target 
> specific quirks. For example, one particular rule might be doing more 
> harm than good and should be disabled. Or maybe only a subset of the 
> things it would normally match are safe in which case an extra predicate 
> should be tested. Being able to make minor edits to the ruleset without 
> taking on the whole maintenance burden of the common code or causing 
> code bloat by duplicating tables would be useful for targets that are 
> generally similar to the targets within LLVM but have minor quirks.
> 
> Larger scale changes should take an alternate approach to modifiers. 
> It's expected that even targets that are very different from the rest of 
> the pack still have features in common. Such targets can declare their 
> own combiner to replace the common one but still generally make use of 
> the improvements made by the wider community. This is where 
> GICombineGroup will start to shine as such a target can declare a 
> combiner like so:
>      def MyTargetPreLegalizerCombiner: GICombiner<
>        "MyTargetGenPreLegalizerCombiner",
>        [common_extend_optimizations,
>         common_extending_loads,
>         // common_rsqrt_and_nr_iterations, // This target has a real 
> sqrt operation
>         mytarget_special_fma,
>         common_fma,
>         common_bswap_idioms,
>         mytarget_bcd_arithmetic
>        ]> {
>      }
> As LLVM improves on the common_* groups, the target benefits from those 
> improvements automatically. However, it doesn't benefit from new groups 
> being added to common_all_combines because it's no longer using that 
> group so entirely new categories of combines added to it would not 
> appear in the combiner. It would still be nice to find out that a new 
> category has appeared though so that a decision can be made on it. To 
> that end, I'm considering adding:
>      let Verify = [(has_all_of_except all_combines, 
> common_rsqrt_and_nr_iterations)];
> which would cause a warning if something new appeared in the original group.
> 
> _Conclusion_
> 
> There is a lot of work that needs to be done to get all this working and 
> some of it may have to change once it runs into the reality of 
> implementation :-). However, we think that this will prove to be a very 
> convenient and powerful syntax with potential for a variety of tools, 
> from profilers to bug reducers to correctness checking tools.
> 
> There is a patch at https://reviews.llvm.org/D54286 that makes a start on it 
> to try out the general feel of the syntax but currently lacks the core 
> feature of generating match and apply code from the MIR. I'm going to be 
> looking into that over the next few weeks.
> 

-- 
Learn how the world really is,
but never forget how it ought to be.

