[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules
Nicolai Hähnle via llvm-dev
llvm-dev at lists.llvm.org
Sat Nov 10 03:28:10 PST 2018
Thank you for the detailed reply! There's a lot to digest :) Let me try
to address most of it.
[snip]
>> I also think you should have 'ins' and 'outs' separately; after all, a
>> predicate may have to do a combined check on two matched registers /
>> operands, and produce one or more values for later re-use.
>>
>> Once you have separate 'ins' and 'outs', the "bool" in there seems a
>> bit redundant.
>
> I think there's three kinds values involved in the predicates. The first
> is the inputs like these values come from other parts of the match and
> it makes sense that they belong in 'ins'. The second is values that are
> directly related to the truthiness of the predicate such as bool,
> unsigned (register number), MachineInstr*, etc.. These are the result of
> the underlying function and there can only be one of these per
> predicate. The type can potentially be std::pair or std::tuple to give
> multiple results provided a suitable conversion to bool is provided. The
> third is metadata that's just carrying additional information to the
> apply. These are currently in the 'ins' section and I agree that they
> don't really make sense there. I can pull them out into a separate
> 'outs' section. This has a flexibility cost since the user can no longer
> specify the complete argument order but that's probably a good thing for
> readability.
>
> Here's an example of a non-bool result:
> def reg_with_shift : GIMatchPredicate<
> (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
> // MachineOperand *matchARMShiftedOp2(const MachineOperand &,
> uint64_t);
> return matchARMShiftedOp2(${A}, ${imm});
> }]>;
> def : GICombineRule<
> (defs root:$D, reg:$S1, reg_with_shift:$S2),
> (match [{MIR %D = G_ADD %S1, %S2 }]),
> (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
> (I've just realized there's a gap in the ability to name metadata when
> used in this way. I've gone with 'S2.shift' for now)
>
> This is equivalent to:
> def reg_with_shift : GIMatchPredicate<
> (ins reg:$A), (outs ptr_to_reg:$reg, imm:$shift), bool, [{
> // bool matchARMShiftedOp2(const MachineOperand &,
> MachineOperand*&, uint64_t&);
> return matchARMShiftedOp2(${A}, ${reg}, ${imm});
> }]>;
> def : GICombineRule<
> (defs root:$D, reg:$S1, reg_with_shift:$S1),
> (match [{MIR %D = G_ADD %S1, %S2.reg }]),
> (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
> but the first definition slightly more efficient for most targets since
> there's usually a register or two for return values which in this
> version we're spending solely on a redundant bool and requires fewer
> argument registers, the second version has to store the MachineOperand*
> on the stack to pass it by reference. The difference is fairly small
> when considered on an individual rule but should accumulate on large
> rulesets or large functions.
Okay, I see now where you're coming from, and the goal makes a lot of
sense to me. I need some help with the syntax and semantics though :)
First, just to clarify, the intention is for matchARMShiftedOp2 to match
something like "G_SHL %X, ${imm}", right? Why not express that directly
in MIR?
But for the sake of the argument I'll assume there's a good reason for
the escape to C++. The way I understood your original description of
GIMatchPredicate, its most generic and versatile form of use is as an
explicit entry in the (match) list. Like so:
def reg_with_shift : GIMatchPredicate<
(ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
// MachineOperand *matchARMShiftedOp2(const MachineOperand &,
// uint64_t &);
return matchARMShiftedOp2(${A}, ${imm});
}]>;
def : GICombineRule<
(defs root:$D, reg:$S1, reg:$S2),
(match [{MIR %D = G_ADD %S1, %tmp }],
(reg_with_shift $tmp, $shift):$S2),
(apply [{MIR %D = ADD %S1, %S2, %shift }])>;
I believe that the kind of syntax you showed, in whatever form it
eventually ends up, should just be syntactic sugar. It would be
de-sugared to essentially what I've shown here, right?
So then the question becomes: what is the sugared syntax, if any?
I must say that I really don't like your first example, i.e. this one:
def : GICombineRule<
(defs root:$D, reg:$S1, reg_with_shift:$S2),
(match [{MIR %D = G_ADD %S1, %S2 }]),
(apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
The syntax very strongly suggests that $S2 will be assigned to the RHS
operand of the G_ADD, but there seems to be non-local magic which causes
it to be assigned something else instead -- unless I've misunderstood
the purpose of the matchARMShiftedOp2?
>>> }]>;
>>> def extending_loads : GICombineRule<
>>> (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo),
>>> (match [{MIR %root = G_LOAD %A }],
>>> (extending_load_predicate root:$A,
>>> extending_load_matchdata:$matchinfo)),
>>> (apply (exec [{ Helper.applyCombineExtendingLoads(${root},
>>> ${matchinfo}); }],
>>> reg:$root, extending_load_matchdata:$matchinfo)>;
>>> The GIDefMatchData declares a new type of data that can be passed
>>> from the match to the apply. Tablegen is responsible for arranging
>>> for appropriate storage during the Combine algorithm.
>>> The GIMatchPredicate declares a C++ predicate that fills out the
>>> PreferredTuple (passed by reference) whenever it returns true for a
>>> successful match. We could have made the predicate return
>>> std::pair<bool, PreferredTuple> instead but that's less efficient (it
>>> would be an sret return on many targets) and would require us to
>>> define the truthiness (no examples of are in this email as I expect
>>> it to be a rare thing to need) in order to act as a predicate.
>>> Normally, you'd feed this into a (create_imm ...) or a
>>> (create_operand ...) in the apply section. However, in this
>>> particular case the data being passed determines the entirety of the
>>> replacement so we escape into arbitrary C++ instead and arrange for
>>> the variables to be injected appropriately using the 'exec' operator.
>>
>> Have you considered the possibility of defining custom `create`
>> actions for the apply, analogous to custom predicates?
>>
>> In your description, it appears there is a fixed list of builtin
>> actions such as `create_imm`, `create_operand` (not really described
>> anywhere?), `exec` and the tentative debug info approach.
>>
>> It seems valuable to be able to define one's own operator for custom
>> operations.
>
> I didn't go into the mechanical definitions but create_imm is really
> just a specialization of create_operand and is equivalent to
> (create_operand [{ MachineOperand::CreateImm( <code> ) }]). The string
> concatenation needed for that specialization currently doesn't work on
> the code type but that seems easily fixable. I haven't defined one yet,
> but I don't see any reason we couldn't define non-operands in a similar way.
Okay, that sounds good to me.
I'm fairly sure that concatenation on code types works, but it may be
the case that it currently requires some back-and-forth casts with the
string type. Anyway, like you say, should be easy to fix.
[snip]
>> Let's talk about using DAGs. I think I understand where the temptation
>> comes from to describe the MIR using code blocks, but I'm also pretty
>> sure that it's a mistake to do so.
>
> The main motivation to move away from DAGs came from limitations in DAGs
> and the general ugliness of the original DAG-based definitions. Once the
> idea of using MIR came up, we liked the fact that it matched the
> existing serialization format which makes it easy to turn a specific
> example from the compiler output into a combine rule and avoid the need
> for another representation for instructions. We also liked the way it
> dealt with the difficult cases, MachineMemoryOperands, subregs, etc. and
> also that it wasn't a new syntax or parser (although it make require
> modifications to the existing one). It also looked like it be convenient
> for a tool like Alive
> (https://www.cs.utah.edu/~regehr/papers/pldi15.pdf) although we didn't
> really explore that particular thought further.
>
> Ultimately, both representations are DAG's and it comes down to the
> convenience of the syntax in specifying the rules we want. MIR is very
> convenient because of it's familiarity and flexibility.
>
> One of the bigger problems with using tablegen's DAG type is that it
> doesn't deal with multi-result instructions very well. Every time this
> has been raised on the list w.r.t SelectionDAG the solution has boiled
> down to 'use C++ instead' and it would be good to fix that so that
> things like UADDO are representable. You can write a rule that matches
> something like divmod at the top-level using the 'set' operator:
> (set $D1, $D2, (divmod $A, $B))
> but as soon as it's not the top-level, it gets really ugly fast even
> using pseudo-nodes:
> (set (outs $D1, $D2), (sext (result (G_DIVMOD $A, $B):$T, 0)),
> (sext (result $T, 1)))
> In this example, outs is a pseudo-node needed to distinguish the results
> from the sinks of the DAG, and result is a pseudo-node that selects one
> of the multiple results for use by the parent node. Meanwhile the MIR
> for is:
> %0, %1 = G_DIVMOD %A, %B
> %D1 = G_SEXT %0
> %D2 = G_SEXT %1
> which seems much clearer. It becomes a bigger problem when you start
> wanting to match target specific instructions as multi-result becomes
> more common later in the backend, especially post-isel. It's also a
> problem for upside-down matches, inside-out, and multi-root matches
> since tablegen's dag type requires a single common root node and can
> only draw edges in one direction. If you want to have to root in the
> middle then you have to distort the DAG to bring it to the root and mark
> up the reversed edges somehow
>
> Moving the result inside the instruction operator helps with these
> problems but requires a list of dags and all the temporaries need to be
> named:
> [(G_DIVMOD $T1, $T2, $A, $B),
> (G_SEXT $D1, $T1),
> (G_SEXT $D2, $T2)]
> and at that point, we're very nearly at MIR.
Yes, that was kind of the point. Expressing MIR, but in a way that is
native to TableGen :)
I should probably clarify that when I wrote "DAG", I meant that only in
the sense of the TableGen 'dag' type, which should probably just be
called "S-expression" instead of "DAG".
I absolutely believe that a list-of-instructions approach is better than
what the SDISel currently uses, for all the reasons you mentioned
(multiple defs, multiple roots).
I simply think using the 'dag' type in TableGen is a much better
approach because it composes with other TableGen frontend features
(e.g., all the substitution rules "just work", you can programmatically
compose a dag/sexpr from multiple pieces using !con, etc.).
> Another issue we didn't like with DAG is that you also end up needing a
> lot of pseudo-nodes like EXTRACT_SUBREG whereas these are natural parts
> of the MIR syntax. For example:
> (set $d, (EXTRACT_SUBREG (MYTGT_ADD $A, $B), sub_lo))
> compared to:
> %d:sub_lo = MYTGT_ADD %A, %B
> and:
> (set $d, (REG_SEQUENCE GPR32, (MYTGT_ADD $A, $B), sub_lo, (MYTGT_ADD $C,
> $D), sub_hi))
> compared to:
> %d:sub_lo = MYTGT_ADD %A, %B
> %d:sub_hi = MYTGT_ADD %C, %D
How about:
(MYTGT_ADD (sub_lo $d), $A, $B),
(MYTGT_ADD (sub_hi $d), $C, $D),
That seems acceptable to me.
> Along the same lines, I also think that the integrated debug-info is
> only really practical in MIR. It's possible to shoe-horn DILocation in
> using a pseudo-node like so:
> (set $d, (add (mul $a, $b):$mul, $c):$add -> (set
> (merged_dilocation (muladd $a, $b, $c), $mul, $add)
> but there isn't really a good place to modify the DIExpression used by
> DEBUG_VALUE.
>
>> One reason is that it adds yet another parser, which is more
>> maintenance burden without buying much.
>
> The expectation is that we can make use of the existing MIR parser and
> replace it's MachineInstr/MachineOperand/etc. generation code with
> pattern-matching generation code. I'm going to be looking into the
> implementation of that over the next few weeks.
>
>> The more important reason is that DAGs compose better with the rest of
>> TableGen. Consider combine rules defined in multiclasses, for example.
>> That is very common all through LLVM. In order to mirror an existing
>> selection rule in the AMDGPU backend, we'd likely want to have
>> something like this:
>>
>> multiclass Arithmetic_i16_GIPats<Instruction ginst,
>> Instruction inst> {
>> // This is probably wishful thinking -- would be okay if we had to
>> // split this into two different rules due to the different types
>> // of $dst
>> def : GICombineRule<
>> (defs root:$dst, operand:$src0, operand:$src1),
>> (oneof (ginst i16:$dst, i16:$src0, i16:$src1),
>> (match (ginst i16:$tmp, i16:$src0, i16:$src1),
>> (G_ZEXT i32:$dst, $tmp))),
>> (inst $dst, $src0, $src1)>;
>>
>> // A third rule for zext to 64 bits would go here...
>> }
>
> This is the reason that macros are explicitly imported in the defs
> section and the imported instances are renameable. The same thing in the
> MIR syntax (and using the changes to variable naming from above) would be:
>
> multiclass Arithmetic_i16_GIPats<Instruction ginst,
> Instruction inst> {
> def : GICombineRule<
> # There's a corner case for the double definition of $dst here.
> I'll come back to that later in the mixing concerns comment
> (defs root:$dst, (import-macro ginst:$ginst, def:$dst, use:$src0,
> use:$src1)),
> (match [{ %dst:(s16) = ginst %src0:(s16), %src1:(s16) }]), # I've
> corrected the i16 to s16 here. i16 isn't a type in GlobalISel
> (apply /* I'll come back to this bit */)>;
>
> // Covers s32 and s64
> def : GICombineRule<
> (defs root:$dst, operand:$src0, operand:$src1, (import-macro
> ginst:$ginst)),
> (match [{ %0:(s16) = ginst %src0:(s16), %src1:(s16)
> %dst = G_ZEXT %0:(s16) }]
> (isS32OrS64 type:$dst)),
> (apply /* I'll come back to this bit */)>;
> }
> We could potentially fold the has G_ZEXT case and no G_ZEXT cases
> together with a macro but it seems unnecessary complexity in this case.
> It might be more worth it if there's a similar G_SEXT variant too.
>
> The reason for the "I'll come back to this bit" in the apply section is
> because you've raised an issue I'd forgotten to deal with. The opcodes
> in the apply section should also be parameterizable. My first thought on
> that is that there should be a macro-equivalent for apply such as:
> def : GIExpansion<
> (defs def:$R, use:$S, imm:$IDX),
> (apply (select imm:$IDX,
> (0 [{MIR %R = FOO %S }]),
> (1 [{MIR %R = BAR %S }]),
> (2 [{MIR %0 = BAZ %S
> %R = BAR %0 }])))>;
> or alternatively:
> def : GICombineRule<
> ...,
> ...,
> (apply (create_opcode [{ ${IDX} ? MYTGT_A : MYTGT_B }] ...):$name,
> [{MIR %D = name %s }])>;
> The former is more powerful, but the latter is more convenient for
> trivial cases
I admit that you've lost me a little here :)
Okay, I see how importing an instruction as a macro is possible without
string concatenation, although I have to say that having to express this
explicit import still feels quite a bit more cumbersome than the
strawman syntax I've shown using the 'dag' type.
Again, the 'dag' type just composes better with the rest of TableGen.
On the apply-side though, I don't understand where the difficulty comes
from. Each instantiation of the multiclass has a single `inst' which
needs to go into the apply rule. That shouldn't be so difficult to
express -- at least with the 'dag' type it's trivial.
>> Obviously you can achieve this in your proposal as well using string
>> concatenation, but that does seem rather backwards.
>>
>> Something else that falls out rather nicely from using DAGs is that it
>> gives you a natural way to give a name to an _instruction_, to be able
>> to pass it to a predicate for extended checks (and as a potentially
>> more natural way of preserving debug locations!).
>
> I don't think we need a means to name instructions since instructions
> can be trivially found from of their results using
> MachineOperand::getParent().
>
> As noted above, merging debug locations doesn't fit very well into DAG's
> as you have to use a pseudo node to be able to provide both DILocations
> on the same result instruction, and there's no real means to include
> DEBUG_LOC for temporaries that are eliminated and need to be rebased
> from another value.
Hmm... I feel like we need more concrete examples for debug info to
discuss this. One of the reasons I've brought up naming instructions is
precisely because I thought it could simplify transferring DebugLocs.
Again, I'm not suggesting the use of a DAG like in SDSel, as I think
that's a mistake. I simply mean the 'dag' type of TableGen, which should
really be called 'sexpr' or something.
>> I have a more explicit example of this later.
>>
>>
>>> _'Upside-down' matches (i.e. roots at the top) and similar_
>>> This one requires algorithm changes which I'd prefer not to discuss
>>> in this RFC. Assuming the underlying algorithm gains support for
>>> this, this is how the syntax would look:
>>> def : GICombineRule<
>>> (defs root:$root, reg:$A),
>>> (match [{MIR %1 = G_LOAD %root
>>> %A = G_SEXT %1 }]),
>>> (apply [{MIR %A = G_SEXTLOAD %root }])>;
>>> The only unusual thing about this rule is that the root isn't at the
>>> bottom. Instead of starting at a use and matching towards defs, we're
>>> starting at the def and matching towards uses. This has some
>>> potentially useful properties. The combine algorithm has to chose an
>>> insertion point for the replacement code and the traditional choice
>>> has been the root of the match. Assuming we keep doing the same
>>> thing, 'upside-down' matching like this is able to avoid the checks
>>> that the load is safe to move, is non-volatile, has one use, etc.
>>> that would be necessary if we moved the G_LOAD down to the G_SEXT.
>>> Also, assuming we keep the same broadly bottom-up processing order as
>>> the existing Combine algorithm this kind of rule also has relatively
>>> lower priority than conventional rules because the root is seen
>>> later. This can be useful as (algorithm-dependent) the MIR may be
>>> more settled by the time it tries to match.
>>> Along the same lines, the syntax can potentially support the root
>>> being in the middle of a match. This could be used in a similar way
>>> to the upside-down match above to control the insertion point and
>>> priority. For example:
>>> def : GICombineRule<
>>> (defs root:$root, reg:$A, reg:$B, reg:$C),
>>> (match [{MIR %1(s32) = G_TRUNC %A(s64)
>>> %2(s32) = G_TRUNC %B(s64)
>>> %root = G_ADD %1, %2
>>> %C(s64) = G_SEXT %root }]),
>>> (emit [{MIR %root = G_ADD %A, %B
>>> %C = G_SEXT_INREG %root }])>;
>>> Unfortunately, I don't have any concrete examples where this would be
>>> useful in comparison to a conventional or upside-down match at the
>>> moment. I'm mostly keeping the door open as I can see potential for
>>> benefits (mostly w.r.t sinking and hoisting safety around an instr
>>> that would be difficult to test for that) given an appropriate rule
>>> and also because I'm inclined to say that the tablegen syntax
>>> shouldn't be the reason it's not possible. It should be up to the
>>> Combine algorithm and and tablegen-erate pass involved in
>>> specializing the algorithm for a target.
>>
>> I'm concerned that the concrete syntax proposed here mixes a bunch of
>> concerns that really should be addressed separately:
>>
>> 1. Defining a mapping between semantically equivalent chunks of code.
>> 2. Controlling insertion points as a simple way to keep side effects
>> under control in many cases.
>> 3. Controlling additional algorithmic aspects of the combine algorithm
>> (whether you're matching up or down, say).
>
> I think there may be a misunderstanding with #3. There's no real concept
> of matching up or down in the rules, there's only the starting point for
> the match and from there whether it's following the uses or defs isn't
> something the rule needs to be concerned about. The algorithm needs to
> be aware of it (and historically hasn't been, DAGCombine broadly runs
> bottom up and expects to match towards defs and can fail to re-schedule
> nodes for re-visit in many common cases) but the algorithm is expected
> to figure that out for itself.
>
> 'root' definitions are the only place that #2 and #3 come into the
> proposed syntax. At the moment, the only requirement the syntax imposes
> on the algorithm is that it has a concept of a current MachineOperand
> Def* (and by extension instruction) that will be used as the starting
> point for the match and that it can choose a valid insertion point for
> each instruction produced. The insertion point is the same position in
> the current algorithm but that's algorithm dependent and doesn't have to
> be the case in the long run. There may even be more than one for rules
> that emit multiple instructions in the apply section. There's currently
> no means to specify an insertion point, it's left up to the algorithm to
> find a place for each instruction. That said, I'd be surprised if there
> was an algorithm where the current MachineOperand isn't also at least
> one of the insertion points.
>
> * The current upstream code is a bit misleading about that and makes it
> seem that it's the MachineInstr but I made that mistake before on ISel
> so I'm intending to change that.
>
>> Could we keep those separate somehow? E.g., don't distinguish between
>> 'root' and 'reg' or 'operand' in the (defs ...) part of the rule; have
>> TableGen figure out the roots automatically based on walking the graph
>> implied by the matching rule, and default to using the sink(s) as roots.
>
> operand isn't really connected to this, it's just a wildcard for an
> MachineOperand that doesn't care whether it's a reg, imm, MCExpr, or
> something else.
>
> I'm inclined to agree that it would be nice to get 'root' out of the
> definition but I don't really see a good way to do it. The sinks aren't
> always the right choice and are a particularly bad choice on combines
> like the extending loads since that's trying to match all uses
> simultaneously and those uses could have any opcode.
How about having it as a lettable variable? As in (using my favorite
syntax, but it should translate either way):
let MatchRoot = "$tmp" in
def : GICombineRule<
(match (G_LOAD $tmp, $ptr),
(G_SEXT $dst, $tmp)),
(apply (G_SEXTLOAD $dst, $ptr))>;
I'm using $tmp instead of $ptr as the root here since you wrote above
that it needs to be a Def.
Which makes the discussion about the insertion point potentially
confusing, since $tmp no longer appears in the (apply). But maybe that
doesn't matter.
If MatchRoot is not set, it'd just default to using a sink -- that
should be by far the most common case.
Also, it'd be nice to get rid of the (defs) part of the syntax entirely
for brevity. I believe that should be possible.
>> Then add some optional let-able variable in the GICombineRule to
>> define non-default rules for the combine algorithm.
>>
>> Controlling the insertion point to address loads like in your example
>> above could be done quite naturally if the MIR is expressed in a DAG:
>>
>> def : GICombineRule<
>> (match (G_LOAD $tmp, $ptr):$load,
>> (G_SEXT $dst, $tmp)),
>> (apply (insertat $load),
>> (G_SEXTLOAD $dst, $ptr))>;
>
> This isn't quite the same thing as the extending loads combine in the
> example. In that combine, it's matching the load and all uses of its
> result and chosing a replacement based on the global usage of that def
> across the whole function, rewriting conflicting sext/zext/aext/trunc's
> to be as cheap as possible (a lot of the detail of that is inside the
> function but the code is upstream if you want to see the details). This
> rule is matching a single use and relying on CSE to de-dupe the N
> G_SEXTLOAD's that come out of it. To illustrate the difference, consider:
> %0(s8) = G_LOAD %p
> %d1(s16) = G_SEXT %0
> %d2(s32) = G_SEXT %0
> If I apply the above rule as much as possible to this (twice) with the
> roots at the sink, I'll get:
> %d1(s16) = G_SEXTLOAD %p
> %d2(s32) = G_SEXTLOAD %p
> whereas if I apply the extending loads combine to it (once), I'll get:
> %d2(s32) = G_SEXTLOAD %p
> %d1(s16) = G_TRUNC %d2
> which will normally be cheaper to execute. In combination with other
> rules and CSE, the first version could eventually turn into the second
> version but it requires more memory churn and processing time to get there.
You've lost me again, sorry. Where does the G_TRUNC suddenly come from?
Is that the rule that uses `Helper.matchCombineExtendingLoads`?
In that case, don't worry, I only intended my example to be comparable
to the simpler G_LOAD / G_SEXTLOAD rule without C++ escape that you
showed in your original mail as well.
If, on the other hand, you have a plan to get to the more advanced
result of matching multiple G_SEXTs and using G_TRUNC _without_ a C++
escape, I'd be _very_ interested to learn more about that!
> One of the things I'm planning to look into in the algorithm is making
> it smarter about combines involving multiple-uses in the intermediate
> operands. DAGCombine is harmfully greedy in this area and will often
> duplicate expensive operations if one use is combinable but another
> isn't. This has led to the addition of hasOneUse() checks which solves
> that but takes things to the opposite extreme. I think it should be
> possible to find an algorithm that makes an intelligent decision for
> defs with multiple uses.
Absolutely agreed that we need better solutions here.
In the AMDGPU backend, we have some rather interesting challenges around
those. This is perhaps a bit of a far future tangent, but one of them is
that floating point operations can optionally negate their operands;
however, this can be at the cost of a larger instruction encoding, and
the rules for that are rather complex. So there's a problem of
optimizing *where* we insert the negations to improve code size.
In a computation that is a tree with a single sink, an optimal solution
can be found in linear time using dynamic programming (maybe there's an
easier way, but I'm not aware of one). With general DAGs, the problem is
of course much harder.
The negation is just one example, and it would be pretty awesome if we
could use a declarative database of combines to make progress at the
very least in local search optimizations.
Thanks again for your detailed reply!
Cheers,
Nicolai
>> As you can see, this discussion also makes me wonder whether the
>> (defs) are needed at all; maybe we can rely on type inference almost
>> everywhere?
>>
>>
>>> _Multiple roots_
>>> This one requires algorithm changes which I'd prefer not to discuss
>>> in this RFC. Assuming the underlying algorithm gains support for
>>> this, this is how the syntax would look:
>>> def : GICombineRule<
>>> (defs root:$root1, root:$root2, reg:$A, reg:$B),
>>> (match [{MIR %root1 = G_ADD %A, %B }],
>>> [{MIR %root2 = G_SUB %A, %B }]),
>>> (emit [{MIR %root1, %root2 = BUTTERFLY %A, %B }])>;
>>> This would match a G_ADD and G_SUB with operands in common and
>>> combine them into a BUTTERFLY operation. You can think of this as two
>>> normal rules, one with %root1 as the root and the other with %root2
>>> as the root.
>>
>> Again, it seems to me that it would be preferable if we could just
>> express this rule as:
>>
>> def : GICombineRule<
>> (match (G_ADD $dst1, $a, $b),
>> (G_SUB $dst2, $a, $b),
>> reg:$a, reg:$b),
>> (BUTTERFLY $dst1, $dst2, $a, $b)>;
>>
>> or perhaps:
>>
>> def : GICombineRule<
>> (match (G_ADD $dst1, reg:$a, $b),
>> (G_SUB $dst2, $a, reg:$b)),
>> (BUTTERFLY $dst1, $dst2, $a, $b)>;
>>
>> and similar variations.
>
> Both of these would be ok when inferring the roots to be the sinks.
>
>> TableGen should be able to figure out the multiple roots by walking
>> the dependency graph that is implied by the knowledge about which
>> operands of G_ADD and G_SUB are defs and uses, respectively.
>>
>> To sum up, I really like the proposal overall, except I think it would
>> be a **major** improvement to use DAGs for expressing the MIR, and
>> sure, there's the odd detail here and there like about GIMacro.
>>
>> Cheers,
>> Nicolai
>
> Thanks for all your comments!
>
>>> _Grouping and Ordering Rules_
>>> Combine rules are ordered and grouped using definitions of the form:
>>> def trivial_combines : GICombineGroup<[copy_prop]>;
>>> def combines_for_extload: GICombineGroup<[extending_loads]>;
>>> def all_combines : GICombineGroup<[trivial_combines,
>>> combines_for_extload]>;
>>> Essentially, we create a long ordered list of all the combines.
>>> Tablegen is free to re-order rules so long as the resulting ruleset
>>> always behaves as if it were in this specific order. So for example,
>>> the list [sext_trunc_sext, zext_trunc_zext, sext_sext, zext_zexts] is
>>> free to re-order to [sext_trunc_sext, sext_sext, zext_trunc_zext,
>>> zext_zexts] because the rules involved are mutually exclusive due to
>>> the root opcode.
>>> So why not make tablegen figure out the order like ISel does? ISel
>>> attempts to figure out an order using an scoring system called
>>> Complexity along with a hack (AddedComplexity) to allow the user to
>>> provide magic numbers to fix the cases it got it wrong. The simplest
>>> way to confuse it is with patterns like (add s32, complexpattern1)
>>> and (add s32, complexpattern2). These have the same Complexity score
>>> but in truth, each one has a (possibly overlapping) range of
>>> complexity depending on what C++ code is inside the complexpattern's
>>> and which path through that C++ is dynamically taken. If it matches
>>> nothing then the score should be 0 but if it matches a dozen nodes it
>>> should be 12 (or possibly higher). We don't know which until we try
>>> to match that specific pattern. Correctly figuring out an order in
>>> the presence of complexpatterns is impossible. Similarly, it's also
>>> possible to confuse it with patterns that differ but overlap and add
>>> up to the same complexity due to quirks of the scoring system.
>>> _Declaring Combine Passes_
>>> Combiner passes are defined by subclassing GICombiner like so:
>>> def CommonPreLegalizerCombiner: GICombiner<
>>> "CommonGenPreLegalizerCombiner", [all_combines]> {
>>> let DisableRuleOption = "common-prelegalizercombiner-disable-rule";
>>> let Modifiers = [(disable_rule copy_prop)]
>>> }
>>> This causes tablegen to create a class named
>>> 'CommonPreLegalizerCombiner' which you can use to perform combines.
>>> This combiner contains all the combines mentioned in the previous
>>> section because it includes the 'all_combines' group. However, it
>>> disables the copy_prop rule to prevent it from attempting to match.
>>> I'll discuss that a bit more below. It also generates a command-line
>>> option for asserts builds only which can be used to further disable
>>> rules at run-time which will be useful for tracking down bugs or for
>>> testing a rule in isolation. I'm hoping that one day tools like
>>> bugpoint will be able to search through the individual rules within a
>>> combine pass when searching for a minimal reproducer.
>>> The Modifiers field is intended to allow targets to modify an
>>> existing ruleset (particularly a target independent one) with
>>> additional target specific quirks. For example, one particular rule
>>> might be doing more harm than good and should be disabled. Or maybe
>>> only a subset of the things it would normally match are safe in which
>>> case an extra predicate should be tested. Being able to make minor
>>> edits to the ruleset without taking on the whole maintenance burden
>>> of the common code or causing code bloat by duplicating tables would
>>> be useful for targets that are generally similar to the targets
>>> within LLVM but have minor quirks.
>>> Larger scale changes should take an alternate approach to modifiers.
>>> It's expected that even targets that are very different from the rest
>>> of the pack still have features in common. Such targets can declare
>>> their own combiner to replace the common one but still generally make
>>> use of the improvements made by the wider community. This is where
>>> GICombinerGroup will start to shine as such a target can declare a
>>> combiner like so:
>>> def MyTargetPreLegalizerCombiner: GICombiner<
>>> "MyTargetGenPreLegalizerCombiner",
>>> [common_extend_optimizations,
>>> common_extending_loads,
>>> // common_rsqrt_and_nr_iterations, // This target has a real
>>> sqrt operation
>>> mytarget_special_fma,
>>> common_fma,
>>> common_bswap_idioms,
>>> mytarget_bcd_arithmetic
>>> ]> {
>>> }
>>> As LLVM improves on the common_* groups, the target benefits from
>>> those improvements automatically. However, it doesn't benefit from
>>> new groups being added to common_all_combines because it's no longer
>>> using that group so entirely new categories of combines added to it
>>> would not appear in the combiner. It would still be nice to find out
>>> that a new category has appeared though so that a decision can be
>>> made on it. To that end, I'm considering adding:
>>> let Verify = [(has_all_of_except all_combines,
>>> common_rsqrt_and_nr_iterations)];
>>> which would cause a warning if something new appeared in the original
>>> group.
>>> _Conclusion_
>>> There is a lot of work that needs to be done to get all this working
>>> and some of it may have to change once it runs into the reality of
>>> implementation :-). However, we think that this will prove to be a
>>> very convenient and powerful syntax with some potential a variety of
>>> tools from profilers, to bug reducers, to correctness checking tools.
>>> There is a patch at https://reviews.llvm.org/D54286 makes a start on
>>> it to try out the general feel of the syntax but currently lacks the
>>> core feature of generating match and apply code from the MIR. I'm
>>> going to be looking into that over the next few weeks.
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>> --
>> Lerne, wie die Welt wirklich ist,
>> Aber vergiss niemals, wie sie sein sollte.
>
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
More information about the llvm-dev
mailing list