[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

Daniel Sanders via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 30 09:45:47 PST 2018



> On Nov 30, 2018, at 06:28, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> 
> On 30.11.18 01:25, Daniel Sanders wrote:
>>> On Nov 29, 2018, at 02:02, Nicolai Hähnle <nhaehnle at gmail.com <mailto:nhaehnle at gmail.com>> wrote:
>>> 
>>> On 27.11.18 19:01, Daniel Sanders wrote:
>>>> ...Continued from the other email
>>>> _Removing the defs section_
>>>> We can potentially infer quite a lot of the defs section but it requires both a complicated ruleset and that tblgen spends processing time doing the inferencing. That processing time is potentially significant for large combiners and for that reason we need to be careful not to let inferencing become a burden on tblgen. My main worry here is that it may require analyzing both the match and apply section to fully inference the types and I'd like the tablegen pass to be able to ignore the apply section until it's about to write it out to the output file.
>>> 
>>> Thank you for looking into this!
>>> 
>>> 
>>>> There are however, some simple rules we can use to minimize the defs section with just a scan of the defs section (trivial) and match section (which we have to do anyway):
>>>> 1. Match roots are implicitly declared in the defs section and have
>>>>    type 'reg'.
>>>> 2. Names used as argument to an operator that subclasses Instruction
>>>>    are implicitly declared in the defs section with type 'reg'. This
>>>>    allows leaf-operands to be used in the apply section
>>>>      * This does however come at additional run-time memory cost
>>> 
>>> Can you explain this? You mean run-time as in when the compiler runs, right? I guess I'm not familiar enough with the details of the matching algorithm.
>> That's right. The impact of this rule is to include all the operands covered by the match just in case they need to be delivered to apply. This makes the struct above larger than it really needs to be. For example:
>> (defs $d, $a, $b)
>>         (match (MUL $t, $a, $b),
>>                (ADD $d, $c, $t))
>> would record and pass three operands, but with this rule it would effectively be:
>> (defs $d, $t, $a, $b)
>> which is slightly larger.
>> The run-time memory cost comes from the implicit declaration in the defs section. The defs sections primary purpose is to describe the information that needs to be delivered from the match step to the apply step. So it can be seen as defining*:
>> struct {
>> MachineOperand &d;
>> MachineOperand &t; // due to the implicit declaration.
>> MachineOperand &a;
>> MachineOperand &b;
>> } Foo;
>> Foo matchRule(...);
>> void applyRule(Foo &, ...);
> 
> Okay, this is interesting.
> 
> Can't TableGen whittle down the members of that struct automatically to the intersection of operands that are referenced by both match and apply?
> 
> Cheers,
> Nicolai

Yes, but it requires parsing the apply section before we can generate the matcher and it would be the only reason to parse it that early so I'd like to avoid it if possible. Parsing it early will require us to hold the apply sections for the whole ruleset in memory (or parse it twice) while llvm-tblgen is generating the matcher even though the apply section has no real relevance to the state machine generation.

>> *That's not quite what the tablegenerated code will do in practice but it's the gist of what it will do. The real implementation will be managing a buffer that's shared between the whole ruleset and will tell the apply step which bits it should read.
>>> Cheers,
>>> Nicolai
>>> 
>>> 
>>>> 3. Names used as argument to an operator that subclasses Instruction
>>>>    are assumed to be 'reg'. This allows us to omit the
>>>>    formerly-anonymous names such as $t[0-3] in the above examples.
>>>>      * Using an 'operand' typed variable as the def of an instruction
>>>>        does not change it to 'reg' but does emit the isReg() check that
>>>>        'reg' would normally emit.
>>>> 4. Names used in an 'outs' position on a GIMatchPredicate are
>>>>    implicitly declared in the defs section using the type given by the
>>>>    predicate.
>>>> 5. Any names not given in the defs section are assumed to be 'instr' if
>>>>    they are used as a name for an operator that subclasses Instruction.
>>>>    This allows us to omit the defs for MI's involved in debug info.
>>>> 6. Any names used solely in the apply section and defined by a
>>>>    value-constructing operator (e.g. create*) can be omitted
>>>> With these rules, the defs section isn't completely gone but is often empty and can be omitted with a variant of GICombineRule. The examples would be as below. I've indicated which rule eliminates which def with color coding and strike-through:
>>>> // Omitted due to rule 1
>>>> // Omitted due to rule 2
>>>> // We haven't been listing any defs related to rule 3 since these are the formerly anonymous
>>>> // operands that now need names to fit into the tblgen dag type
>>>> // Omitted due to rule 4
>>>> // Omitted due to rule 5
>>>> //Omitted due to rule 6
>>>> def : GICombineRule<(defs reg:$D, reg:$S),
>>>> (match (G_ZEXT s32:$t1, s8:$S),
>>>> (G_TRUNC s16:$D, s32:$t1)),
>>>> (apply (G_ZEXT s16:$D, s8:$S))>;
>>>> def : GICombineRule<(defs reg:$D, reg:$S),
>>>> (match (G_ZEXT $t1, $S),
>>>> (G_TRUNC $D, $t1),
>>>> (isScalarType type:$D),
>>>> (isLargerType type:$D, type:$S)),
>>>> (apply (G_ZEXT $D, $S))>;
>>>> def : GICombineRule<(defs reg:$D, reg:$S, instr:$MI0, instr:$MI1),
>>>> (match (G_ZEXT $t0, $S):$MI0,
>>>> (G_TRUNC $D, $t0):$MI1),
>>>> (isScalarType type:$D),
>>>> (isLargerType type:$D, type:$S)),
>>>> (apply (G_ZEXT $D, $S, (debug_locations $MI0, $MI1)))>;
>>>> def : GICombineRule<(defs reg:$D, reg:$S, instr:$MI0, instr:$MI1, instr:$MI2, instr:$MI3, debug_expr:$DNewExpr),
>>>>                         (match (G_ZEXT $t0, $S):$MI0,
>>>>                                (G_TRUNC $D, $t0):$MI1,
>>>>                                (DBG_VALUE $t0):$MI2,
>>>>                                (DBG_VALUE $D):$MI3,
>>>>                                (isScalarType type:$D),
>>>>                                (isLargerType type:$D, type:$S)),
>>>>                         (apply (createDIExprLLVMFragment debug_expr:$MI2, type:$D):$DNewExpr,
>>>>                                (G_ZEXT $D, $S, (debug_location $MI0, $MI1)),
>>>>                                (DBG_VALUE $D, debug_local:$MI2, debug_expr:$DNewExpr, (debug_location $MI2)),
>>>>                                (DBG_VALUE $D, debug_local:$MI3, debug_expr:$MI3, (debug_location $MI3))))>;
>>>> // $VAL is needed to indicate it's an immediate and provide the predicate
>>>> def : GICombineRule<(defs reg:$D, reg:$A, imm_2:$VAL),
>>>> (match (G_MUL $D, $A, $VAL)),
>>>> (apply (MYTGT_DOUBLE $D, $A))>;
>>>> // $B/$C are needed to indicate they're immediates and provide $C's predicate
>>>> def : GICombineRule<
>>>> (defs reg:$D, reg:$A, imm:$B, imm_2:$C),
>>>> (match (G_ADD $t1, $A, $B),
>>>> (G_MUL $D, $t1, $C)),
>>>> (apply (create_imm [{ 2 * ${B}->getZExtValue() }], apint_value:$B):$NB,
>>>> (G_ADD $t1, $A, $A),
>>>> (G_ADD $D, $t1, $NB))>;
>>>> // $D is needed because we wanted operand instead of reg. We could rewrite the predicate to take a reg though.
>>>> def extending_loads : GICombineRule<
>>>> (defs operand:$D, reg:$A, extending_load_matchdata:$matchinfo),
>>>> (match (G_LOAD $D, $A),
>>>> (extending_load_predicate operand:$A,
>>>> extending_load_matchdata:$matchinfo)),
>>>> (apply (exec [{ Helper.applyCombineExtendingLoads(${D}, ${matchinfo.B}); }],
>>>> reg:$D, extending_load_matchdata:$matchinfo)>;
>>>> // I haven't included any rules for omitting defs from GIMacro but I can look into this if we want.
>>>> def ANYLOAD : GIMacro<(defs def:$R, use:$S, uint64_t:$IDX),
>>>> (match (oneof (match (G_LOAD $R, $S)),
>>>> (match (G_SEXTLOAD $R, $S)),
>>>> (match (G_ZEXTLOAD $R, $S))):$IDX>;
>>>> // $ANYLOAD is needed to import the macro and name its instance
>>>> def extending_loads : GICombineRule<
>>>>       (defs reg:$D, reg:$A, extending_load_matchdata:$matchinfo, ANYLOAD:$ANYLOAD),
>>>>       (match (ANYLOAD $D, $A),
>>>>              (extending_load_predicate operand:$A,
>>>>                                         extending_load_matchdata:$matchinfo)),
>>>>       (apply (exec [{ Helper.applyCombineExtendingLoads(${D}, ${matchinfo.B}); }],
>>>>                    reg:$D, extending_load_matchdata:$matchinfo)>;
>>>> def : GICombineRule<
>>>>   (defs reg:$D, reg:$A),
>>>>   (match (G_LOAD $t1, $D),
>>>>          (G_SEXT $A, $t1)),
>>>>   (apply (G_SEXTLOAD $A, $D))> {
>>>>   let MatchStartsFrom = (roots $D);
>>>> };
>>>> def : GICombineRule<
>>>>   (defs reg:$D, reg:$A, reg:$B, reg:$C),
>>>>   (match (G_TRUNC s32:$t1, s64:$A),
>>>>          (G_TRUNC s32:$t2, s64:$B),
>>>>          (G_ADD $D, $t1, $t2)
>>>>          (G_SEXT s64:$C, $D)),
>>>>   (apply (G_ADD $D, $A, $B),
>>>>          (G_SEXT_INREG $C, $D))> {
>>>> let MatchStartsFrom = (roots $D);
>>>> };
>>>> def : GICombineRule<
>>>> (defs reg:$D1, reg:$D2, reg:$A, reg:$B),
>>>> (match (G_ADD $D1, $A, $B),
>>>> (G_SUB $D2, $A, $B)),
>>>> (apply (BUTTERFLY $D1, $D2, $A, $B))>;
>>>> def : GICombineRule<
>>>> (defs reg:$D, reg:$A, reg:$B),
>>>> (match (G_TRUNC s32:$t1, s64:$A),
>>>> (G_TRUNC s32:$t2, s64:$B),
>>>> (G_ADD $D, $t1, $t2),
>>>> (apply (ADD32 $D, (sub_lo $A), (sub_lo $B)))>;
>>>> // This one keeps the def of $D/$A because we wanted operand instead of reg. We didn't have
>>>> // anything requiring that though and could trivially change them to reg.
>>>> def : GICombineRule<
>>>> (defs operand:$D, operand:$A),
>>>> (match (G_LOAD $D, $A):$MI,
>>>> (mmo_is_load8 instr:$MI)),
>>>> (apply ...)>;
>>> 
>>> --
>>> Lerne, wie die Welt wirklich ist,
>>> Aber vergiss niemals, wie sie sein sollte.
> 
> -- 
> Lerne, wie die Welt wirklich ist,
> Aber vergiss niemals, wie sie sein sollte.



More information about the llvm-dev mailing list