[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

Fri Nov 9 18:05:34 PST 2018

Thanks David!

> On Nov 9, 2018, at 08:36, David Greene <dag at cray.com> wrote:
> 
> Daniel Sanders via llvm-dev <llvm-dev at lists.llvm.org> writes:
> 
>> I've been working on the GlobalISel combiner recently and I'd like to
>> share the plan for how Combine Rules will be defined in GlobalISel and
>> solicit feedback on it.
> 
> This is really great stuff!  I agree with pretty much everything Nicolai
> said, particularly the use of DAGs.  That seems much more naturally
> TableGen-y to me.  Specific comments are below.
> 
> But before that, I've been long pained that we have so much duplicated
> code in instcombine and dagcombine.  I know this is way beyond the scope
> of this work but do you think the basic concepts could be applied to
> produce TableGen-generated instcombine passes?  It would be nice to
> re-use many of the rules, for example, except they'd match LLVM IR
> rather than MIR.  As you go about implementation, maybe keep this idea
> in mind?

That's an interesting idea. Generating InstCombine from TableGen certainly ought to be possible, and sharing rules between the two sounds doable. MIR and IR are pretty similar, especially from the IRTranslator (which is a direct translation) through to the Legalizer (which is the first point where target instructions can appear, at least until targets add custom passes of their own). From the Legalizer to ISel, there's still likely to be a fair amount of overlap between the two, as a lot of the G_* opcodes correspond directly to LLVM-IR instructions. The tricky bit will be the escape hatches into C++, which would need to either have separate Instruction and MachineInstr versions or accept both.

>> Here's a simple example that eliminates a redundant G_TRUNC.
>> 
>> def : GICombineRule<(defs root:$D, operand:$S),
>>                     (match [{MIR %1(s32) = G_ZEXT %S(s8)
>>                                  %D(s16) = G_TRUNC %1(s32) }]),
>>                     (apply [{MIR %D(s16) = G_ZEXT %S(s8) }])>;
> 
> The use of '%' vs. '$' here really threw me for a loop.  I would have
> expected '$D' and '$S' everywhere.  What's the significance of '%'
> vs. '$'?  I know MIR uses '%' for names but must that be the case in
> these match blocks?

In MIR, '%' and '$' have a semantic difference when used on operands.
'%foo' is a virtual register named foo but '$foo' is the physical register foo.

The main reason I didn't pick something distinct from either (e.g. '${foo}') is that I'd like to minimize the need to modify the MIR parser to support pattern-specific syntax.
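
As a rough illustration of that sigil distinction, here is a toy C++ sketch that classifies an operand token by its first character. It is not the MIR parser; RegKind and classifyOperand are names made up for this example.

```cpp
#include <string>

// Toy model only: this is NOT the LLVM MIR parser.
// RegKind and classifyOperand are hypothetical names for illustration.
enum class RegKind { Virtual, Physical, NotARegister };

// Classify an operand token by its leading sigil, mirroring the MIR
// convention: '%' introduces a virtual register, '$' a physical one.
inline RegKind classifyOperand(const std::string &Token) {
  // A sigil with no name after it is not treated as a register here.
  if (Token.size() < 2)
    return RegKind::NotARegister;
  switch (Token[0]) {
  case '%':
    return RegKind::Virtual;
  case '$':
    return RegKind::Physical;
  default:
    return RegKind::NotARegister;
  }
}
```

Under this model '%S' in the match block reads as a virtual register, while a '$S' in the same position would read as a physical register named S, which is why reusing '$' for pattern variables inside MIR blocks would be ambiguous.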

> Of course if we go with Nicolai's use of DAGs this issue seems to go
> away.
> 
>> * defs declares the interface for the rule. The definitions in the
>>  defs section are the glue between the Combine algorithm and the
>>  rule, as well as between the match and apply sections. In this case
>>  we only define the root of the match and that we have a register
>>  variable called $S.
> 
> My understanding is that "root" means the sink here, but later in the
> "upside-down" example, "root" means the source.  It seems to me the use
> of "root" here is a misnomer/confusing.  I liked Nicolai's ideas about
> specifying the insert point.  Why do users need to know about "root" at
> all?  Is there some other special meaning attached to it?
> 
>                               -David

The root doesn't correspond to any property of the DAG being matched. It's the entry point that the combine algorithm uses to begin an attempt to match the rule. In DAGCombine terms, it's the first node checked for the rule inside the call to DAGCombiner::visit(SDNode *).
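
To make "entry point" a bit more concrete, here is a toy C++ sketch of the G_TRUNC example from earlier in the thread. It is only a model under simplifying assumptions: Inst, findDef, combineTruncOfZext and runCombiner are invented for illustration and are not GlobalISel APIs, every instruction has exactly one def and one use, and types (s8/s16/s32) are ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a MachineInstr (hypothetical): one def, one use.
struct Inst {
  std::string Opcode;
  int Def; // register defined by this instruction
  int Use; // register read by this instruction
};

// Return the index of the instruction defining Reg, or -1 if none.
int findDef(const std::vector<Inst> &Insts, int Reg) {
  for (std::size_t I = 0; I < Insts.size(); ++I)
    if (Insts[I].Def == Reg)
      return static_cast<int>(I);
  return -1;
}

// Hypothetical rule: G_TRUNC (G_ZEXT x) -> G_ZEXT x.
// The root is the G_TRUNC. Matching starts there and walks to the
// instruction that defines the root's input.
bool combineTruncOfZext(std::vector<Inst> &Insts, std::size_t RootIdx) {
  Inst &Root = Insts[RootIdx];
  if (Root.Opcode != "G_TRUNC")
    return false;
  int DefIdx = findDef(Insts, Root.Use);
  if (DefIdx < 0 || Insts[DefIdx].Opcode != "G_ZEXT")
    return false;
  // Apply: rewrite the root in place. The old G_ZEXT is left behind
  // for a later dead-code cleanup, which this sketch does not model.
  Root.Use = Insts[DefIdx].Use;
  Root.Opcode = "G_ZEXT";
  return true;
}

// Driver: each instruction is offered to the rule as a candidate root.
// Returns the number of successful combines.
int runCombiner(std::vector<Inst> &Insts) {
  int NumCombined = 0;
  for (std::size_t I = 0; I < Insts.size(); ++I)
    if (combineTruncOfZext(Insts, I))
      ++NumCombined;
  return NumCombined;
}
```

The driver never looks at the shape of the pattern. It only hands each instruction to the rule as a possible root, and the rule decides which direction to walk from there. That is the sense in which the root is about the algorithm rather than about sources or sinks.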