[PATCH] Handle tied sub-operands in AsmMatcherEmitter

Fri Apr 26 15:54:53 PDT 2013

On Apr 24, 2013, at 11:59 AM, Ulrich Weigand <Ulrich.Weigand at de.ibm.com> wrote:

> Hi Jim,
> 
>> Thank you for looking at this. Apologies again for taking
>> unjustifiably long to get back to you. This is really good stuff and
>> I very much want to see this go in. I like it enough I’m going to
>> try to talk you into doing even more work on improving this code. ;)
>> 
>> Fair warning up front: You’re digging into some pretty fundamental
>> problems in how the assemblers and code generators like to view
>> instructions. As you correctly observe, we don’t really have very
>> good solutions in place right now, just a bunch of hacks that get us
>> where we need to go. I’d *love* to fix that, and this patch is a
>> step in that direction. So I’m going to give a bit of a brain-dump
>> here about a few different things. Adding llvmdev as well, since
>> we’re veering into more high level design territory
> 
> Thanks for the detailed reply, this really helps me understand
> the "big picture" better!
> 
>>> One backend currently affected by this problem is ARM, where they work
>>> around it by either providing custom matchers for pre-inc instructions,
> or
>>> else just handling the whole instruction differently via special
> asm-parser
>>> only instruction patterns.
>> 
>> You mean custom converters, not custom matchers, yes? The matcher is
>> something completely different. I’m assuming you mean converters for
>> the rest. If that’s not the case, please let me know and I’ll back
>> up and re-think.
> 
> Right, custom converters (i.e. AsmMatchConverter).  Sorry for the
> confusion with terminology ...
> 
>> There are two core issues. First, the tied operands are an artifact
>> of the way we do instruction selection for the compiler. They’re how
>> we represent read-modify-write operands, basically. The assembler
>> shouldn’t have to know or care about them at all. Specifically, they
>> shouldn’t even be represented at all as an MCOperand. Rather, only
>> the actual operand referenced by the instruction encoding should be
>> an operand, and it should have an instruction-description notation
>> that the operand is RMW. This means that the MI representation and
>> the MCInst representations of the instructions will no longer have a
>> one-to-one operand correspondence, so there will need to be logic to
>> map between them for things like the MC lowering pass (the ill named
>> AsmPrinter), the TargetInstrInfo layer which inherits from
>> MCInstrInfo, and probably others. This is usually, in effect, what
>> many of the ARM assembler aliases of the load/store instructions are
>> doing manually. We just want to automate it.
> 
> I see.  This could actually make things quite a bit simpler.
> 
> If I understand your point correctly, at the MCInst level, we don't
> want multiple copies of the tied operand.  In fact, would it make
> sense to say that at the MCInst level, the list of operands we want
> should correspond exactly to those named in the AsmString?  The
> rationale would be that any operand not named in the AsmString is
> obviously not needed to emit the instruction as string, and thus
> it clearly ought not be needed when emitting (or matching!) the
> instruction in any other way either …

Yep, exactly right. In practice, there may need to be exceptions made, but hopefully we’d be able to figure out better ways to model those so we could make this an invariant.

> 
> Right now, we do have multiple copies, and the common TableGen
> code attempts to fill them in (with copies of the operand value).
> However, the ARM custom converters mentioned above actually do
> not bother: instead, they simply fill those slots that do not
> correspond to named operands in the AsmString with dummy entries
> (constant 0).  I can see now where this actually makes sense given
> that those MCOperand entries never should be used for anything.
> 
> As a first step towards the goal of removing the duplicate operands
> from MCInst, would it make sense to move the ARM approach to common
> code: that is, have common code simply emit dummy entries for all
> operand slots not named in the AsmString?

Sure. There’s a bit of a caveat that the original motivation for this patch originally intersects here. The first operand in the list isn’t necessarily the “real” one. We can probably fix this by using your suggestion above and consider whichever one of the two is used in the asm string as the real one. If both are used, make that an error and fix the strings that do that.

> 
> This would have several advantages:
> - TableGen AsmMatcherEmitter would not have to track matching
>  constraints at all, simplifying that code a bit
> - it would work even in those cases where the back-end needs a
>  custom matcher for multi-operand operands, where my current
>  patch does not work
> - most (all?) of the ARM custom converters could simply be
>  removed, since the common code would do the same thing
> 
> At some later point, it should then be simple to completely remove
> those dummy operands (and deal with issues like operand numbers no
> longer matching those in MachineInstruction).
> 

Yep. Sounds great.

> 
> I've tried to put together a patch to explore this direction.  As it
> turns out, even this first step is not trivial to implement, since
> back-ends do tend to use both operand slots of a tied operand on
> occasion; this would need to be cleaned up first.  However, if as
> a first step towards the first step we emit dummy operands only for
> tied *sub*-operands (i.e. the case that is currently completely
> broken in common code anyway), everything seems to work.

OK.

> 
> The attached patch implements this; it is in fact a somewhat
> simplified version of the orginial patch in this email thread.
> Note that there is no longer any warning emitted if a named
> operand is not found in the AsmString; instead we simply get
> the dummy operand.
> 
> 
>> Second, many (most?) of the time, the assembly parser would really
>> like to talk about individual MCOperands only, not complex operands
>> like memory addressing modes.
> [snip]
> 
> I agree that the approach you outline here looks reasonable,
> and may be simplify parsing/matching complex operands.  This
> does look like a somewhat independent problem, however, and
> I'd prefer to possibly addressing it as a second step later
> on …

Yes, independent. Just related, and we were on the general topic anyway, so… :)

> 
> 
>> The instructions that have an AsmParser alias should definitely be
>> marked isCodeGenOnly. It’s a (probably harmless at the moment) bug
>> if they’re not. Please go ahead and update them. That way the build
>> will stay warning-clean after this goes in. I suggest doing that
>> separately as an incremental patch before landing this one since
>> it’s preparatory work, rather than being strictly part of this patch.
> 
> Huh, that causes failures in the disassembler testsuite.  Apparently,
> the main patterns are used for code generation *and* the disassembler,
> and separate patterns are used for the asm parser only.  I didn't see
> a way to mark a pattern for both codegen and disassembler, but not
> the asm parser …

Oh drat. I always forget about the disassembler in this things. Sounds like we somewhat define away this issue w/ the above that removes the need for the new warning?

Thinking about this more, I’m not sure I was correct in stating these need to be isCodeGenOnly, honestly. These instructions are a bit of a mess. This patch makes at least some steps towards fixing that, which is great.

>> It’s reasonable as a step towards improving things, yes. If you’re
>> really interested in this area, I’d love to see some of the more
>> aggressive refactoring talked about above get some traction.
> 
> Thanks again for the review!
> 
> 
> Do you agree with the approach in this updated patch?  If so, would
> it be OK to commit as first step?   Thanks!

Yep, this looks great!

Thanks again for doing this.

-Jim

> 
> (Or else, if you'd prefer me to check in the original version of
> the patch, I'd certainly be happy to do so as well.)
> 
> (See attached file: diff-llvm-tblgen-tie)
> 
> Bye,
> Ulrich<diff-llvm-tblgen-tie>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130426/f29040d2/attachment.html>