[PATCH] Handle tied sub-operands in AsmMatcherEmitter

Wed Apr 24 11:59:46 PDT 2013

Hi Jim,

> Thank you for looking at this. Apologies again for taking
> unjustifiably long to get back to you. This is really good stuff and
> I very much want to see this go in. I like it enough I’m going to
> try to talk you into doing even more work on improving this code. ;)
>
> Fair warning up front: You’re digging into some pretty fundamental
> problems in how the assemblers and code generators like to view
> instructions. As you correctly observe, we don’t really have very
> good solutions in place right now, just a bunch of hacks that get us
> where we need to go. I’d *love* to fix that, and this patch is a
> step in that direction. So I’m going to give a bit of a brain-dump
> here about a few different things. Adding llvmdev as well, since
> we’re veering into more high level design territory

Thanks for the detailed reply, this really helps me understand
the "big picture" better!

>> One backend currently affected by this problem is ARM, where they work
>> around it by either providing custom matchers for pre-inc instructions,
or
>> else just handling the whole instruction differently via special
asm-parser
>> only instruction patterns.
>
> You mean custom converters, not custom matchers, yes? The matcher is
> something completely different. I’m assuming you mean converters for
> the rest. If that’s not the case, please let me know and I’ll back
> up and re-think.

Right, custom converters (i.e. AsmMatchConverter).  Sorry for the
confusion with terminology ...

> There are two core issues. First, the tied operands are an artifact
> of the way we do instruction selection for the compiler. They’re how
> we represent read-modify-write operands, basically. The assembler
> shouldn’t have to know or care about them at all. Specifically, they
> shouldn’t even be represented at all as an MCOperand. Rather, only
> the actual operand referenced by the instruction encoding should be
> an operand, and it should have an instruction-description notation
> that the operand is RMW. This means that the MI representation and
> the MCInst representations of the instructions will no longer have a
> one-to-one operand correspondence, so there will need to be logic to
> map between them for things like the MC lowering pass (the ill named
> AsmPrinter), the TargetInstrInfo layer which inherits from
> MCInstrInfo, and probably others. This is usually, in effect, what
> many of the ARM assembler aliases of the load/store instructions are
> doing manually. We just want to automate it.

I see.  This could actually make things quite a bit simpler.

If I understand your point correctly, at the MCInst level, we don't
want multiple copies of the tied operand.  In fact, would it make
sense to say that at the MCInst level, the list of operands we want
should correspond exactly to those named in the AsmString?  The
rationale would be that any operand not named in the AsmString is
obviously not needed to emit the instruction as string, and thus
it clearly ought not be needed when emitting (or matching!) the
instruction in any other way either ...

Right now, we do have multiple copies, and the common TableGen
code attempts to fill them in (with copies of the operand value).
However, the ARM custom converters mentioned above actually do
not bother: instead, they simply fill those slots that do not
correspond to named operands in the AsmString with dummy entries
(constant 0).  I can see now where this actually makes sense given
that those MCOperand entries never should be used for anything.

As a first step towards the goal of removing the duplicate operands
from MCInst, would it make sense to move the ARM approach to common
code: that is, have common code simply emit dummy entries for all
operand slots not named in the AsmString?

This would have several advantages:
- TableGen AsmMatcherEmitter would not have to track matching
  constraints at all, simplifying that code a bit
- it would work even in those cases where the back-end needs a
  custom matcher for multi-operand operands, where my current
  patch does not work
- most (all?) of the ARM custom converters could simply be
  removed, since the common code would do the same thing

At some later point, it should then be simple to completely remove
those dummy operands (and deal with issues like operand numbers no
longer matching those in MachineInstruction).

I've tried to put together a patch to explore this direction.  As it
turns out, even this first step is not trivial to implement, since
back-ends do tend to use both operand slots of a tied operand on
occasion; this would need to be cleaned up first.  However, if as
a first step towards the first step we emit dummy operands only for
tied *sub*-operands (i.e. the case that is currently completely
broken in common code anyway), everything seems to work.

The attached patch implements this; it is in fact a somewhat
simplified version of the orginial patch in this email thread.
Note that there is no longer any warning emitted if a named
operand is not found in the AsmString; instead we simply get
the dummy operand.

> Second, many (most?) of the time, the assembly parser would really
> like to talk about individual MCOperands only, not complex operands
> like memory addressing modes.
[snip]

I agree that the approach you outline here looks reasonable,
and may be simplify parsing/matching complex operands.  This
does look like a somewhat independent problem, however, and
I'd prefer to possibly addressing it as a second step later
on ...

> The instructions that have an AsmParser alias should definitely be
> marked isCodeGenOnly. It’s a (probably harmless at the moment) bug
> if they’re not. Please go ahead and update them. That way the build
> will stay warning-clean after this goes in. I suggest doing that
> separately as an incremental patch before landing this one since
> it’s preparatory work, rather than being strictly part of this patch.

Huh, that causes failures in the disassembler testsuite.  Apparently,
the main patterns are used for code generation *and* the disassembler,
and separate patterns are used for the asm parser only.  I didn't see
a way to mark a pattern for both codegen and disassembler, but not
the asm parser ...

> It’s reasonable as a step towards improving things, yes. If you’re
> really interested in this area, I’d love to see some of the more
> aggressive refactoring talked about above get some traction.

Thanks again for the review!

Do you agree with the approach in this updated patch?  If so, would
it be OK to commit as first step?   Thanks!

(Or else, if you'd prefer me to check in the original version of
the patch, I'd certainly be happy to do so as well.)

(See attached file: diff-llvm-tblgen-tie)

Bye,
Ulrich
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-tblgen-tie
Type: application/octet-stream
Size: 4676 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130424/2798dce0/attachment.obj>