[PATCH] Handle tied sub-operands in AsmMatcherEmitter

Thu Mar 28 09:37:03 PDT 2013

Jakob,

this is another TableGen patch related to handling pre-inc instruction
which I need as prerequisite for the PowerPC asm parser.

The problem this patch attempts to fix is the handling of register tie
constraints in AsmMatcherEmitter, where one operand is tied to a
sub-operand of another operand.  The typical scenario for this to happen is
the tie between the "write-back" register of a pre-inc instruction, and the
base register sub-operand of the memory address operand of that
instruction.

Now from reading the code, it seems that this scenario wasn't ever supposed
to be properly supported by AsmMatcherEmitter.  In fact, there seems to be
code that at one time may have completely rejected such instructions, but
now simply emits some warnings (and only when using debug mode).  However,
even while those instructions may no longer be rejected, the code that is
generated doesn't seem to be correct either.

One underlying cause of the problem is the way tied operands are handled:
If operand I is tied to J (where I < J), then the actual operand is emitted
when at location I, and when at location J a copy of what is already at
location I is emitted.  To ensure this can always be done,
buildInstructionOperandReference "canonicalizes" the assembler string such
that the "destination" operand (with smaller number) is always visited
first:

  // If the named operand is tied, canonicalize it to the untied operand.
  // For example, something like:
  //   (outs GPR:$dst), (ins GPR:$src)
  // with an asmstring of
  //   "inc $src"
  // we want to canonicalize to:
  //   "inc $dst"
  // so that we know how to provide the $dst operand when filling in the
result.

This has the effect, in the example given above, that when generating the
output MI operand list, the assembler string operand originally found at
the location indicated by "$src" in the pattern will be placed in the MI
operand indicated by "$dst", and later on, in the MI operand indicated by
"$src", a copy of "$dst" is placed.

However, if the destination is only *part* of the source operand (e.g. with
a tie like "$wb = $mem.ptrreg"), this code currently also "canonicalizes"
the full $mem to $wb.  This has the effect that in the output MI operand
list, in the place of "$wb" *all* the sub-operands indicated by "$mem" are
placed, and in the place of "$mem", only a single sub-operand is copied.
This results in a wrong MI output list.

On the other hand, it is necessary to do this canonicalization, since the
MI operand list is created by appending one operand after the other, so it
is not possible to perform a "forward tie": when encoding "$wb", we'd have
to leave a slot open, and only after we've encoded "$mem", we can then go
back and fill in the copy.

One backend currently affected by this problem is ARM, where they work
around it by either providing custom matchers for pre-inc instructions, or
else just handling the whole instruction differently via special asm-parser
only instruction patterns.

To avoid having to duplicate hacks like this, I've come up with the
attached patch, which fixes at least one part of the problem: if the
multi-part operand can be handled piecewise by the asm parser, we can still
do the "canonicalization" trick by moving up just the piece of the operand
that's being tied.   This is implemented by updating the canonicalization
logic in buildInstructionOperandReference, and modifying
buildInstructionOperandReference to allow ties to work on sub-operands.

Now, those changes unfortunately affect some of the other cases that still
cannot be implemented correctly, because the multi-part operand *cannot* be
handled piecewise (as is the case in the ARM back-end).  In those, you now
see errors of the form "Instruction ... has operand '$wb' that doesn't
appear in asm string."  This is because $wb is no longer incorrectly
canonicalized ...   If those instructions would have actually been used by
the asm parser, they would have generated wrong MI operand lists.  However,
they aren't (because the use custom matchers or are "shadowed" by
asm-parser only alias instructions), those errors really are spurious.

One of the cases is simple to fix:  if we have a custom matcher, we don't
need to run through buildInstructionOperandReference at all, since all the
matching is done by the custom matcher anyway.  Thus we can just skip the
routine and avoid spurious errors.

However, the other case (where the instruction is shadowed by a asm-parser
only alias) cannot be detected automatically.  To avoid a build break, the
patch demotes the FatalError to a simple Warning, which will now show up on
those handfull of ARM instruction patterns.  To remove those warnings, we
could either mark the instructions as "isCodeGenOnly" or implement proper
custom matchers.  I'd prefer to leave this up to the ARM back-end
maintainers, though ...

Does this approach look reasonable?  I'd appreciate a review of the
patch ...

Thanks,
Ulrich

(See attached file: diff-llvm-tblgen-tie)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-tblgen-tie
Type: application/octet-stream
Size: 6799 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130328/c8593a3f/attachment.obj>