[PATCH] Handle tied sub-operands in AsmMatcherEmitter

Thu Mar 28 09:52:56 PDT 2013

Hi Ulrich,

Thanks for the explanation. Jim understands this code better than I do, so I'll let him comment.

I would like to see a solution that can be shared between ARM and PPC.

Thanks,
/jakob

On Mar 28, 2013, at 9:37 AM, Ulrich Weigand <ulrich.weigand at de.ibm.com> wrote:

> 
> Jakob,
> 
> this is another TableGen patch related to handling pre-inc instruction
> which I need as prerequisite for the PowerPC asm parser.
> 
> The problem this patch attempts to fix is the handling of register tie
> constraints in AsmMatcherEmitter, where one operand is tied to a
> sub-operand of another operand.  The typical scenario for this to happen is
> the tie between the "write-back" register of a pre-inc instruction, and the
> base register sub-operand of the memory address operand of that
> instruction.
> 
> Now from reading the code, it seems that this scenario wasn't ever supposed
> to be properly supported by AsmMatcherEmitter.  In fact, there seems to be
> code that at one time may have completely rejected such instructions, but
> now simply emits some warnings (and only when using debug mode).  However,
> even while those instructions may no longer be rejected, the code that is
> generated doesn't seem to be correct either.
> 
> One underlying cause of the problem is the way tied operands are handled:
> If operand I is tied to J (where I < J), then the actual operand is emitted
> when at location I, and when at location J a copy of what is already at
> location I is emitted.  To ensure this can always be done,
> buildInstructionOperandReference "canonicalizes" the assembler string such
> that the "destination" operand (with smaller number) is always visited
> first:
> 
>  // If the named operand is tied, canonicalize it to the untied operand.
>  // For example, something like:
>  //   (outs GPR:$dst), (ins GPR:$src)
>  // with an asmstring of
>  //   "inc $src"
>  // we want to canonicalize to:
>  //   "inc $dst"
>  // so that we know how to provide the $dst operand when filling in the
> result.
> 
> This has the effect, in the example given above, that when generating the
> output MI operand list, the assembler string operand originally found at
> the location indicated by "$src" in the pattern will be placed in the MI
> operand indicated by "$dst", and later on, in the MI operand indicated by
> "$src", a copy of "$dst" is placed.
> 
> However, if the destination is only *part* of the source operand (e.g. with
> a tie like "$wb = $mem.ptrreg"), this code currently also "canonicalizes"
> the full $mem to $wb.  This has the effect that in the output MI operand
> list, in the place of "$wb" *all* the sub-operands indicated by "$mem" are
> placed, and in the place of "$mem", only a single sub-operand is copied.
> This results in a wrong MI output list.
> 
> On the other hand, it is necessary to do this canonicalization, since the
> MI operand list is created by appending one operand after the other, so it
> is not possible to perform a "forward tie": when encoding "$wb", we'd have
> to leave a slot open, and only after we've encoded "$mem", we can then go
> back and fill in the copy.
> 
> 
> One backend currently affected by this problem is ARM, where they work
> around it by either providing custom matchers for pre-inc instructions, or
> else just handling the whole instruction differently via special asm-parser
> only instruction patterns.
> 
> To avoid having to duplicate hacks like this, I've come up with the
> attached patch, which fixes at least one part of the problem: if the
> multi-part operand can be handled piecewise by the asm parser, we can still
> do the "canonicalization" trick by moving up just the piece of the operand
> that's being tied.   This is implemented by updating the canonicalization
> logic in buildInstructionOperandReference, and modifying
> buildInstructionOperandReference to allow ties to work on sub-operands.
> 
> Now, those changes unfortunately affect some of the other cases that still
> cannot be implemented correctly, because the multi-part operand *cannot* be
> handled piecewise (as is the case in the ARM back-end).  In those, you now
> see errors of the form "Instruction ... has operand '$wb' that doesn't
> appear in asm string."  This is because $wb is no longer incorrectly
> canonicalized ...   If those instructions would have actually been used by
> the asm parser, they would have generated wrong MI operand lists.  However,
> they aren't (because the use custom matchers or are "shadowed" by
> asm-parser only alias instructions), those errors really are spurious.
> 
> One of the cases is simple to fix:  if we have a custom matcher, we don't
> need to run through buildInstructionOperandReference at all, since all the
> matching is done by the custom matcher anyway.  Thus we can just skip the
> routine and avoid spurious errors.
> 
> However, the other case (where the instruction is shadowed by a asm-parser
> only alias) cannot be detected automatically.  To avoid a build break, the
> patch demotes the FatalError to a simple Warning, which will now show up on
> those handfull of ARM instruction patterns.  To remove those warnings, we
> could either mark the instructions as "isCodeGenOnly" or implement proper
> custom matchers.  I'd prefer to leave this up to the ARM back-end
> maintainers, though ...
> 
> 
> Does this approach look reasonable?  I'd appreciate a review of the
> patch ...
> 
> 
> Thanks,
> Ulrich
> 
> 
> (See attached file: diff-llvm-tblgen-tie)<diff-llvm-tblgen-tie>