[LLVMdev] string input for the integrated assembler

Steve King steve at metrokings.com
Wed Mar 18 11:57:27 PDT 2015

On Wed, Mar 18, 2015 at 10:50 AM, Joerg Sonnenberger
<joerg at britannica.bec.de> wrote:
>> I probably used the term 'expansion' incorrectly.  Pseudos go 1:1 into
>> .s files, then MCTargetAsmParser does its job.  This class nicely
>> consolidates tblgen's auto-generated operand fitting logic, which for
>> me is quite a blob of code.  Should use of the integrated assembler
>> require targets to pick all machine instructions some other way?  If
>> the answer should be no, then handling pseudos via their AsmString
>> feels like a tidy answer.
> Please stop talking about "AsmString", that really makes no sense. It
> is really hard to help you if you go back to an approach that is
> difficult to understand and from what we can understand, completely
> wrong. It also doesn't help that the questions so far are pretty much
> without meat. E.g. no example of what your "high-level" assembler
> mnemonic looks like and how the resulting instructions look like.

OK.  AsmString is the tblgen variable holding the string
representation of the instruction.  This doesn't exist in an MCInst
and I conflated the two.  Sorry about that.

We can use X86 for a meaty example that is close to my target.  In
X86InstrInfo::optimizeCompareInstr() we see code like this:

    case X86::SUB64ri32: NewOpcode = X86::CMP64ri32; break;
    case X86::SUB64ri8:  NewOpcode = X86::CMP64ri8;  break;
    case X86::SUB32ri:   NewOpcode = X86::CMP32ri;   break;
    case X86::SUB32ri8:  NewOpcode = X86::CMP32ri8;  break;
    case X86::SUB16ri:   NewOpcode = X86::CMP16ri;   break;
    case X86::SUB16ri8:  NewOpcode = X86::CMP16ri8;  break;
    case X86::SUB8ri:    NewOpcode = X86::CMP8ri;    break;

Here, the compiler must distinguish SUB "ri" from the more compact,
but logically redundant SUB "ri8".  That, multiplied by the number of
operand widths and that again multiplied by the number of compare-like

My target has *many* more choices than just "ri" and "ri8" and checks
like this switch statement would explode.  At the time of
optimizeCompareInstr(), nitpicking all the various ways to encode an
immediate value for an instruction would be very painful.  It's enough
to know an immediate form exists as represented by a pseudo.  I get to
use something like this:

    case FOO::SUB64ri:   NewOpcode = FOO::CMP64ri;   break;
    case FOO::SUB32ri:   NewOpcode = FOO::CMP32ri;   break;
    case FOO::SUB16ri:   NewOpcode = FOO::CMP16ri;   break;
    case FOO::SUB8ri:    NewOpcode = FOO::CMP8ri;    break;
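
To make the shape of that mapping concrete, here is a minimal
self-contained sketch.  The FOO opcode enum and the helper name
getCompareForSub() are hypothetical stand-ins for what tblgen and a
target's InstrInfo would provide; this is not code from any real
target.

```cpp
// Hypothetical opcodes standing in for tblgen-generated FOO:: values.
namespace FOO {
enum Opcode : unsigned {
  SUB8ri, SUB16ri, SUB32ri, SUB64ri,
  CMP8ri, CMP16ri, CMP32ri, CMP64ri,
  INSTRUCTION_LIST_END
};
} // namespace FOO

// Map a subtract-immediate pseudo to the matching compare pseudo.
// Returns FOO::INSTRUCTION_LIST_END when Opc has no compare form.
// Only the operand width matters here; the immediate's encoding is
// deliberately left undecided.
unsigned getCompareForSub(unsigned Opc) {
  switch (Opc) {
  case FOO::SUB64ri: return FOO::CMP64ri;
  case FOO::SUB32ri: return FOO::CMP32ri;
  case FOO::SUB16ri: return FOO::CMP16ri;
  case FOO::SUB8ri:  return FOO::CMP8ri;
  default:           return FOO::INSTRUCTION_LIST_END;
  }
}
```

One case per operand width is all that is needed, however many
immediate encodings the target supports.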

In the assembly output file, the pseudo emits something like "cmpl
$0x80,%eax", with no indication of which encoding to use.

Given the .s file, only the assembly parser worries if $0x80 is best
represented as an unsigned imm8, a signed imm16, imm32, power-of-2,
shifted nibble, special choices for given register destinations, and
so on.
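
As a rough illustration of the decision left to the assembly parser,
the sketch below classifies an immediate into one of several
encoding forms.  The ImmForm names, the order in which the forms are
tried, and the definition of "shifted nibble" (a 4-bit value shifted
left by a multiple of 4) are all assumptions made up for this
example; a real target would derive them from its own encoding rules
and would also consider the destination register.

```cpp
#include <cstdint>

// Hypothetical immediate encoding forms; not from any real target.
enum class ImmForm { UImm8, Pow2, ShiftedNibble, SImm16, Imm32 };

// True if exactly one bit of V is set.
static bool isPow2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// True if V is a nonzero 4-bit value shifted left by a multiple of 4.
static bool isShiftedNibble(uint32_t V) {
  if (V == 0)
    return false;
  for (unsigned Shift = 0; Shift < 32; Shift += 4)
    if ((V & ~(0xFu << Shift)) == 0)
      return true;
  return false;
}

// Pick an encoding form for Imm, trying the assumed cheapest first.
ImmForm pickImmForm(int32_t Imm) {
  uint32_t U = static_cast<uint32_t>(Imm);
  if (U <= 0xFF)
    return ImmForm::UImm8;
  if (isPow2(U))
    return ImmForm::Pow2;
  if (isShiftedNibble(U))
    return ImmForm::ShiftedNibble;
  if (Imm >= -32768 && Imm <= 32767)
    return ImmForm::SImm16;
  return ImmForm::Imm32;
}
```

With these assumed rules, the $0x80 from the example above would be
classified as an unsigned imm8, and none of this logic has to appear
in the compiler's peephole code.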

Hopefully this is more clear.  I will back off from suggesting
how this use of pseudos could be accommodated with the integrated

