[llvm-dev] A possible bug in the assembly parser for ARM

章明 via llvm-dev llvm-dev at lists.llvm.org
Sat Jul 29 21:25:06 PDT 2017


Dear developers,




As the following snippet from lib/MC/MCParser/AsmParser.cpp shows, when AsmParser::parseStatement parses a label, it calls the target parser's onLabelParsed method only after emitting the label:




    // Emit the label.
    if (!getTargetParser().isParsingInlineAsm())
      Out.EmitLabel(Sym, IDLoc);

    // If we are generating dwarf for assembly source files then gather the
    // info to make a dwarf label entry for this label if needed.
    if (getContext().getGenDwarfForAssembly())
      MCGenDwarfLabelEntry::Make(Sym, &getStreamer(), getSourceManager(),
                                 IDLoc);

    getTargetParser().onLabelParsed(Sym);




For ARM, calling onLabelParsed after emitting the label seems to be a bug.




If I understand it correctly, ARMAsmParser::onLabelParsed (defined in lib/Target/ARM/AsmParser/ARMAsmParser.cpp) performs two tasks:

1) Complete the current implicit IT block (if one is still open for new conditional instructions) BEFORE the label, so that control cannot enter the IT block in the middle.

2) Emit a .thumb_func directive, if the label is the first label following a previously parsed .thumb_func directive without an optional symbol.

Considering the tasks above, calling onLabelParsed after the label is emitted leads to two types of errors in the generated code:

1) Instructions of an IT block BEFORE the label may be incorrectly emitted AFTER the label.

2) .thumb_func directives, which should be emitted BEFORE the corresponding function symbols, are emitted AFTER the function symbols.
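
To make the ordering problem concrete, here is a minimal toy model. It is not LLVM code: ToyStreamer, ToyTargetParser and parseLabel are hypothetical names I made up for illustration, and the model only assumes the two tasks of onLabelParsed described above. It records what would be emitted when the hook runs after the label (as AsmParser does today) and when it runs before the label.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an MC streamer: records emitted lines in order.
struct ToyStreamer {
  std::vector<std::string> Out;
  void emit(const std::string &Line) { Out.push_back(Line); }
};

// Hypothetical stand-in for the ARM target parser state discussed above.
struct ToyTargetParser {
  ToyStreamer &Streamer;
  std::vector<std::string> PendingIT; // buffered implicit IT block
  bool NextSymbolIsThumb = false;     // a bare .thumb_func was parsed

  explicit ToyTargetParser(ToyStreamer &S) : Streamer(S) {}

  // Models the two tasks: flush the open implicit IT block, then emit a
  // pending .thumb_func for the label being parsed.
  void onLabelParsed() {
    for (const std::string &Inst : PendingIT)
      Streamer.emit(Inst);
    PendingIT.clear();
    if (NextSymbolIsThumb) {
      Streamer.emit(".thumb_func");
      NextSymbolIsThumb = false;
    }
  }
};

// Parses the label "f2:" with an implicit IT block still open and a bare
// .thumb_func pending. HookBeforeLabel selects the call order.
std::vector<std::string> parseLabel(bool HookBeforeLabel) {
  ToyStreamer S;
  ToyTargetParser P(S);
  P.NextSymbolIsThumb = true;
  P.PendingIT = {"it pl", "movpl r0, #0"};

  if (HookBeforeLabel) {
    P.onLabelParsed(); // assumed alternative order: hook first
    S.emit("f2:");
  } else {
    S.emit("f2:");     // current order: label first, hook second
    P.onLabelParsed();
  }
  assert(P.PendingIT.empty() && "IT block must be closed at a label");
  return S.Out;
}
```

In this model, parseLabel(false) yields "f2:", "it pl", "movpl r0, #0", ".thumb_func", while parseLabel(true) yields "it pl", "movpl r0, #0", ".thumb_func", "f2:". Whether simply moving the onLabelParsed call ahead of the label emission is the right fix in the real parser is something I have not verified.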




I tested llvm-mc with the following assembly code:




    .text
    .syntax unified
    .p2align 1
    .code   16
    .globl  f1
    .globl  f2
    .thumb_func
f1:
    CMP   r0, #10

    .thumb_func

    MOVPL r0, #0

f2:
    MOVS  r1, #0
.Ltmp:
    CMP   r0, #0
    ITTT  PL
    ADDPL r1, r1, r0
    SUBPL r0, r0, #1
    BPL   .Ltmp
    MOV   r0, r1
    BX    lr

    .end




The generated assembly code was as follows:




    .text
    .p2align    1
    .code    16
    .globl    f1
    .globl    f2
f1:
    .thumb_func
    cmp    r0, #10



f2:
    it    pl
    movpl    r0, #0
    .thumb_func
    movs    r1, #0
.Ltmp:
    cmp    r0, #0
    ittt    pl
    addpl    r1, r1, r0
    subpl    r0, r0, #1
    bpl    .Ltmp
    mov    r0, r1
    bx    lr




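Comparing the generated assembly code with the original shows that both types of errors are present: the implicit IT block (it pl / movpl r0, #0), which precedes f2 in the original, is emitted after f2, and each .thumb_func directive is emitted after its function symbol (f1, f2) instead of before it.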


I tested llvm-mc with the following command line:




    llvm-mc -arch=thumb -filetype=asm -mattr=+soft-float,-neon,-crypto,+strict-align -mcpu=cortex-m3 -n -triple=armv7m-none-none-eabi -o=it-block-roundtrip.s it-block.S -arm-implicit-it=always




where it-block.S is the original assembly file and it-block-roundtrip.s is the generated assembly file.







Ming Zhang