[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions

Вадим Марковцев gmarkhor at gmail.com
Mon Feb 28 01:23:55 PST 2011


I've just looked through the current LLVM trunk.

> Adding separate "s" instructions is not the right thing to do.  We've been
> trying hard to avoid adding those "twins".  The instructions that can
> optionally set the condition codes have an "optional def" operand.  For
> example, look at the "cc_out" operand in the "sI" class defined in
> ARMInstrFormats.td.  If that operand is set to the CPSR register, then the
> instruction becomes the "s" variant.

All right, but things are not as smooth as one might expect. For example,
when I make a "mov" instruction define CPSR, the generated assembly is still
"mov", not "movs", even though "movs" is a perfectly valid instruction that
sets CPSR. The same change applied to "add" has the desired effect. So, if
one follows the approach you propose instead of adding separate instructions
to tablegen, what has to be modified in the LLVM code to resolve such issues?
There are many similar instructions that LLVM does not handle this way and
that definitely do have a suffixed twin.
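
To make the question concrete, here is a toy model of the optional-def
scheme as I understand it from the quoted reply. It is only a sketch:
ToyInst, Reg and printMnemonic are made-up names, not LLVM classes, and it
assumes the printer derives the suffix purely from the cc_out operand.

```cpp
#include <string>

// Toy model only: these types are hypothetical and are not LLVM classes.
enum class Reg { NoRegister, CPSR };

struct ToyInst {
    std::string baseMnemonic; // e.g. "mov" or "add"
    Reg ccOut;                // optional def: CPSR means "set the flags"
};

// The suffix comes from the operand, not from a separate "twin" opcode.
std::string printMnemonic(const ToyInst &inst) {
    return inst.ccOut == Reg::CPSR ? inst.baseMnemonic + "s"
                                   : inst.baseMnemonic;
}
```

In this model, a "mov" whose cc_out is CPSR prints as "movs". That is the
behaviour I would expect, and it is what I see for "add" but not for "mov".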

> There are some existing peephole optimizations to make use of this, but
> there are some unresolved issues as well.  Do you have some example
> testcases that show where we're missing opportunities?

Oh yeah. Consider the following existing peephole optimization:
PeepholeOptimizer.cpp->PeepholeOptimizer::OptimizeCmpInstr->ARMBaseInstrInfo::OptimizeCompareInstr.

  case ARM::ADDri:
  case ARM::ANDri:
  case ARM::t2ANDri:
  case ARM::SUBri:
  case ARM::t2ADDri:
  case ARM::t2SUBri:
    // Toggle the optional operand to CPSR.
    MI->getOperand(5).setReg(ARM::CPSR);
    MI->getOperand(5).setIsDef(true);
    CmpInstr->eraseFromParent();
    return true;

...and that's all. This switch should be much larger, however: 88
instructions, rather than 6, can be supported so far. Another thing that is
unclear to me is the origin of the comment above:

// Set the "zero" bit in CPSR.

Why only "zero" and not also "negative"?
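
On the size of that switch: one alternative to listing dozens of cases by
hand would be a lookup table. The sketch below is only an illustration. The
Opcode enum and the table contents are hypothetical and are not the real
ARM:: opcodes, and no LLVM API is used.

```cpp
#include <set>

// Hypothetical opcode list for illustration; not the real ARM:: enum.
enum class Opcode { ADDri, ANDri, SUBri, ORRrr, MUL, MOVr, LDRi };

// Opcodes assumed (for this sketch) to carry an optional flag-setting def.
// Supporting a new instruction means adding one table entry, not a new case.
bool hasOptionalFlagDef(Opcode op) {
    static const std::set<Opcode> table = {
        Opcode::ADDri, Opcode::ANDri, Opcode::SUBri,
        Opcode::ORRrr, Opcode::MUL,   Opcode::MOVr,
    };
    return table.count(op) != 0;
}
```

The peephole would then ask hasOptionalFlagDef(op) before toggling the
operand, instead of enumerating every opcode in a case label.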
Moreover, that particular peephole could be improved dramatically with some
more advanced analysis.
For example, consider the following program:

#include <stdio.h>
#include <stdlib.h>  /* srand, rand */
#include <time.h>    /* time */

int main()
{
    srand(time(NULL));
    int x, y;
    x = rand();
    y = rand();
    int z = x * y;
    if (z == 0)
    {
        printf("Zero");
    }
    z = x | y;
    if (z > 0)
    {
        printf("Greater");
    }
    else
    {
        printf("Smaller");
    }
    return 0;
}

It compiles to
    .syntax unified
    .cpu cortex-a8
    .eabi_attribute 6, 10
    .eabi_attribute 7, 65
    .eabi_attribute 8, 1
    .eabi_attribute 9, 2
    .eabi_attribute 10, 2
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .file "test.bc"
    .text
    .globl main
    .align 2
    .type main,%function
main:                                   @ @main
@ BB#0:                                 @ %entry
    push {r4, r5, r11, lr}
    mov r0, #0
    bl time
    bl srand
    bl rand
    mov r4, r0
    bl rand
    mov r5, r0
    mul r0, r5, r4
    cmp r0, #0
    bne .LBB0_2
@ BB#1:                                 @ %bb
    movw r0, :lower16:.L.str
    movt r0, :upper16:.L.str
    bl printf
.LBB0_2:                                @ %bb1
    orr r0, r5, r4
    cmp r0, #1
    blt .LBB0_5
@ BB#3:                                 @ %bb2
    movw r0, :lower16:.L.str1
    movt r0, :upper16:.L.str1
.LBB0_4:                                @ %bb2
    bl printf
    mov r0, #0
    ldmia sp!, {r4, r5, r11, pc}
.LBB0_5:                                @ %bb3
    movw r0, :lower16:.L.str2
    movt r0, :upper16:.L.str2
    b .LBB0_4
.Ltmp0:
    .size main, .Ltmp0-main

    .type .L.str,%object          @ @.str
    .section .rodata,"a",%progbits
    .align 2
.L.str:
    .asciz "Zero"
    .size .L.str, 5

    .type .L.str1,%object         @ @.str1
    .align 2
.L.str1:
    .asciz "Greater"
    .size .L.str1, 8

    .type .L.str2,%object         @ @.str2
    .align 2
.L.str2:
    .asciz "Smaller"
    .size .L.str2, 8

With my optimization, the same program compiles to

    .syntax unified
    .cpu cortex-a8
    .eabi_attribute 6, 10
    .eabi_attribute 7, 65
    .eabi_attribute 8, 1
    .eabi_attribute 9, 2
    .eabi_attribute 10, 2
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .file "test.bc"
    .text
    .globl main
    .align 2
    .type main,%function
main:                                   @ @main
@ BB#0:                                 @ %entry
    push {r4, r5, r11, lr}
    mov r0, #0
    bl time
    bl srand
    bl rand
    mov r4, r0
    bl rand
    mov r5, r0
    muls r0, r5, r4
    bne .LBB0_2
@ BB#1:                                 @ %bb
    movw r0, :lower16:.L.str
    movt r0, :upper16:.L.str
    bl printf
.LBB0_2:                                @ %bb1
    orrs r0, r5, r4
    ble .LBB0_5
@ BB#3:                                 @ %bb2
    movw r0, :lower16:.L.str1
    movt r0, :upper16:.L.str1
.LBB0_4:                                @ %bb2
    bl printf
    mov r0, #0
    ldmia sp!, {r4, r5, r11, pc}
.LBB0_5:                                @ %bb3
    movw r0, :lower16:.L.str2
    movt r0, :upper16:.L.str2
    b .LBB0_4
.Ltmp0:
    .size main, .Ltmp0-main

    .type .L.str,%object          @ @.str
    .section .rodata,"a",%progbits
    .align 2
.L.str:
    .asciz "Zero"
    .size .L.str, 5

    .type .L.str1,%object         @ @.str1
    .align 2
.L.str1:
    .asciz "Greater"
    .size .L.str1, 8

    .type .L.str2,%object         @ @.str2
    .align 2
.L.str2:
    .asciz "Smaller"
    .size .L.str2, 8

Note "muls" in place of "mul" plus "cmp" (an instruction the existing
peephole does not support) and "orrs" in place of "orr" plus "cmp" (the
result of the more advanced analysis).
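
To illustrate why a compare against zero can be folded into the preceding
instruction, here is a small sketch of the N (negative) and Z (zero) bits
that a flag-setting "orr" or "mul" writes, next to those of "cmp rX, #0".
It is a toy model under stated assumptions: NZ and the helper functions are
made-up names, and only N and Z are modelled. C and V are left out on
purpose, so conditions that also read those bits would need more analysis
than this shows.

```cpp
#include <cstdint>

// Toy model of two CPSR bits; C and V are deliberately not modelled.
struct NZ {
    bool n; // bit 31 of the result
    bool z; // result is zero
};

// N and Z derived from a 32-bit result value.
NZ flagsOfResult(std::uint32_t r) {
    return { (r >> 31) != 0, r == 0 };
}

// "cmp rX, #0" computes rX - 0, so N and Z describe rX itself.
NZ flagsOfCmpZero(std::uint32_t value) { return flagsOfResult(value); }

// Flag-setting "orr" and "mul" derive N and Z from their own result.
NZ flagsOfOrrs(std::uint32_t a, std::uint32_t b) { return flagsOfResult(a | b); }
NZ flagsOfMuls(std::uint32_t a, std::uint32_t b) { return flagsOfResult(a * b); }

bool sameNZ(NZ lhs, NZ rhs) { return lhs.n == rhs.n && lhs.z == rhs.z; }
```

For any inputs, sameNZ(flagsOfMuls(a, b), flagsOfCmpZero(a * b)) holds in
this model, which is why a branch on "eq" or "ne" after "muls" needs no
separate "cmp".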

On February 18, 2011, at 21:49, Bob Wilson <bob.wilson at apple.com> wrote:

>
> On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote:
>
> > Hello everyone,
> >
> > I've added the "S" suffixed versions of ARM and Thumb2 instructions to
> > tablegen. Those are, for example, "movs" or "muls".
> > Of course, some instructions have already had their twins, such as
> > add/adds, and I left them untouched.
>
> Adding separate "s" instructions is not the right thing to do.  We've been
> trying hard to avoid adding those "twins".  The instructions that can
> optionally set the condition codes have an "optional def" operand.  For
> example, look at the "cc_out" operand in the "sI" class defined in
> ARMInstrFormats.td.  If that operand is set to the CPSR register, then the
> instruction becomes the "s" variant.
>
> There are some existing peephole optimizations to make use of this, but
> there are some unresolved issues as well.  Do you have some example
> testcases that show where we're missing opportunities?