[llvm-dev] Re: assembly code for array iteration generated by llvm is much slower than gcc

Wed Apr 29 00:24:20 PDT 2020

Hi Sam,
Here is another case where the clang RISC-V backend generates 33% more code than gcc for a loop.

C code:
float max(float * maxval_it, int len)
{
    float maxval = 0;
    float * end = maxval_it + len;
    while (maxval_it < end)
    {
        if (*maxval_it > maxval)
        {
            maxval = *maxval_it;
        }
        maxval_it++;  // advance to the next element
    }
    return maxval;
}

Compile command:
riscv32-unknown-elf-g++ -nostartfiles -nostdlib -O2 -march=rv32imf -mabi=ilp32f -fno-builtin -S perf.c -o perf.g++
clang++ -O2 --target=riscv32 -march=rv32img -mabi=ilp32f -nostdlib -fno-builtin -S perf.c -o perf.lang

The gcc version is 7.2.0.
The LLVM version is 10.0.0.

Loop code generated by gcc:
.L5:
    flw     fa5, 0(a0)
    addi    a0, a0, 4
    fgt.s   a5, fa5, fa0
    beqz    a5, .L3
    fmv.s   fa0, fa5
.L3:
    bgtu    a1, a0, .L5

Loop code generated by the clang RISC-V backend:
.LBB0_2:
    addi    a0, a0, 4
    fmv.s   ft0, fa0
    bgeu    a0, a1, .LBB0_5
.LBB0_3:
    flw     fa0, 0(a0)
    flt.s   a2, ft0, fa0
    bnez    a2, .LBB0_2
    fmv.s   fa0, ft0
    j       .LBB0_2
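To make the structural difference between the two listings easier to see, here is a hand translation of each loop back into C, with variables named after the registers they occupy. The function names are hypothetical, and the entry guard and the initial 0.0f are assumptions, because the listings only show the loop bodies. Both loops make the same comparison (fgt.s a5, fa5, fa0 and flt.s a2, ft0, fa0 both test "loaded value > current max"). The difference is register assignment: clang keeps the maximum in ft0 and loads into fa0, so it copies fa0 into ft0 on every iteration, and the not-greater path also pays an extra fmv.s and an unconditional j. Statically that is 8 instructions versus 6 (the 33%); on the not-greater path the clang loop executes 8 instructions per element against 5 for gcc.

```c
/* gcc's loop: the running maximum stays in fa0 and is overwritten only
   when the loaded value is larger (one conditional fmv.s). */
float max_gcc_shape(const float *a0, const float *a1) {
    float fa0 = 0.0f;                 /* maxval = 0 (assumed preheader) */
    if (a0 >= a1)                     /* assumed entry guard */
        return fa0;
    do {
        float fa5 = *a0;              /* flw    fa5, 0(a0)           */
        a0++;                         /* addi   a0, a0, 4            */
        if (fa5 > fa0)                /* fgt.s  a5, fa5, fa0; beqz   */
            fa0 = fa5;                /* fmv.s  fa0, fa5             */
    } while (a1 > a0);                /* bgtu   a1, a0, .L5          */
    return fa0;
}

/* clang's loop: the maximum lives in ft0 and the loaded value in fa0, so
   fa0 is copied into ft0 on every iteration; the not-greater path first
   restores fa0 from ft0 and then jumps back. */
float max_clang_shape(const float *a0, const float *a1) {
    float ft0 = 0.0f;                 /* maxval = 0 (assumed preheader) */
    float fa0;
    if (a0 >= a1)                     /* assumed entry guard */
        return ft0;
    for (;;) {
        fa0 = *a0;                    /* .LBB0_3: flw fa0, 0(a0)     */
        if (!(ft0 < fa0))             /* flt.s a2, ft0, fa0; bnez    */
            fa0 = ft0;                /* fmv.s fa0, ft0; j .LBB0_2   */
        a0++;                         /* .LBB0_2: addi a0, a0, 4     */
        ft0 = fa0;                    /* fmv.s ft0, fa0              */
        if (a0 >= a1)                 /* bgeu a0, a1, .LBB0_5        */
            break;
    }
    return ft0;
}
```

Both functions return the same value for any input; only the amount of register shuffling per iteration differs.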

Thanks~
Lori
-----Original Message-----
From: Lori Yao Yu
Sent: April 28, 2020 11:00
To: Sam Elliott <selliott at lowrisc.org>
Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] assembly code for array iteration generated by llvm is much slower than gcc

Hi Sam,

Yes, it is RISC-V assembly code. The test code is shown below: copy it into a C file named perf.c, then compile perf.c with the commands below.
We can see that gcc prefers to iterate over the array with a pointer, while LLVM prefers to iterate with an index, so LLVM generates extra instructions to compute each element's memory address from the index.
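As a source-level illustration of the two styles, here are two hypothetical helpers modeled on the inner loop of the test case. The index form recomputes each element's address from j (c + j*w2 needs a multiply because w2 is not a compile-time constant); the pointer form keeps one pointer per stride and advances it by a fixed step each iteration, which is what loop strength reduction does. This is a sketch of the addressing patterns only, not a claim about exactly what either compiler emits for perf.c, and it assumes c + end*w1 and c + end*w2 stay within (or one past the end of) the array.

```c
/* Index form: every access rebuilds the address from j. On RV32IM this is
   roughly a mul, a shift and an add per access before the load, unless the
   compiler strength-reduces it. */
int sum_indexed(const int *c, int start, int end, int w1, int w2) {
    int sum = 0;
    for (int j = start; j < end; j++) {
        sum += c[j * w2];
        sum += c[j * w1];
    }
    return sum;
}

/* Pointer form (strength-reduced by hand): two pointers are bumped by a
   fixed stride each iteration, so no per-element multiply is needed.
   Assumes c + end * w2 and c + end * w1 are valid pointers into (or one
   past the end of) the array. */
int sum_pointer(const int *c, int start, int end, int w1, int w2) {
    int sum = 0;
    const int *p2 = c + start * w2;   /* address of c[start * w2] */
    const int *p1 = c + start * w1;   /* address of c[start * w1] */
    for (int j = start; j < end; j++) {
        sum += *p2;
        sum += *p1;
        p2 += w2;                     /* next c[(j + 1) * w2] */
        p1 += w1;                     /* next c[(j + 1) * w1] */
    }
    return sum;
}
```

Both helpers compute the same sum; the pointer form simply trades the per-iteration multiply for two pointer increments.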

Test C code:

//perf.c

int func(int w1, int w2, int *b, int *c) {
    int wstart = 0;
    int i = 0;
    int j = 0;
    int sum = 0;
    int wend = 0;
    int dst_idx = 0;
    int dst_idx2 = 0;
    for (i = 0; i < w2; i++) {
        wstart = i * w1;
        wend = i / w1;
        sum = c[wstart];
        for (j = wstart + 1; j < wend; j++) {
            sum += c[j * w2];
            sum += c[j * w1];
        }
        dst_idx = w1 * i + w2;
        dst_idx2 = w2 * i + w1;
        b[dst_idx] = sum;
        b[dst_idx2] = sum / 2;
    }
    // falling off the end of a non-void function is undefined behavior in C++
    return 0;
}

Compile command:
riscv32-unknown-elf-g++ -nostartfiles -nostdlib -O2 -march=rv32imf -mabi=ilp32f -fno-builtin -S perf.c -o perf.g++
clang++ -O2 --target=riscv32 -march=rv32img -mabi=ilp32f -nostdlib -fno-builtin -S perf.c -o perf.lang

The gcc version is 7.2.0.
The LLVM version is 10.0.0.

thanks!~
Lori

-----Original Message-----
From: Sam Elliott <selliott at lowrisc.org>
Sent: April 27, 2020 21:36
To: Lori Yao Yu <loriyu at panyi.ai>
Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] assembly code for array iteration generated by llvm is much slower than gcc

Hi,

Am I right in thinking that this is RISC-V assembly?

Please can you provide a testcase (a C file, or LLVM IR) that we can use to diagnose this issue further? It would also be useful to know what architecture (including extensions) and other compiler flags you are using.

We know that the assembly that LLVM generates for RISC-V is not always the most efficient, and we're working on this issue at the moment. We would welcome more testcases.

Sam

> On 26 Apr 2020, at 4:37 am, Lori Yao Yu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> <image002.jpg>

--
Sam Elliott
Software Team Lead
Senior Software Developer - LLVM and OpenTitan
lowRISC CIC