[llvm-dev] AVX 512 Assembly Code Generation issues
hameeza ahmed via llvm-dev
llvm-dev at lists.llvm.org
Wed Jun 21 06:16:38 PDT 2017
When I generate code for a loop with 72 iterations, the compiler uses
AVX-512 zmm operations 4 times (16 x 4 = 64 elements), and the remaining
8 iterations are handled by ordinary scalar mov operations through the EAX
register. Wouldn't it be better to use a ymm operation for the remaining 8
iterations, as it does when the iteration count is between 8 and 15? The
same applies to xmm, and so on.
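To make the arithmetic concrete, here is a small C sketch of the split I
have in mind. It is only my own illustration of the idea, not what the
vectorizer actually does; the names chunk_plan and plan_chunks are made up
for this example, and the lane counts assume 32-bit int elements (16 per
zmm, 8 per ymm, 4 per xmm).

```c
#include <assert.h>

/* Hypothetical helper: how a trip count could be covered by progressively
 * narrower vector operations on 32-bit ints. */
struct chunk_plan {
    int zmm_ops;    /* 512-bit operations, 16 ints each */
    int ymm_ops;    /* 256-bit operations, 8 ints each  */
    int xmm_ops;    /* 128-bit operations, 4 ints each  */
    int scalar_ops; /* leftover single-element iterations */
};

struct chunk_plan plan_chunks(int trip_count)
{
    struct chunk_plan p;

    assert(trip_count >= 0);

    /* Take as many full 512-bit chunks as possible first. */
    p.zmm_ops = trip_count / 16;
    trip_count %= 16;

    /* At most one 256-bit chunk can remain (fewer than 16 elements left). */
    p.ymm_ops = trip_count / 8;
    trip_count %= 8;

    /* At most one 128-bit chunk can remain (fewer than 8 elements left). */
    p.xmm_ops = trip_count / 4;

    /* Whatever is left (0 to 3 elements) is handled one at a time. */
    p.scalar_ops = trip_count % 4;

    return p;
}
```

For 72 iterations, plan_chunks(72) gives 4 zmm operations, 1 ymm operation
and no xmm or scalar work, which is the split I expected instead of 8
scalar iterations.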
Please correct me if I am wrong.
Thank you
On Jun 21, 2017 12:21 AM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote:
> Hello,
> I am using LLVM on my Core i7 laptop, which has no AVX support.
> My goal is to generate AVX-512 code (loop vectorization) for Knights
> Landing/Skylake.
> My .c code is:
> int a[256], b[256], c[256];
> void foo(void) {
>   int i;
>   for (i = 0; i < 256; i++) {
>     a[i] = b[i] + c[i];
>   }
> }
> I first generated the .ll file with clang:
> clang -S -emit-llvm test.c -o test.ll
> Then I optimized it:
> opt -S -O3 test.ll -o test_o3.ll
> Then I used llc for code generation, trying each of these targets:
> llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o test_o3.s
> llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s
> Here is the generated code:
> .text
> .file "filer_o3.ll"
> .globl foo
> .p2align 4, 0x90
> .type foo,@function
> foo: # @foo
> .cfi_startproc
> # BB#0: # %min.iters.checked
> pushq %rbp
> .Ltmp0:
> .cfi_def_cfa_offset 16
> .Ltmp1:
> .cfi_offset %rbp, -16
> movq %rsp, %rbp
> .Ltmp2:
> .cfi_def_cfa_register %rbp
> movq $-1024, %rax # imm = 0xFC00
> .p2align 4, 0x90
> .LBB0_1: # %vector.body
> # =>This Inner Loop Header: Depth=1
> vmovdqa32 c+1024(%rax), %xmm0
> vmovdqa32 c+1040(%rax), %xmm1
> vpaddd b+1024(%rax), %xmm0, %xmm0
> vpaddd b+1040(%rax), %xmm1, %xmm1
> vmovdqa32 %xmm0, a+1024(%rax)
> vmovdqa32 %xmm1, a+1040(%rax)
> vmovdqa32 c+1056(%rax), %xmm0
> vmovdqa32 c+1072(%rax), %xmm1
> vpaddd b+1056(%rax), %xmm0, %xmm0
> vpaddd b+1072(%rax), %xmm1, %xmm1
> vmovdqa32 %xmm0, a+1056(%rax)
> vmovdqa32 %xmm1, a+1072(%rax)
> addq $64, %rax
> jne .LBB0_1
> # BB#2: # %middle.block
> popq %rbp
> retq
> .Lfunc_end0:
> .size foo, .Lfunc_end0-foo
> .cfi_endproc
> .type b,@object # @b
> .comm b,1024,16
> .type c,@object # @c
> .comm c,1024,16
> .type a,@object # @a
> .comm a,1024,16
> .ident "clang version 3.9.0 (tags/RELEASE_390/final)"
> .section ".note.GNU-stack","",@progbits
> The generated code does use vmov... instructions, but no zmm registers
> appear, only xmm registers.
> Can you please tell me where I am going wrong? I have tried several times
> with different parameters but always get xmm registers.
> Thank you