[llvm-dev] AVX 512 Assembly Code Generation

Tue Jun 20 17:39:40 PDT 2017

On 06/20/2017 07:21 PM, hameeza ahmed via llvm-dev wrote:
> Hello,
>
> I am using LLVM on my Core i7 laptop, which has no AVX support.
>
> My goal is to generate AVX-512 code (loop vectorization) for Knights
> Landing/Skylake.
>
>
>
> my .c code is;
>
> int a[256], b[256], c[256];
> void foo() {
> int i;
> for (i=0; i<256; i++) {
> a[i] = b[i] + c[i];
> }
> }
>
> i first generated its .ll file via clang
>
> clang -S -emit-llvm test.c -o test.ll

Your problem is that vectorization happens in opt, not in llc. Telling 
llc that you wish to enable AVX-512 is not sufficient. In fact, if you run:

clang -S -o - test.c -march=knl -O3

you'll see AVX-512 vectorized code. If you want to run opt separately to 
generate the vectorized code, you need to tell it that it is targeting 
the KNL. Clang can add the necessary function attributes to do this. 
You'll also want to run clang with optimizations enabled so that it will 
generate IR that is intended to be optimized, even if you then disable 
the actual optimizations to get the pre-opt IR.

clang -S -emit-llvm test.c -march=knl -O3 -mllvm -disable-llvm-optzns

Then running opt as you have it below should produce the desired result.
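
Put together, the sequence might look like the sketch below. The flags are 
the ones used in this thread; the file names, the guard that checks the 
tools are on PATH, and the grep checks are only illustrative assumptions. 
The zmm count is printed rather than promised, since the result depends 
on the LLVM version.

```shell
# Write the example from the original message (file name is illustrative).
cat > test.c <<'EOF'
int a[256], b[256], c[256];
void foo(void) {
  int i;
  for (i = 0; i < 256; i++)
    a[i] = b[i] + c[i];
}
EOF

if command -v clang >/dev/null && command -v opt >/dev/null \
   && command -v llc >/dev/null; then
  # 1. Emit pre-optimization IR; -march=knl makes clang attach the
  #    target attributes that opt later uses to pick the vector width.
  clang -S -emit-llvm test.c -march=knl -O3 -mllvm -disable-llvm-optzns -o test.ll
  echo "functions tagged for knl: $(grep -c '"target-cpu"="knl"' test.ll)"

  # 2. Run the optimizer; this is where loop vectorization happens.
  opt -S -O3 test.ll -o test_o3.ll

  # 3. Generate assembly and count the lines that mention zmm registers.
  llc -mcpu=knl test_o3.ll -o test_o3.s
  echo "lines using zmm: $(grep -c zmm test_o3.s)"
else
  echo "clang, opt and llc must all be on PATH" >&2
fi
```

If the first count is zero, the IR was produced without -march=knl, and 
opt has no reason to choose 512-bit vectors.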

Finally, I recommend upgrading to Clang/LLVM 4.0. It produces better 
AVX-512 code than 3.9 did.

  -Hal

>
> then i optimized it;
>
> opt -S -O3 test.ll -o test_o3.ll
>
> then i used llc for code generation
>
> llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o test_o3.s
>
> llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s
>
>
> here is my generated code;
>
>
>
> 	.text
> 	.file	"filer_o3.ll"
> 	.globl	foo
> 	.p2align	4, 0x90
> 	.type	foo,@function
> foo:                                    # @foo
> 	.cfi_startproc
> # BB#0:                                 # %min.iters.checked
> 	pushq	%rbp
> .Ltmp0:
> 	.cfi_def_cfa_offset 16
> .Ltmp1:
> 	.cfi_offset %rbp, -16
> 	movq	%rsp, %rbp
> .Ltmp2:
> 	.cfi_def_cfa_register %rbp
> 	movq	$-1024, %rax            # imm = 0xFC00
> 	.p2align	4, 0x90
> .LBB0_1:                                # %vector.body
>                                         # =>This Inner Loop Header: Depth=1
> 	vmovdqa32	c+1024(%rax), %xmm0
> 	vmovdqa32	c+1040(%rax), %xmm1
> 	vpaddd	b+1024(%rax), %xmm0, %xmm0
> 	vpaddd	b+1040(%rax), %xmm1, %xmm1
> 	vmovdqa32	%xmm0, a+1024(%rax)
> 	vmovdqa32	%xmm1, a+1040(%rax)
> 	vmovdqa32	c+1056(%rax), %xmm0
> 	vmovdqa32	c+1072(%rax), %xmm1
> 	vpaddd	b+1056(%rax), %xmm0, %xmm0
> 	vpaddd	b+1072(%rax), %xmm1, %xmm1
> 	vmovdqa32	%xmm0, a+1056(%rax)
> 	vmovdqa32	%xmm1, a+1072(%rax)
> 	addq	$64, %rax
> 	jne	.LBB0_1
> # BB#2:                                 # %middle.block
> 	popq	%rbp
> 	retq
> .Lfunc_end0:
> 	.size	foo, .Lfunc_end0-foo
> 	.cfi_endproc
>
> 	.type	b,@object               # @b
> 	.comm	b,1024,16
> 	.type	c,@object               # @c
> 	.comm	c,1024,16
> 	.type	a,@object               # @a
> 	.comm	a,1024,16
>
> 	.ident	"clang version 3.9.0 (tags/RELEASE_390/final)"
> 	.section	".note.GNU-stack","",@progbits
>
> The generated code uses vmov... instructions, but there are no zmm 
> registers, only xmm registers.
>
>
> Can you please tell me where I am going wrong? I have tried several 
> times with different parameters but always get xmm registers.
>
>
> Thank You
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
