[llvm-dev] AVX 512 Assembly Code Generation
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 20 17:39:40 PDT 2017
On 06/20/2017 07:21 PM, hameeza ahmed via llvm-dev wrote:
> Hello,
>
> I am using LLVM on my Core i7 laptop, which has no AVX support.
>
> My goal is to generate AVX-512 code (loop vectorization) for Knights
> Landing/Skylake.
>
>
>
> My .c code is:
>
> int a[256], b[256], c[256];
> foo () {
> int i;
> for (i=0; i<256; i++) {
> a[i] = b[i] + c[i];
> }
> }
>
> I first generated its .ll file via clang:
>
> clang -S -emit-llvm test.c -o test.ll
Your problem is that vectorization happens in opt, not in llc. Telling
llc that you wish to enable AVX-512 is not sufficient: by the time llc
runs, opt has already vectorized the loop for a generic x86-64 target,
which is why you see only 128-bit xmm registers. In fact, if you run:

  clang -S -o - test.c -march=knl -O3

you'll see AVX-512 vectorized code. If you want to run opt separately to
generate the vectorized code, you need to tell it that it is targeting
the KNL. Clang can add the necessary function attributes to do this.
You'll also want to run clang with optimizations enabled so that it will
generate IR that is intended to be optimized, even if you then disable
the actual optimizations to get the pre-opt IR:

  clang -S -emit-llvm test.c -march=knl -O3 -Xclang -disable-llvm-optzns

(-disable-llvm-optzns is a cc1 option, so it is passed with -Xclang
rather than -mllvm.) Then running opt as you have it below should
produce the desired result.

Finally, I recommend upgrading to Clang/LLVM 4.0. It produces better
AVX-512 code than 3.9 did.
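
For reference, the whole sequence might look like the sketch below. It
assumes clang, opt and llc from the same LLVM release are on PATH; the
file names are placeholders, and foo is written as "void foo(void)" here
only so that newer Clang releases, which reject implicit int, accept it.
The flag is passed with -Xclang because -disable-llvm-optzns is a cc1
option; as far as I know, later releases also accept the spelling
-disable-llvm-passes.

```shell
#!/bin/sh
# Write the test loop from the question (placeholder file name).
cat > test.c <<'EOF'
int a[256], b[256], c[256];
void foo(void) {
  int i;
  for (i = 0; i < 256; i++)
    a[i] = b[i] + c[i];
}
EOF

# Steps, in order:
#   clang: emit pre-optimization IR that carries the KNL target attributes
#   opt:   run the -O3 pipeline (this is where loop vectorization happens)
#   llc:   generate assembly from the already-vectorized IR
#   grep:  count the lines that mention zmm registers
if command -v clang >/dev/null && command -v opt >/dev/null &&
   command -v llc >/dev/null; then
  clang -S -emit-llvm test.c -march=knl -O3 \
        -Xclang -disable-llvm-optzns -o test.ll &&
  opt -S -O3 test.ll -o test_o3.ll &&
  llc -mcpu=knl test_o3.ll -o test_o3.s &&
  { grep -c zmm test_o3.s || echo "no zmm registers found"; }
else
  echo "clang, opt and llc are required; skipping"
fi
```

A nonzero count from the final grep indicates that 512-bit registers are
in use; the exact number depends on the LLVM version.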
-Hal
>
> Then I optimized it:
>
> opt -S -O3 test.ll -o test_o3.ll
>
> Then I used llc for code generation:
>
> llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o test_o3.s
>
> llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s
>
>
> Here is my generated code:
>
>
>
>         .text
>         .file   "filer_o3.ll"
>         .globl  foo
>         .p2align        4, 0x90
>         .type   foo,@function
> foo:                                    # @foo
>         .cfi_startproc
> # BB#0:                                 # %min.iters.checked
>         pushq   %rbp
> .Ltmp0:
>         .cfi_def_cfa_offset 16
> .Ltmp1:
>         .cfi_offset %rbp, -16
>         movq    %rsp, %rbp
> .Ltmp2:
>         .cfi_def_cfa_register %rbp
>         movq    $-1024, %rax            # imm = 0xFC00
>         .p2align        4, 0x90
> .LBB0_1:                                # %vector.body
>                                         # =>This Inner Loop Header: Depth=1
>         vmovdqa32       c+1024(%rax), %xmm0
>         vmovdqa32       c+1040(%rax), %xmm1
>         vpaddd  b+1024(%rax), %xmm0, %xmm0
>         vpaddd  b+1040(%rax), %xmm1, %xmm1
>         vmovdqa32       %xmm0, a+1024(%rax)
>         vmovdqa32       %xmm1, a+1040(%rax)
>         vmovdqa32       c+1056(%rax), %xmm0
>         vmovdqa32       c+1072(%rax), %xmm1
>         vpaddd  b+1056(%rax), %xmm0, %xmm0
>         vpaddd  b+1072(%rax), %xmm1, %xmm1
>         vmovdqa32       %xmm0, a+1056(%rax)
>         vmovdqa32       %xmm1, a+1072(%rax)
>         addq    $64, %rax
>         jne     .LBB0_1
> # BB#2:                                 # %middle.block
>         popq    %rbp
>         retq
> .Lfunc_end0:
>         .size   foo, .Lfunc_end0-foo
>         .cfi_endproc
>
>         .type   b,@object               # @b
>         .comm   b,1024,16
>         .type   c,@object               # @c
>         .comm   c,1024,16
>         .type   a,@object               # @a
>         .comm   a,1024,16
>
>         .ident  "clang version 3.9.0 (tags/RELEASE_390/final)"
>         .section        ".note.GNU-stack","",@progbits
>
> In the generated code there are vmov... instructions, but no zmm
> registers, only xmm registers.
>
>
> Can you please tell me where I am going wrong? I have tried several
> times with different parameters but always get xmm registers.
>
>
> Thank You
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory