[LLVMdev] AVX code gen
Arnold Schwaighofer
aschwaighofer at apple.com
Thu Dec 12 07:57:10 PST 2013
clang probably does not pick the right processor architecture on its own.
You could try “clang -mavx” or “clang -march=corei7-avx” for Ivy Bridge and “clang -march=core-avx2” or “clang -mavx2” for Haswell.
$ clang -march=core-avx2 -O3 -S -o - test.c
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _f
        .align  4, 0x90
_f:                                     ## @f
        .cfi_startproc
## BB#0:                                ## %entry
        pushq   %rbp
Ltmp2:
        .cfi_def_cfa_offset 16
Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
Ltmp4:
        .cfi_def_cfa_register %rbp
        xorl    %eax, %eax
        .align  4, 0x90
LBB0_1:                                 ## %vector.body
                                        ## =>This Inner Loop Header: Depth=1
        vmovups (%rdx,%rax,4), %ymm0
        vmulps  (%rsi,%rax,4), %ymm0, %ymm0
        vaddps  (%rdi,%rax,4), %ymm0, %ymm0
        vmovups %ymm0, (%rdi,%rax,4)
        addq    $8, %rax
        cmpq    $256, %rax              ## imm = 0x100
        jne     LBB0_1
## BB#2:                                ## %for.end
        popq    %rbp
        vzeroupper
        ret
        .cfi_endproc
$ cat test.c
void f(float * restrict A, float * restrict B, float * restrict C) {
  for (int i = 0; i < 256; ++i)
    A[i] += C[i] * B[i];
}
$ clang -v
clang version 3.5 (trunk 195376) (llvm/trunk 195372)
Best,
Arnold
On Dec 11, 2013, at 2:59 PM, Ken Gahagan <ken.gahagan at gmail.com> wrote:
> Hello -
>
> I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I cannot get clang/llvm to generate such instructions (using the 3.3 release or either 3.4 rc1 or 3.4 rc2). I am new to clang / llvm so I may not be invoking the tools correctly but given that -fvectorize and -fslp-vectorize are on by default at 3.4 I would have thought that if the code is AVX-able by icc that clang / llvm would be able to do the same… The code is basic matrix multiplication written a number of ways (with and without transposition and such) as a performance measurement exercise.
>
> The environments I’ve tried are:
> Intel Ivy Bridge-EX (pre-release hardware) running Red Hat Linux 6.5
> Generic desktop with Haswell processor running Fedora 18
>
> If you have a moment to point me to the appropriate docs I’m happy to go learn on my own – but I’ve now googled for the better part of 3 days trying to find what invocation parameters I should use to get the desired use of packed AVX instructions and the YMM registers and I just can’t seem to get it right. I’m also grateful if you just send the correct invocation.
>
> I’ve actually started digging through the code as well - but since I am starting from zero it could take me a while to find an answer this way - just didn’t want you to think I’m not willing to try to find the answer on my own :-)
>
> Thank you,
> Ken